Amazon-Scraper-with-GoColly

An application consisting of two services, Scraper and Aggregator, with MongoDB as the document store.

Project Overview

Objective

Note: The services below are containerized as two separate images.

  1. Scraper Service - Scrapes an Amazon web page given its URL.
    • Fetches details such as Name, Image URL, Description and Price.
    • Uses the Colly framework for scraping.
    • Calls the Aggregator Service to persist the scraped data in a document store database.
  2. Aggregator Service - Takes in the payload from the Scraper Service and updates the database.
    • Writes or updates the payload in the database (MongoDB in this case).
    • Sends back a status with details such as URL and ID.
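The write-or-update behaviour of the Aggregator Service can be sketched in plain Go. This is a minimal illustration, not the project's code: the `Product` and `Status` types, their JSON keys, the choice of the page URL as the unique key, and the in-memory `Store` (standing in for the MongoDB collection) are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// Product mirrors the fields the scraper is described as collecting.
// The JSON key names are assumptions; the project may use different ones.
type Product struct {
	URL         string `json:"url"`
	Name        string `json:"name"`
	ImageURL    string `json:"imageURL"`
	Description string `json:"description"`
	Price       string `json:"price"`
}

// Status is a hypothetical reply carrying the URL and ID mentioned above.
type Status struct {
	URL     string `json:"url"`
	ID      string `json:"id"`
	Created bool   `json:"created"` // true on insert, false on update
}

// Store is an in-memory stand-in for the MongoDB collection.
type Store struct {
	mu     sync.Mutex
	byURL  map[string]Product
	ids    map[string]string
	nextID int
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{byURL: map[string]Product{}, ids: map[string]string{}}
}

// Upsert inserts a product, or overwrites the record that has the same URL.
// Keying on the URL is an assumption made for this sketch.
func (s *Store) Upsert(p Product) Status {
	s.mu.Lock()
	defer s.mu.Unlock()
	id, exists := s.ids[p.URL]
	if !exists {
		s.nextID++
		id = fmt.Sprintf("id-%d", s.nextID)
		s.ids[p.URL] = id
	}
	s.byURL[p.URL] = p
	return Status{URL: p.URL, ID: id, Created: !exists}
}

// All returns every stored product (order is not guaranteed).
func (s *Store) All() []Product {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make([]Product, 0, len(s.byURL))
	for _, p := range s.byURL {
		out = append(out, p)
	}
	return out
}

func main() {
	s := NewStore()
	first := s.Upsert(Product{URL: "https://example.com/p/1", Name: "Pad", Price: "$20"})
	second := s.Upsert(Product{URL: "https://example.com/p/1", Name: "Pad", Price: "$18"})
	// Prints: true true false 1
	fmt.Println(first.ID == second.ID, first.Created, second.Created, len(s.All()))
}
```

Sending the same URL twice leaves a single record with the latest price, which is the insert-or-update behaviour described for the Aggregator Service.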

Local Configuration

Note: Developed on Windows 10 (64-bit) with WSL2 Ubuntu 20.04, using Docker Desktop v4.9.1.

| Software | Version |
| --- | --- |
| Go | 1.13 |
| Docker | 20.10.16, build aa7e414 |
| MongoDB | 4.4.2 |

API Endpoints

| Sno. | Port | Method | URL | Request Body | Info |
| --- | --- | --- | --- | --- | --- |
| 1 | 8080 | POST | localhost:8080/scraper | Amazon page URL | Colly visits the given URL and scrapes the required data. |
| 2 | 8081 | POST | localhost:8081/aggregator | Product details in JSON format | Inserts or updates the record in the database. |
| 3 | 8081 | GET | localhost:8081/aggregator | NA | Returns all the records from the collection. |
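To show how the two aggregator endpoints could share one route, here is a rough sketch of a handler that dispatches on the HTTP method using only the standard library. It is not the repository's handler: the `aggregator` type, the `record` shape (string values only), the `"url"` key, the response body, and the `call` test helper are all hypothetical, and an in-memory map again stands in for MongoDB.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// record is a hypothetical payload shape; only "url" is used as the key here.
type record map[string]string

// aggregator keeps records in memory as a stand-in for the MongoDB collection.
type aggregator struct {
	mu      sync.Mutex
	records map[string]record
}

// ServeHTTP dispatches on the method: POST upserts, GET lists all records.
func (a *aggregator) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	switch r.Method {
	case http.MethodPost:
		var rec record
		if err := json.NewDecoder(r.Body).Decode(&rec); err != nil || rec["url"] == "" {
			http.Error(w, "body must be JSON with a url field", http.StatusBadRequest)
			return
		}
		a.mu.Lock()
		a.records[rec["url"]] = rec // insert, or overwrite an existing record
		a.mu.Unlock()
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]string{"url": rec["url"], "status": "saved"})
	case http.MethodGet:
		a.mu.Lock()
		all := make([]record, 0, len(a.records))
		for _, rec := range a.records {
			all = append(all, rec)
		}
		a.mu.Unlock()
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(all)
	default:
		w.Header().Set("Allow", "GET, POST")
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
	}
}

// call sends one in-process request to the handler and returns status and body.
func call(h http.Handler, method, body string) (int, string) {
	req := httptest.NewRequest(method, "/aggregator", strings.NewReader(body))
	rr := httptest.NewRecorder()
	h.ServeHTTP(rr, req)
	return rr.Code, strings.TrimSpace(rr.Body.String())
}

func main() {
	h := &aggregator{records: map[string]record{}}
	// To serve for real instead of calling in-process:
	//   http.Handle("/aggregator", h)
	//   http.ListenAndServe(":8081", nil)
	fmt.Println(call(h, http.MethodPost, `{"url":"https://example.com/p/1","price":"$20"}`))
	fmt.Println(call(h, http.MethodGet, ""))
	fmt.Println(call(h, http.MethodDelete, ""))
}
```

The `call` helper uses `net/http/httptest` so the handler can be exercised without opening a port; any other method than GET or POST is answered with 405.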

How to run locally?

  1. git clone https://github.com/jerinthomas1404/Amazon-Scraper-with-GoColly.git
  2. docker-compose build
  3. docker-compose up -d
  4. Using Postman or a similar tool, send a POST request to the scraper API with a URL in the JSON body.
  5. Sample URLs:
    • https://www.amazon.com/Controller-Compatible-Programming-Vibration-PlayStation-4/dp/B08L7T1VC7/ref=sr_1_2_sspa?th=1
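As an alternative to Postman, the request in step 4 could be sent from a small Go program. This is only a sketch: the README does not state the JSON key the scraper expects, so the `"url"` key is an assumption, and `scrapeBody` and `requestScrape` are hypothetical helper names. It uses `io/ioutil` so that it still builds with the Go 1.13 listed above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io/ioutil"
	"net/http"
	"os"
	"time"
)

// scrapeRequest is an assumed body shape: a JSON object with a "url" key.
type scrapeRequest struct {
	URL string `json:"url"`
}

// scrapeBody encodes the page URL as the JSON request body.
func scrapeBody(pageURL string) ([]byte, error) {
	return json.Marshal(scrapeRequest{URL: pageURL})
}

// requestScrape POSTs the page URL to the scraper endpoint and returns
// the HTTP status code and the raw response body.
func requestScrape(endpoint, pageURL string) (int, string, error) {
	payload, err := scrapeBody(pageURL)
	if err != nil {
		return 0, "", err
	}
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Post(endpoint, "application/json", bytes.NewReader(payload))
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return resp.StatusCode, "", err
	}
	return resp.StatusCode, string(body), nil
}

func main() {
	// Endpoint taken from the API table; the containers must already be running.
	const endpoint = "http://localhost:8080/scraper"
	page := "https://www.amazon.com/Controller-Compatible-Programming-Vibration-PlayStation-4/dp/B08L7T1VC7/ref=sr_1_2_sspa?th=1"

	code, body, err := requestScrape(endpoint, page)
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		return
	}
	fmt.Println(code, body)
}
```

Run it with `go run` after `docker-compose up -d`; if the services are not up, the program reports the connection error and exits.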

Screenshots

[Five screenshots, each labeled "Overview"; images not reproduced here.]
