Craper

A collection of product image scrapers for various websites.

What does this do?

This script scrapes product images from the websites listed below by product ID. The product IDs it finds can then be used to get early links and early PIDs for each website.

When the command is run, it looks for new products, saves each new one to a database and sends a Discord webhook notification for every new product found.
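The new-product check described above can be sketched in Python as follows. This is a hypothetical illustration, not Craper's actual code: the SQLite schema and the helper names (`open_db`, `is_new`, `build_payload`, `send_webhook`, `process`) are assumptions. Only the webhook body shape (an `embeds` list with `title`, `color`, `image` and `footer` fields) follows Discord's documented webhook format.

```python
import json
import sqlite3
import urllib.request


def open_db(path=":memory:"):
    # Hypothetical schema: one row per (site, product ID) already seen.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products "
        "(site TEXT, pid TEXT, PRIMARY KEY (site, pid))"
    )
    return conn


def is_new(conn, site, pid):
    """Store the PID and return True only if it was not stored before."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO products (site, pid) VALUES (?, ?)", (site, pid)
    )
    conn.commit()
    return cur.rowcount == 1  # 0 when the row already existed


def build_payload(site, pid, image_url, color=0x2F3136, footer="Craper"):
    """Discord webhook body with a single embed showing the product image."""
    return {
        "embeds": [{
            "title": f"New {site} product: {pid}",
            "color": color,
            "image": {"url": image_url},
            "footer": {"text": footer},
        }]
    }


def send_webhook(webhook_url, payload):
    """POST the payload to a Discord webhook URL as JSON."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "User-Agent": "craper-sketch"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def process(conn, site, pid, image_url, webhook_url, send=send_webhook):
    """Notify only for products that have not been seen before."""
    if is_new(conn, site, pid):
        send(webhook_url, build_payload(site, pid, image_url))
        return True
    return False
```

Using `INSERT OR IGNORE` on a primary key makes the "have we seen this PID?" check and the save a single statement, so a product is announced only once even if it is scraped again. `process` takes the sender as a parameter so it can be exercised without contacting Discord.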

Supported sites

Support for more websites is coming.

| Website name | Command parameter | Website URL |
| --- | --- | --- |
| Footpatrol | `footpatrol` | https://www.footpatrol.com/ |
| Size | `size` | https://www.size.co.uk/ |
| JDSports (EU) | `jdsports` | https://www.jdsports.co.uk/ |
| TheHipStore | `thehipstore` | https://www.thehipstore.co.uk/ |
| Solebox | `solebox` | https://solebox.com/ |
| Snipes | `snipes` | https://snipes.com/ |
| Onygo | `onygo` | https://onygo.com/ |
| Courir | `courir` | https://www.courir.com/ |

Setup

Python 3.9+ is required!

Clone this repository

git clone https://github.com/rtunazzz/Craper

Create required files

./bin/config.sh

Add your webhook URLs, footer and color preferences to the `craper/config/config.json` file.
(Optional) Add proxies to the `craper/config/proxies.txt` file.

If you're struggling with setting up these configuration files, I recommend checking out these examples!
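A minimal sketch of loading `config.json` with defaults, assuming hypothetical key names `webhooks`, `footer` and `color` (the real file may use different ones, so compare against the example files). It also converts a `#RRGGBB` color string into the integer that Discord embeds expect.

```python
import json
from pathlib import Path


def default_config():
    # Hypothetical key names; the real config.json may use different ones.
    return {
        "webhooks": {},      # site name -> Discord webhook URL
        "footer": "Craper",  # embed footer text
        "color": "#2F3136",  # embed color as a hex string
    }


def load_config(path):
    """Read config.json, filling in defaults for any missing keys."""
    path = Path(path)
    data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config = {**default_config(), **data}
    # Discord embeds take the color as an integer, so convert "#RRGGBB".
    if isinstance(config["color"], str):
        config["color"] = int(config["color"].lstrip("#"), 16)
    return config
```

Building the defaults in a function gives every call a fresh `webhooks` dict, so one loaded config cannot accidentally mutate another.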

Note

Proxies are not required, but they are recommended for websites that ban IPs frequently, such as Solebox, Snipes and Onygo.
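A minimal sketch of reading `proxies.txt` and rotating through it so consecutive requests go out through different IPs. The one-proxy-per-line `host:port` / `host:port:user:pass` format is an assumption (a common convention, but check the example file), and the helper names are hypothetical. The yielded dicts have the shape accepted by the `proxies=` argument of the `requests` library.

```python
import itertools
from pathlib import Path
from urllib.parse import quote


def parse_proxy(line):
    """Turn 'host:port' or 'host:port:user:pass' into an http:// proxy URL."""
    parts = line.strip().split(":")
    if len(parts) == 2:
        host, port = parts
        return f"http://{host}:{port}"
    if len(parts) == 4:
        host, port, user, password = parts
        # Percent-encode credentials so characters like '@' do not break the URL.
        return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"
    raise ValueError(f"Unrecognised proxy format: {line!r}")


def load_proxies(path):
    """Return a list of proxy URLs, or an empty list if the file is missing."""
    path = Path(path)
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [parse_proxy(line) for line in lines if line.strip()]


def proxy_rotator(proxies):
    """Yield requests-style proxies dicts forever; yield None if there are none."""
    if not proxies:
        return itertools.repeat(None)
    return ({"http": url, "https": url} for url in itertools.cycle(proxies))
```

Yielding `None` when no proxies are configured lets the calling code pass the value straight through, which matches proxies being optional.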

Installation

Make sure everything is set up properly before installing.

python setup.py install

You can then start using the `craper` command:

# Show the usage info
craper -h

# Start a Footpatrol scraper
craper footpatrol

# Start 10 Footpatrol scrapers, each scraping 100 product IDs
craper footpatrol -t10 -n100

# Start one scraper using proxies, starting from PID 01925412
craper solebox -pt 1 -s 01925412

Example

craper size -t10 -n5 -s 10
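The README does not spell out exactly how `-t`, `-n` and `-s` interact. One plausible reading, shown here as a hypothetical sketch rather than Craper's actual implementation, is that each of the `-t` scrapers covers its own block of `-n` consecutive PIDs beginning at `-s`. `partition_pids`, `run_scrapers` and `scrape_one` are illustrative names, and preserving the zero-padding width of the start PID is an assumption based on the `01925412` example.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_pids(start, threads, per_thread):
    """Split a contiguous PID range into one chunk per scraper thread.

    `start` stays a string so leading zeros (e.g. "01925412") are kept.
    """
    width = len(start)
    first = int(start)
    return [
        [str(first + t * per_thread + k).zfill(width) for k in range(per_thread)]
        for t in range(threads)
    ]


def run_scrapers(start, threads, per_thread, scrape_one):
    """Run one worker per chunk and return all results in PID order."""
    chunks = partition_pids(start, threads, per_thread)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(lambda chunk: [scrape_one(pid) for pid in chunk], chunks)
        return [item for chunk_result in results for item in chunk_result]
```

Under that assumption, `craper size -t10 -n5 -s 10` would cover PIDs 10 through 59: the first chunk is `["10", "11", "12", "13", "14"]` and the last ends at `"59"`.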

Contributing

If you'd like to contribute, feel free to open a pull request!

Adding sites

Adding sites should be relatively easy. All you need to do is add a model (ideally in a separate file) to the models directory, then import it in the package's `__init__.py` file so it can be imported easily into the main `scraper.py` file. Finally, update the `SITES` variable and that should be it!
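As a rough illustration of that workflow, here is a hypothetical sketch of what a site model and the `SITES` registry might look like. The `SiteModel` class, its fields and the example.com URLs are placeholders, not Craper's real model interface; the only name taken from the README is `SITES`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteModel:
    """Hypothetical model: what the scraper needs to know about one site."""
    name: str                # command parameter, e.g. "footpatrol"
    base_url: str            # public website URL
    image_url_template: str  # product image URL containing "{pid}"

    def image_url(self, pid: str) -> str:
        return self.image_url_template.format(pid=pid)


# In the project, a model like this would live in its own file under the
# models directory and be re-exported from that package's __init__.py.
EXAMPLE_SITE = SiteModel(
    name="examplesite",
    base_url="https://www.example.com/",
    image_url_template="https://images.example.com/products/{pid}.jpg",
)

# Registry keyed by command parameter, so the CLI can look up a site by name.
SITES = {site.name: site for site in (EXAMPLE_SITE,)}
```

Keying the registry by the command parameter means adding a site only requires a new model and one more entry in the tuple.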
