
Scrapy Redis Example Project

This directory contains an example Scrapy project integrated with scrapy-redis. By default, all scraped items are sent to redis (under the key <spider>:items). All spiders schedule their requests through redis, so you can start additional spider processes to speed up crawling.
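
The integration is wired up in the project settings. The snippet below is a minimal sketch of the scrapy-redis settings involved; the exact values in this project's settings.py may differ:

    # settings.py -- minimal scrapy-redis wiring (a sketch; check this
    # project's settings.py for the values it actually uses).

    # Schedule requests through redis instead of Scrapy's in-memory queue.
    SCHEDULER = "scrapy_redis.scheduler.Scheduler"

    # Deduplicate requests via a redis-backed fingerprint set.
    DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

    # Keep the queue and dupefilter between runs (requests are persisted).
    SCHEDULER_PERSIST = True

    # Push every scraped item onto the <spider>:items list in redis.
    ITEM_PIPELINES = {
        "scrapy_redis.pipelines.RedisPipeline": 300,
    }

    # Where redis lives.
    REDIS_URL = "redis://localhost:6379"

With RedisPipeline enabled, every spider's output lands in redis, which is what makes the consumer script described below possible.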

Spiders

  • dmoz

    This spider simply scrapes dmoz.org.

  • myspider_redis

    This spider uses redis as a shared requests queue and reads its start URLs from the myspider:start_urls redis key (see the seeding sketch after this list). For each URL, the spider outputs one item.

  • mycrawler_redis

    This spider uses redis as a shared requests queue and reads its start URLs from the mycrawler:start_urls redis key. For each URL, the spider follows all links.
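
Both redis spiders sit idle until their start-URLs list is seeded. A minimal sketch of seeding it with the redis-py client (the key name follows the myspider:start_urls convention above; the URL is just a placeholder):

    import redis

    r = redis.Redis(host="localhost", port=6379)

    # myspider_redis pops URLs from this list and starts crawling them.
    r.lpush("myspider:start_urls", "http://example.com/")

The same can be done from the shell with redis-cli lpush myspider:start_urls http://example.com/.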

Note

All requests are persisted by default. You can clear the queue with the SCHEDULER_FLUSH_ON_START setting. For example:

    scrapy crawl dmoz -s SCHEDULER_FLUSH_ON_START=1

Processing items

The process_items.py script provides an example of consuming the items queue:

python process_items.py --help
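
At its core the script is a pop-and-decode loop. A minimal sketch of that pattern, assuming the items were serialized as JSON (the RedisPipeline default) and that the spider is named dmoz, so the key is dmoz:items:

    import json

    import redis

    r = redis.Redis(host="localhost", port=6379)

    # RedisPipeline pushes items onto <spider>:items; pop them off the
    # other end and decode the JSON payload.
    while True:
        _, data = r.blpop("dmoz:items")
        item = json.loads(data)
        print(item)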

Run via Docker

You require the following applications:

  • Docker
  • Docker Compose

For implementation details see Dockerfile and docker-compose.yml, and read the official Docker documentation.
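
For orientation only, a minimal sketch of what such a compose file typically looks like; the real docker-compose.yml is authoritative, and the service names here (redis, crawler) are assumptions chosen to match the scale command below:

    version: "2"
    services:
      redis:
        image: redis

      crawler:
        build: .
        depends_on:
          - redis
        environment:
          # Assumes the project's settings read REDIS_URL from the
          # environment; adjust to however this project wires it up.
          - REDIS_URL=redis://redis:6379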

  1. To start the example project (add -d to run it as a daemon):

    docker-compose up
  2. To scale the crawler (to 4 instances, for example):

    docker-compose scale crawler=4
