Description
- This crawler scrapes the bbc.com/news website and stores the article details in MongoDB, hosted on compose.io.
- A web service for querying the stored news is included and hosted on Amazon EC2.
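As a rough illustration of the storage step, here is a minimal sketch of a Scrapy item pipeline that writes scraped items to MongoDB. It is not taken from this repository: the class name MongoPipeline, the settings keys MONGO_URI and MONGO_DATABASE, the "news" collection, and the "url" item field are all assumptions made for the example.

```python
class MongoPipeline:
    """Hypothetical Scrapy item pipeline that stores each scraped item in MongoDB."""

    def __init__(self, mongo_uri, mongo_db, collection_name="news"):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.collection_name = collection_name  # assumed collection name
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # MONGO_URI and MONGO_DATABASE are assumed settings keys, not ones
        # confirmed by this project.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI"),
            mongo_db=crawler.settings.get("MONGO_DATABASE", "bbc"),
        )

    def open_spider(self, spider):
        # Imported here so the class can be loaded without pymongo installed.
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.collection = self.client[self.mongo_db][self.collection_name]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        doc = dict(item)
        # Upsert keyed on the (assumed) "url" field so that re-crawling the
        # same article updates the stored document instead of duplicating it.
        self.collection.update_one({"url": doc["url"]}, {"$set": doc}, upsert=True)
        return item
```

A pipeline like this would be enabled through ITEM_PIPELINES in the Scrapy settings. Upserting on the URL is a design choice for the sketch; a plain insert would also work if duplicates are acceptable.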
Usage
To crawl
scrapy crawl bbcspider
List all news
curl http://ec2-52-221-187-243.ap-southeast-1.compute.amazonaws.com/news
Search news by keyword
curl http://ec2-52-221-187-243.ap-southeast-1.compute.amazonaws.com/news/<keyword>
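The same two endpoints can also be called from Python. The sketch below uses only the standard library; the helper names news_url and fetch_news are made up for this example, and because the response format is not documented here, the body is returned as raw text rather than parsed.

```python
import urllib.parse
import urllib.request

# Host taken from the curl examples above.
BASE_URL = "http://ec2-52-221-187-243.ap-southeast-1.compute.amazonaws.com"


def news_url(keyword=None):
    """Build the URL for listing all news, or for searching by keyword."""
    if keyword is None:
        return f"{BASE_URL}/news"
    # Percent-encode the keyword so spaces and slashes stay inside one path segment.
    return f"{BASE_URL}/news/{urllib.parse.quote(keyword, safe='')}"


def fetch_news(keyword=None, timeout=10):
    """Return the raw response body as text (assumes a UTF-8 encoded response)."""
    with urllib.request.urlopen(news_url(keyword), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, fetch_news() lists all news and fetch_news("brexit") searches for a keyword, provided the EC2 host is reachable.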
TODO
- Unit tests
- Add more fields
- Automate the crawler with cron
Limitations
- Limited development time