This repository has been archived by the owner on Nov 13, 2017. It is now read-only.

[REPLACED] Scraper for Kenya gazettes published on KenyaLaw.org, used for OpenGazettes KE, accessible at https://opengazettes.or.ke/

CodeForAfrica-ARCHIVE/opengazettes_ke_scrapy

Open Gazettes KE Scraper

Kenya Law gazette scraper built on Scrapy

Installation

  • Clone the repo and cd into it
  • Create and activate a virtual environment
  • pip install -r requirements.txt
  • Set ENV variables
    • SCRAPY_AWS_ACCESS_KEY_ID - Get this from AWS
    • SCRAPY_AWS_SECRET_ACCESS_KEY - Get this from AWS
    • SCRAPY_FEED_URI=s3://name-of-bucket-here/gazettes/data.jsonlines - Where the jsonlines output of each crawl is saved. This can also be a local path
    • SCRAPY_FILES_STORE=s3://name-of-bucket-here/gazettes - Where the scraped gazettes are stored. This can also be a local path
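
Before starting a crawl, it can help to confirm that the variables listed above are actually set. The following is a minimal sketch, not part of this repository: `missing_env_vars` is a hypothetical helper, and it assumes the secret key variable is spelled `SCRAPY_AWS_SECRET_ACCESS_KEY`, matching the prefix of the other variables.

```python
import os

# Variable names taken from the installation steps above.
# Assumption: the secret key uses the same SCRAPY_ prefix as the others.
REQUIRED_VARS = (
    "SCRAPY_AWS_ACCESS_KEY_ID",
    "SCRAPY_AWS_SECRET_ACCESS_KEY",
    "SCRAPY_FEED_URI",
    "SCRAPY_FILES_STORE",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Run it in the same shell session where the variables were exported, so that it sees the environment the crawler will see.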

Deploying to Scrapinghub

Deploying the crawler to Scrapinghub is recommended for easier management. To do so:

  • Sign up for a free Scrapinghub account here
  • Install shub locally using pip install shub. Further instructions here
  • shub login
  • shub deploy

Note that on Scrapinghub, environment variables don't need the SCRAPY_ prefix.
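
To see which names the local variables would take without the prefix, the renaming can be sketched as below. `strip_scrapy_prefix` is a hypothetical helper for illustration only. It is not part of this repository or of shub.

```python
PREFIX = "SCRAPY_"


def strip_scrapy_prefix(env):
    """Return a copy of the SCRAPY_-prefixed entries with the prefix removed.

    Entries without the prefix are left out of the result.
    """
    return {
        name[len(PREFIX):]: value
        for name, value in env.items()
        if name.startswith(PREFIX)
    }


if __name__ == "__main__":
    import os

    # Print only the names, so secrets are not echoed to the terminal.
    for name in sorted(strip_scrapy_prefix(os.environ)):
        print(name)
```

For example, `SCRAPY_FEED_URI` becomes `FEED_URI`.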

Installing scrapy-deltafetch on macOS

  • brew install berkeley-db
  • export YES_I_HAVE_THE_RIGHT_TO_USE_THIS_BERKELEY_DB_VERSION=1
  • BERKELEYDB_DIR=$(brew --cellar)/berkeley-db/6.2.23 pip install bsddb3 (replace 6.2.23 with the version of berkeley-db that you installed)
  • pip install scrapy-deltafetch
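
Once installed, scrapy-deltafetch is normally switched on in the Scrapy project's settings.py. The fragment below follows the settings described in the scrapy-deltafetch documentation. Whether this project's settings file already contains them is not stated here, so treat it as a sketch of what to look for rather than a required change.

```python
# settings.py (fragment)

# Register the DeltaFetch spider middleware. The order value 100 is the one
# used in the scrapy-deltafetch documentation.
SPIDER_MIDDLEWARES = {
    "scrapy_deltafetch.DeltaFetch": 100,
}

# Skip requests for pages whose items were already scraped in earlier crawls.
DELTAFETCH_ENABLED = True
```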
