Qasim Iqbal edited this page Apr 16, 2016 · 9 revisions

Making new scrapers

Here's the starter code used when starting work on a new scraper:

from ..utils import Scraper
from bs4 import BeautifulSoup
from collections import OrderedDict
from datetime import datetime, date
import json
import requests
import pytz


class ScraperName:
    """A scraper for <scraper description>."""

    host = '<scraper website>'

    @staticmethod
    def scrape(location='.'):
        Scraper.logger.info('ScraperName initialized.')
        
        # do the magic here

        Scraper.logger.info('ScraperName completed.')
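As a rough sketch of what "the magic" might look like, here is one way to parse fetched HTML with BeautifulSoup into a list of OrderedDicts. The HTML string, the `data-code` attribute, and the building names below are all stand-ins invented for illustration, not taken from any real scraper in this repo:

```python
from bs4 import BeautifulSoup
from collections import OrderedDict

# Stand-in for: html = requests.get(ScraperName.host).text
html = ('<ul>'
        '<li data-code="BA">Bahen Centre</li>'
        '<li data-code="SS">Sidney Smith</li>'
        '</ul>')

soup = BeautifulSoup(html, 'html.parser')

# Build one OrderedDict per item so the JSON output preserves key order.
buildings = []
for li in soup.find_all('li'):
    buildings.append(OrderedDict([
        ('code', li['data-code']),
        ('name', li.get_text()),
    ]))

print(buildings[0]['name'])  # Bahen Centre
```

In a real scraper the `html` variable would come from `requests.get`, and each OrderedDict would then be dumped to its own JSON file under the `location` path.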

Remember that output goes to JSON files under the path given by the location parameter. Also, the dictionary used to dump the JSON should be an OrderedDict to preserve key order.
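A minimal sketch of that output step, writing one OrderedDict per record into the location directory. The helper name `save_json` and the sample fields (`id`, `name`, `lat`) are assumptions for illustration, not part of the actual `Scraper` utility:

```python
import json
import os
from collections import OrderedDict


def save_json(location, name, data):
    """Dump an OrderedDict to <location>/<name>.json, preserving key order."""
    if not os.path.exists(location):
        os.makedirs(location)
    with open('%s/%s.json' % (location, name), 'w+') as f:
        json.dump(data, f)


# Example record: keys come back out in insertion order.
doc = OrderedDict([
    ('id', 'BA1130'),
    ('name', 'Bahen Centre'),
    ('lat', 43.6597),
])
save_json('./data', doc['id'], doc)
```

Since `json.dump` serializes an OrderedDict's keys in insertion order, the generated files stay diff-friendly across runs.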

Testing scrapers

I test the scrapers using a test.py script like the following:

import uoftscrapers
import logging
import sys

# Set up logging so it prints to standard output
logger = logging.getLogger("uoftscrapers")
logger.setLevel(logging.INFO)
ch = logging.StreamHandler(sys.stdout)
ch.setLevel(logging.INFO)
logger.addHandler(ch)

# Run scraper
uoftscrapers.ScraperName.scrape(location='./data')

Placed at the root of the repo, this script imports uoftscrapers from the source tree rather than the version pip3 installed.
