Scraper

The system that scrapes data for the website.

Add a new scraper

  1. Create a scraper

Create a new file at ./groups/<name>.js:

export const name = '<name of group>';
export const url = '<page that list brands>';
export const infoUrl = '<wikipedia page>';

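export const scrapDetails = async (get$, getPage) => {
    // description and picture are placeholders: scrape them here with get$ or getPage.
    const details = {
        name,
        slug: slugify(name), // slugify must be imported or defined in this file
        url,
        infoUrl,
        description,
        picture,
    };
    return details;
};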

export const scrapBrands = async (get$, getPage) => {
    // Fill this map with the brands scraped from url.
    const brands = new Map();
    return brands;
};
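
The template above calls slugify without showing where it comes from. As a rough idea of what such a helper might do, here is a minimal sketch, assuming a slug is a lowercase, hyphen-separated, ASCII-only version of the name. This slugify is a hypothetical stand-in, not the project's own helper, which may handle edge cases differently.

```javascript
// Hypothetical helper: the project's real slugify may behave differently.
const slugify = (text) =>
    text
        .normalize('NFKD')                // split accented letters into base + diacritic
        .replace(/[\u0300-\u036f]/g, '')  // drop the diacritics
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, '-')      // turn every other run of characters into one hyphen
        .replace(/^-+|-+$/g, '');         // trim leading and trailing hyphens

console.log(slugify('Example Group & Co.')); // → example-group-co
```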

  2. Scrape details

Usually, we scrape details from the group's Wikipedia page.

A default scraper, getDetailsScraper, is available: given a group's URL, it scrapes the group's name, description and logo.

You can replace the scrapDetails function of your group with:

import { getDetailsScraper } from '../utils/index.js';

export const scrapDetails = getDetailsScraper(url, infoUrl);
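
To give a feel for what a details scraper of this kind could look like, here is a minimal sketch of a factory that returns a scrapDetails-style function. The name getDetailsScraperSketch, the selectors and the returned fields are all assumptions made for illustration; the real getDetailsScraper in ../utils/index.js may work differently.

```javascript
// Hypothetical stand-in for getDetailsScraper: selectors and return shape are assumptions.
const getDetailsScraperSketch = (url, infoUrl) => async (get$) => {
    const $ = await get$(infoUrl); // load the info page through the Cheerio helper
    return {
        name: $('#firstHeading').text().trim(),
        url,
        infoUrl,
        // First paragraph of the article body, assumed to be a usable summary.
        description: $('#mw-content-text p').first().text().trim(),
        // First infobox image, assumed to be the logo; undefined when there is none.
        picture: $('table.infobox img').first().attr('src'),
    };
};
```

The factory closes over url and infoUrl, so the returned function only needs the get$ helper that the scraper passes in.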
  3. Scrape the brands

In your scrapBrands function, you can use either Cheerio or Puppeteer through get$ and getPage, respectively:

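export const scrapBrands = async (get$, getPage) => {
    // Cheerio: a $ function bound to the page loaded from url
    const $ = await get$(url);
    // Puppeteer: a page opened on url (use whichever of the two you need)
    const page = await getPage(url);
};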

From there, you are free to use whatever library you need. For examples, see the existing scrapers in ./packages/scraper/groups/*.
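
As an illustration of the Cheerio route, here is a minimal sketch of a scrapBrands function that fills the Map from a list of links. The '.brand a' selector, the placeholder url, and the choice to key the Map by brand name with a { name, url } value are assumptions; check an existing group for the shape the website actually expects. The export keyword is left out so the snippet can run on its own.

```javascript
const url = 'https://example.com/brands'; // placeholder listing page

// Sketch only: selector and stored fields are assumptions, not the project's schema.
const scrapBrands = async (get$, getPage) => {
    const $ = await get$(url);
    const brands = new Map();

    $('.brand a').each((index, element) => {
        const name = $(element).text().trim();
        if (!name) return; // skip links without visible text
        brands.set(name, { name, url: $(element).attr('href') });
    });

    return brands;
};
```

Using a Map keyed by name means a brand listed twice on the page is stored once, with the later entry winning.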

  4. Run the command
yarn scrap <name>

This adds the new group and its brands to the shared data in ./packages/website/public/data.json.

Usage

yarn start <group>

⚠️ New data replaces the previous data.