Skip to content
This repository has been archived by the owner on Apr 17, 2023. It is now read-only.

andorfermichael/tripadvisor-scraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

61 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TripAdvisor Scrapper

Scrape the hotel reviews of a whole city on TripAdvisor.

Requirements

  • python 3.5

Installation & Setup

Download and install required libs and data:

pip install bs4

Usage Scrapper

Store all reviews of New York City:

python tripadvisor-scrapper.py 60763 New_York_City_New_York

Store all reviews of Paris:

python tripadvisor-scrapper.py 187147 Paris_Ile_de_France

Store all reviews of Vienna:

python tripadvisor-scrapper.py 190454 Vienna

The scrapper requires the city location id and the city name as commandline arguments. Both can be retrieved from the url, for example, https://www.tripadvisor.com/Hotels-g60763-New_York_City_New_York-Hotels.html The city location id is the number after the g. The city name is the string from the dash after the city location id to the dash before Hotels.

Store all reviews of Vienna and additionally store the review urls list as pickle for rescraping later:

python tripadvisor-scrapper.py 190454 vienna --pickle store

A pickle is stored in data/timestamp-cityname

Store all reviews of Vienna using a review urls list loaded from pickle/20160601-1522-vienna.pickle:

python tripadvisor-scrapper.py 190454 Vienna --pickle load --filename 20160601-1522-vienna.pickle

A pickle to load has to be placed in the pickle directory at the same directory level as the tripadvisor-scrapper.py

Usage Totalizer

Put all reviews and hotel information of a city together:

python tripadvisor-totalizer.py /Users/admin/tripadvisor-scrapper/data/20160716-202314-vienna

Author

Michael Andorfer

About

Scrape the hotel reviews of a whole city on TripAdvisor

Topics

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages