
nihil0/whiskymetrics


WhiskyMetrics

Tools for downloading whisky reviews from Reddit's r/Scotch

Setup

  1. Before using the WhiskyMetrics tools, you'll need to download this Google spreadsheet as a CSV file named whisky_reviews.csv. (It's not worth my while to come up with a way to do this programmatically, but if you write one, I'll gladly accept your pull request!)

  2. Next, load the data from the downloaded CSV file into a SQLite database named singlemalt.db by running the script populateDB.py. The database consists of a single table, review, created as follows:

create table if not exists review(
    name text,     -- whisky name
    region text,   -- region in Scotland where it was produced
    postID text,   -- reddit post ID
    score integer  -- review score
);

The other WhiskyMetrics scripts interface with this database.
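A minimal sketch of what the CSV-to-SQLite step might look like, using only the standard library's csv and sqlite3 modules. It is not the actual populateDB.py: the CSV header names (name, region, postID, score) and the helper name populate are hypothetical, so adjust them to match the columns of the exported spreadsheet.

```python
import csv
import sqlite3

# Same schema as the review table described above.
SCHEMA = """
create table if not exists review(
    name text,     -- whisky name
    region text,   -- region in Scotland where it was produced
    postID text,   -- reddit post ID
    score integer  -- review score
);
"""


def populate(conn: sqlite3.Connection, csv_lines) -> int:
    """Load review rows into the review table and return how many were inserted.

    Assumption: the CSV has the hypothetical headers name, region, postID, score.
    csv_lines can be an open file or any iterable of CSV lines.
    """
    conn.executescript(SCHEMA)
    rows = []
    for record in csv.DictReader(csv_lines):
        try:
            score = int(record["score"])
        except (TypeError, ValueError):
            continue  # skip rows whose score is missing or not an integer
        rows.append((record["name"], record["region"], record["postID"], score))
    conn.executemany("insert into review values (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

To build singlemalt.db, open whisky_reviews.csv with `newline=""` and pass the file object together with `sqlite3.connect("singlemalt.db")` to populate. Skipping rows with unparseable scores keeps the score column strictly integer.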

Scrape, Scrub and Fextract

WhiskyMetrics' key functionality is implemented by these three scripts.

scrape.py is an executable script that queries the database for the post IDs associated with the distillery, region or whisky given as input, fetches the Reddit comments with those post IDs and stores the text of each comment in a .txt file named after its post ID. The files are stored in a directory named {whisky|distillery|region}_<whisky_name>.
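The database side of scrape.py might look roughly like the sketch below. Because the review table has no distillery column, this sketch assumes that a distillery search (--type 1) matches whisky names starting with the distillery name. The function names are hypothetical, and the step that fetches each comment from Reddit is deliberately left out: save_comments simply takes already-fetched text keyed by post ID.

```python
import os
import sqlite3

# Directory prefixes for the documented --type values 1, 2 and 3.
TYPE_PREFIXES = {1: "distillery", 2: "region", 3: "whisky"}


def find_post_ids(conn: sqlite3.Connection, search_string, search_type=3):
    """Return the post IDs of reviews matching the search (hypothetical helper)."""
    if search_type == 1:
        # Assumption: whisky names begin with the distillery name, e.g. "Lagavulin 16".
        # SQLite's LIKE is case-insensitive for ASCII letters by default.
        sql, arg = "select postID from review where name like ?", search_string + "%"
    elif search_type == 2:
        sql, arg = "select postID from review where region = ? collate nocase", search_string
    elif search_type == 3:
        sql, arg = "select postID from review where name = ? collate nocase", search_string
    else:
        raise ValueError("search_type must be 1, 2 or 3")
    return [row[0] for row in conn.execute(sql, (arg,))]


def output_dir(search_string, search_type=3):
    """Build the {whisky|distillery|region}_<name> directory name."""
    return f"{TYPE_PREFIXES[search_type]}_{search_string}"


def save_comments(dir_name, comments):
    """Write each comment to <post_id>.txt in dir_name; comments maps post ID -> text."""
    os.makedirs(dir_name, exist_ok=True)
    for post_id, text in comments.items():
        with open(os.path.join(dir_name, post_id + ".txt"), "w", encoding="utf-8") as f:
            f.write(text)
```

Keeping the query, the naming rule and the file writing in separate functions lets the Reddit-fetching part be swapped or stubbed without touching the rest.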

Usage:

scrape.py [-h] [--type {1,2,3}] search_string

positional arguments:
  search_string   Whisky, region or distillery you want to get reviews of.

optional arguments:
  -h, --help      show this help message and exit
  --type {1,2,3}  1, 2 or 3 depending on whether your search string specifies
                  a distillery name, a whisky producing region or the name of
                  a particular whisky. Is set to 3 by default.
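The interface documented above maps directly onto Python's argparse module. The sketch below reproduces it; it illustrates how such a parser can be declared and is not necessarily how scrape.py builds its own (build_parser is a hypothetical name).

```python
import argparse


def build_parser():
    """Build a parser matching the documented scrape.py interface."""
    parser = argparse.ArgumentParser(prog="scrape.py")
    parser.add_argument(
        "search_string",
        help="Whisky, region or distillery you want to get reviews of.",
    )
    parser.add_argument(
        "--type",
        type=int,            # convert "1"/"2"/"3" before checking choices
        choices=[1, 2, 3],   # 1 = distillery, 2 = region, 3 = whisky
        default=3,
        help="1, 2 or 3 depending on whether your search string specifies a "
        "distillery name, a whisky producing region or the name of a "
        "particular whisky. Is set to 3 by default.",
    )
    return parser
```

For example, `build_parser().parse_args(["Islay", "--type", "2"])` yields a namespace with `search_string == "Islay"` and `type == 2`, while any value outside 1 to 3 is rejected by argparse with a usage error.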

scrub.py processes the review files created by scrape.py. All text is converted to lowercase and special characters are removed. The script takes a folder name as its input, processes all the text files in it and adds a SCRUBBED flag to the METADATA file.
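A minimal sketch of the scrubbing step. It assumes that "special characters" means anything other than ASCII letters, digits and whitespace (so accented letters are dropped too), and that METADATA is a plain-text file in the same folder holding one flag per line. Both are assumptions about formats this README does not spell out, and the function names are hypothetical.

```python
import os
import re

# Anything that is not a lowercase ASCII letter, a digit or whitespace.
SPECIAL_CHARS = re.compile(r"[^a-z0-9\s]")


def scrub_text(text):
    """Lowercase the text and strip special characters."""
    return SPECIAL_CHARS.sub("", text.lower())


def scrub_dir(dir_name, flag="SCRUBBED"):
    """Scrub every .txt file in dir_name in place, then record the flag in METADATA."""
    for fname in os.listdir(dir_name):
        if not fname.endswith(".txt"):
            continue
        path = os.path.join(dir_name, fname)
        with open(path, encoding="utf-8") as f:
            text = f.read()
        with open(path, "w", encoding="utf-8") as f:
            f.write(scrub_text(text))

    # Assumption: METADATA is plain text with one flag per line.
    meta_path = os.path.join(dir_name, "METADATA")
    content = ""
    if os.path.exists(meta_path):
        with open(meta_path, encoding="utf-8") as f:
            content = f.read()
    if flag not in content.split():  # avoid adding the flag twice on a re-run
        if content and not content.endswith("\n"):
            content += "\n"
        with open(meta_path, "w", encoding="utf-8") as f:
            f.write(content + flag + "\n")
```

Because scrubbing already-scrubbed text changes nothing and the flag is only written once, running the sketch twice on the same folder is harmless.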

Usage:

scrub.py [-h] dir_name

positional arguments:
  dir_name    Name of the folder containing the review files created by scrape.py to be
              scrubbed.

optional arguments:
  -h, --help  show this help message and exit

fextract.py iterates through the files in the folder created by scrape.py and extracts the term frequencies (TF) of words associated with four whisky characteristics: colour, nose, taste and finish. The data are stored in a JSON file named <dir_name>.json in a directory called JsonDumps.
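The README does not say how words are tied to each characteristic. One plausible heuristic, sketched below on scrubbed text, treats heading words that many r/Scotch reviews use (colour/color, nose, taste/palate, finish) as section markers and counts the words that follow each marker. The heading list, the raw-count definition of TF and the function names are all assumptions; a word such as "finish" inside a sentence will also be read as a heading, so this is only a rough approximation.

```python
import json
import os
from collections import Counter

CHARACTERISTICS = ("colour", "nose", "taste", "finish")

# Hypothetical heading words mapped to the four characteristics.
HEADINGS = {
    "colour": "colour", "color": "colour",
    "nose": "nose",
    "taste": "taste", "palate": "taste",
    "finish": "finish",
}


def section_counts(text):
    """Count the words that follow each heading word in one scrubbed review."""
    counts = {c: Counter() for c in CHARACTERISTICS}
    current = None  # words before the first heading are ignored
    for word in text.split():
        if word in HEADINGS:
            current = HEADINGS[word]
        elif current is not None:
            counts[current][word] += 1
    return counts


def extract_dir(dir_name, out_dir="JsonDumps"):
    """Sum the counts over all .txt files and write them to <out_dir>/<dir_name>.json."""
    totals = {c: Counter() for c in CHARACTERISTICS}
    for fname in sorted(os.listdir(dir_name)):
        if not fname.endswith(".txt"):
            continue
        with open(os.path.join(dir_name, fname), encoding="utf-8") as f:
            for characteristic, counter in section_counts(f.read()).items():
                totals[characteristic].update(counter)
    os.makedirs(out_dir, exist_ok=True)
    name = os.path.basename(os.path.normpath(dir_name))
    out_path = os.path.join(out_dir, name + ".json")
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump({c: dict(counter) for c, counter in totals.items()}, f, indent=2)
    return out_path
```

Dividing each count by the total number of words in its characteristic would turn these raw counts into normalized frequencies if that is what a downstream analysis expects.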

Usage:

fextract.py [-h] dir_name

positional arguments:
  dir_name    Name of the folder containing the review files.

optional arguments:
  -h, --help  show this help message and exit

The above scripts can also be imported as modules.

Testing WhiskyMetrics

After you have cloned this repo and completed the setup, run Test.py. If it fails to run, send me a bug report, or better yet, fix the bug and send me a pull request. Remember to update AUTHORS accordingly.
