Generating fake reviews

This is part of a project I did for a Machine Intelligence course at the Zurich University of Applied Sciences. Since the focus was on big data, Apache Spark is used at several stages.

What does it do

The idea is to generate fake reviews from Yelp's review dataset and then try to detect them with more traditional ML methods. The work falls into three steps:

Data wrangling with PySpark and tokenisation with Stanford CoreNLP.
Training a Seq2Seq model with a fork of OpenNMT-py. This part is heavily inspired by "Stay On-Topic: Generating Context-specific Fake Restaurant Reviews" by Juuti et al.
Training various classifiers with Spark ML that try to distinguish between fake and real reviews.
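
To make the first step more concrete, here is a minimal sketch of how a context line (rating, city, categories) like the ones in the sample outputs could be assembled. It is plain Python rather than PySpark, and the regex tokeniser is only a stand-in for CoreNLP. The function name `build_context` and its parameters are assumptions for illustration, not code from the notebooks.

```python
import re


def build_context(stars, city, categories):
    """Return context tokens for one review: rating, then city, then categories.

    Hypothetical helper: the parameter names are illustrative assumptions.
    """
    # Lowercase the text and keep commas and ampersands as separate tokens.
    # This simple regex is a stand-in for the CoreNLP tokeniser.
    text = f"{city} {categories}".lower()
    tokens = re.findall(r"[^\s,&]+|[,&]", text)
    # Format the rating with one decimal so that 4 becomes '4.0'.
    return [f"{float(stars):.1f}"] + tokens


print(build_context(4, "Las Vegas", "Restaurants, Vietnamese"))
```

The Seq2Seq model would then be trained on pairs of such a context (source) and the tokenised review text (target).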

Some sample outputs

The Seq2Seq model was able to generate rather convincing-looking fake reviews:

SENT 1: ['4.0', 'las', 'vegas', 'restaurants', ',', 'vietnamese']
PRED 1: i love this place ! the food is always fresh and delicious . it 's a little pricey , but worth every penny .
PRED SCORE: -33.2517

SENT 2: ['1.0', 'calgary', 'asian', 'fusion', ',', 'chicken', 'wings', ',', 'food', ',', 'cafes', ',', 'chinese', ',', 'desserts', ',', 'juice', 'bars', '&', 'smoothies', ',', 'restaurants']
PRED 2: this is the worst bubble tea place i 've ever been to . it 's so expensive and they do n't have a lot of options .
PRED SCORE: -36.3755

SENT 3: ['3.0', 'scottsdale', 'japanese', ',', 'nightlife', ',', 'restaurants', ',', 'sushi', 'bars']
PRED 3: this is a good place to go if you 're in the mood for japanese food . it 's not bad , but nothing special .
PRED SCORE: -36.7155
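
If you want to feed generated reviews like these into the detection step, the SENT / PRED / PRED SCORE blocks first have to be turned into structured records. The following is a rough sketch of such a parser. The function name `parse_predictions` and the record keys are assumptions, and it only handles the layout shown above.

```python
import ast
import re


def parse_predictions(log_text):
    """Parse SENT / PRED / PRED SCORE blocks into a list of dicts.

    Hypothetical helper: it assumes exactly the layout shown in the samples.
    """
    records = []
    current = {}
    for line in log_text.splitlines():
        line = line.strip()
        if m := re.match(r"SENT (\d+): (.*)", line):
            # The context is printed as a Python list literal of tokens.
            current = {"id": int(m.group(1)),
                       "context": ast.literal_eval(m.group(2))}
        elif m := re.match(r"PRED (\d+): (.*)", line):
            current["review"] = m.group(2)
        elif m := re.match(r"PRED SCORE: (-?\d+(?:\.\d+)?)", line):
            # The score line closes one block, so store the record here.
            current["score"] = float(m.group(1))
            records.append(current)
            current = {}
    return records


sample = """SENT 1: ['4.0', 'las', 'vegas', 'restaurants', ',', 'vietnamese']
PRED 1: i love this place ! the food is always fresh and delicious . it 's a little pricey , but worth every penny .
PRED SCORE: -33.2517"""

for record in parse_predictions(sample):
    print(record["id"], record["score"], record["context"][:3])
```

The scores look like log-probabilities (an assumption on my part), in which case values closer to zero would mark the outputs the model considers more likely.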

