HardNorth/yelp-beam-processing

yelp-beam-processing

Big Data processing and analysis of the Yelp Dataset with Apache Beam and Google Cloud integration.

Usage:

Export to BigQuery:

gradle execute \
    -Dexec.mainClass=net.hardnorth.yelp.ingest.bigquery.IngestBusiness \
    -Dexec.args="--project=<PROJECT_ID> --runner=org.apache.beam.runners.dataflow.DataflowRunner --stagingLocation=gs://yelp-dataset/staging/ --tempLocation=gs://yelp-dataset/temp/ --datasetId=yelp --tableName=business --dataSourceReference=gs://yelp-dataset/business.json --syncExecution=false"

Export to CSV:

gradle execute \
    -Dexec.mainClass=net.hardnorth.yelp.ingest.csv.IngestBusiness \
    -Dexec.args="--project=<PROJECT_ID> --runner=org.apache.beam.runners.dataflow.DataflowRunner --stagingLocation=gs://yelp-dataset/staging/ --tempLocation=gs://yelp-dataset/temp/ --dataSourceReference=gs://yelp-dataset/business.json --dataOutputReference=gs://yelp-dataset/business-processed.csv --syncExecution=false"

# Compose the exported CSV shards into a single file.
# Unfortunately, the header row of every shard is left in the resulting file.
gsutil compose \
    gs://yelp-dataset/business-processed.csv-* \
    gs://yelp-dataset/business-processed.csv
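
Because the composed file keeps the header row of every shard, one possible cleanup is to filter out the repeats after composing. The following is a minimal sketch, not part of this repository: the function name `dedupe_csv_header` is hypothetical, and it assumes that every shard begins with the same header line and that no data row is identical to that header.

```shell
# Hypothetical helper (not part of this repository).
# Reads CSV text on stdin, prints the first line (the header) once,
# and drops every later line that is identical to it.
# Assumption: all shards share the same header and no data row equals it.
dedupe_csv_header() {
    awk 'NR == 1 { header = $0; print; next } $0 != header { print }'
}
```

It could be used, for example, as `gsutil cat gs://yelp-dataset/business-processed.csv | dedupe_csv_header > business-processed.csv`, which streams the composed object through the filter into a local file. The filter works line by line, so it does not parse quoted CSV fields.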
