
beam-sandbox

How To Start

  1. Enable the required GCP APIs (Dataflow API, Cloud Storage JSON API, Cloud Logging API, BigQuery API, Cloud Storage API, Datastore API) from the GCP console UI (a gcloud alternative is sketched after this list)
  2. Establish an environment for Beam
    • create a conda environment with conda create -n beam-sandbox,
    • activate it with conda activate beam-sandbox, and
    • install the GCP and test extras with pip install 'apache-beam[gcp,test]'
  3. Test the environment with
    • python -m apache_beam.examples.wordcount --output beam/text,
    • then cat beam/t* to see the words and their counts.
  4. Create a bucket for Dataflow on GCP Storage, right after creating a GCP project (see the gsutil sketch after this list)
    • Edit the ./run-count-dataflow.sh file and replace ${PROJECT_ID} with your own project ID
    • Create a bucket named beam-pipelines-123
    • Under this bucket, create a folder for every Beam file, such as line-count
      • then create staging and temp folders, i.e. line-count/staging and
      • line-count/temp
  5. Create a dataset bucket gs://spark-dataset-1 on GCP Storage and upload the dataset folder into it. A publicly readable bucket (bucket-level access) is easiest (see the sketch after this list).
  6. export GOOGLE_APPLICATION_CREDENTIALS=PATH_OF_SERVICE_ACCOUNT.json
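
For step 1, the same APIs can also be enabled from the command line. This is a minimal sketch assuming the gcloud CLI is installed and ${PROJECT_ID} is your project; the "json api" from step 1 is taken to mean the Cloud Storage JSON API (storage-api.googleapis.com).

```bash
# Assumes the gcloud CLI is installed and you are authenticated
gcloud config set project ${PROJECT_ID}

# Enable the APIs listed in step 1
gcloud services enable \
    dataflow.googleapis.com \
    storage-api.googleapis.com \
    logging.googleapis.com \
    bigquery.googleapis.com \
    storage.googleapis.com \
    datastore.googleapis.com
```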
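
For step 4, a gsutil sketch of the Dataflow bucket layout; the location us-central1 is an assumption, and the bucket/folder names follow the example names above.

```bash
# Bucket for Dataflow pipelines (name from step 4; location is an assumption)
gsutil mb -l us-central1 gs://beam-pipelines-123

# Cloud Storage "folders" are just object-name prefixes; Dataflow creates
# them when it writes to these locations, so no explicit mkdir is required:
#   gs://beam-pipelines-123/line-count/staging
#   gs://beam-pipelines-123/line-count/temp
```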
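
For step 5, a sketch of creating the dataset bucket, uploading the local dataset folder, and optionally making the bucket publicly readable at the bucket level:

```bash
# Create the dataset bucket and upload the local dataset folder
gsutil mb gs://spark-dataset-1
gsutil -m cp -r ./dataset gs://spark-dataset-1/

# Optional: bucket-level public read access, as suggested in step 5
gsutil iam ch allUsers:objectViewer gs://spark-dataset-1
```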

How To Run

  1. To run,
    • python line-count.py on your local machine (uses DirectRunner), or
    • ./line-count-dataflow.sh on your local machine or on a GCP shell/instance (uses DataflowRunner); a sketch of this script follows below
  2. Open the Dataflow UI in the GCP console and watch the Dataflow jobs running.
  3. Check the job logs in Cloud Logging.
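
A minimal sketch of what line-count-dataflow.sh might contain; the project ID, region, and bucket paths are placeholders taken from the steps above, and any pipeline-specific flags (for example an input path) would be added at the end.

```bash
#!/usr/bin/env bash
# Sketch only: PROJECT_ID, REGION and the bucket layout are assumptions
PROJECT_ID=my-gcp-project
REGION=us-central1
BUCKET=gs://beam-pipelines-123/line-count

# Standard Beam/Dataflow pipeline options; append any pipeline-specific
# flags (e.g. an input file) after these
python line-count.py \
    --runner DataflowRunner \
    --project ${PROJECT_ID} \
    --region ${REGION} \
    --staging_location ${BUCKET}/staging \
    --temp_location ${BUCKET}/temp
```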