This code demonstrates the architecture featured on the AWS Big Data blog (https://aws.amazon.com/blogs/big-data/ ) which creates a concurrent data pipeline by using Amazon EMR and Apache Livy. This pipeline is orchestrated by Apache Airflow.

aws-samples/aws-concurrent-data-orchestration-pipeline-emr-livy

AWS Concurrent Data Orchestration Pipeline EMR Livy

This code demonstrates the architecture featured on the AWS Big Data blog (https://aws.amazon.com/blogs/big-data/), which builds a concurrent data pipeline using Amazon EMR and Apache Livy. The pipeline is orchestrated by Apache Airflow.

Description of the project folders

cloudformation

This folder contains the AWS CloudFormation template that provisions the Airflow infrastructure.

dags/airflowlib

This folder contains reusable code for Amazon EMR and Apache Livy.
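As a rough illustration of what talking to Apache Livy involves, the sketch below builds (but does not send) the two REST requests Livy uses to open an interactive session and submit a code statement. It is not the code in this folder: the function names and the host name are hypothetical, and only the `/sessions` and `/sessions/{id}/statements` endpoints and the default Livy port 8998 come from Livy itself.

```python
import json
from urllib.request import Request


def build_session_request(livy_url, kind="spark"):
    """Build the POST /sessions request that asks Livy to start a session.

    `kind="spark"` requests a Scala Spark session. The request is only
    constructed here; nothing is sent over the network.
    """
    body = json.dumps({"kind": kind}).encode("utf-8")
    return Request(
        url=f"{livy_url.rstrip('/')}/sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_statement_request(livy_url, session_id, code):
    """Build the POST /sessions/{id}/statements request that runs `code`."""
    body = json.dumps({"code": code}).encode("utf-8")
    return Request(
        url=f"{livy_url.rstrip('/')}/sessions/{session_id}/statements",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical EMR master address; 8998 is Livy's default port.
LIVY_URL = "http://emr-master.example.com:8998"

session_req = build_session_request(LIVY_URL)
statement_req = build_statement_request(LIVY_URL, 0, "spark.version")
```

Passing either request to `urllib.request.urlopen` would send it to a running Livy server. Because each Airflow task can submit its own statement or session this way, several Spark jobs can share one EMR cluster at the same time.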

dags/transform

This folder contains sample Scala code that converts the MovieLens data files from CSV to Parquet.

dags/movielens_dag.py

This script contains the DAG definition, which defines the Airflow pipeline.
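To make the idea of a concurrent pipeline concrete, here is a minimal sketch of the dependency shape such a DAG might have, written with Python's standard `graphlib` module rather than Airflow. The task names are assumptions chosen for illustration and are not taken from `movielens_dag.py`. In Airflow the same edges would typically be declared between operators with the `>>` operator.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key depends on the tasks in its set.
dependencies = {
    "wait_for_cluster": {"create_cluster"},
    "transform_movies": {"wait_for_cluster"},
    "transform_ratings": {"wait_for_cluster"},
    "transform_tags": {"wait_for_cluster"},
    "terminate_cluster": {
        "transform_movies",
        "transform_ratings",
        "transform_tags",
    },
}


def execution_waves(deps):
    """Group tasks into waves; tasks in the same wave can run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # Every task whose dependencies have all finished is ready now.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(dependencies)
for number, tasks in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(tasks)}")
```

The three transform tasks land in the same wave, which is where the concurrency comes from: once the cluster is ready, the scheduler is free to run them in parallel, and the cluster is terminated only after all of them finish.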

License

This library is licensed under the Apache 2.0 License.
