
Udacity - Data Engineer Nanodegree

In this course, students learn how to design data models, build data warehouses and data lakes, automate data pipelines, and work with massive datasets. At the end of the program, students must combine these new skills by completing a capstone project.

Skills Developed:

  • Dimensional Modeling of databases
  • SQL and NoSQL data modeling
  • ETL Techniques and strategies
  • Data Flows
  • Python and SQL Programming
  • Creation and Automation of Data Pipelines

Technologies used in this nanodegree:

  • PostgreSQL
  • Apache Cassandra
  • Amazon Web Services (IAM, EC2, Redshift, S3, ElasticMapReduce, Athena...)
  • Apache Spark using PySpark
  • Airflow

In the sections below I briefly describe each technology and project I developed during the course.

Section 1 - Data modeling using PostgreSQL and Apache Cassandra

In this section of the Data Engineering Nanodegree, students practice the following concepts from the lessons:

  • Data modeling
  • Database Schemas (snowflake/star)
  • Creation of ETL pipelines
  • Database CRUD

This section has two hands-on projects that exercise database modeling, SQL, and Python programming.
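Both projects follow the extract-transform-load (ETL) pattern. As a minimal sketch of that pattern, assuming a hypothetical JSON log line similar in spirit to the app logs used in the projects (field names here are illustrative):

```python
import json

# Hypothetical raw log line from a music streaming app
raw_log = '{"user_id": "7", "song": "Imagine", "duration": "183.2"}'

def extract(line):
    """Extract: parse one raw JSON log line into a dict."""
    return json.loads(line)

def transform(record):
    """Transform: cast string fields to proper types."""
    return {
        "user_id": int(record["user_id"]),
        "song": record["song"],
        "duration_sec": float(record["duration"]),
    }

def load(record, table):
    """Load: append the cleaned record to an in-memory 'table'."""
    table.append(record)

songplays = []
load(transform(extract(raw_log)), songplays)
```

In the actual projects the load step writes to PostgreSQL or Cassandra rather than an in-memory list, but the three-stage structure is the same.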

The first project involves designing a PostgreSQL database to help a fictional startup called Sparkify analyze data from its product, a music streaming app. For more information about the project, click here.
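A relational design like this typically uses a star schema: a central fact table of events joined to dimension tables. The sketch below illustrates the idea with SQLite (for a self-contained example) and illustrative table names, not the project's exact schema:

```python
import sqlite3

# Star schema sketch: one fact table (songplays) referencing dimensions.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE songs (song_id INTEGER PRIMARY KEY, title TEXT)")
cur.execute("""
    CREATE TABLE songplays (
        songplay_id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(user_id),
        song_id INTEGER REFERENCES songs(song_id),
        ts INTEGER
    )
""")

cur.execute("INSERT INTO users VALUES (1, 'Alice')")
cur.execute("INSERT INTO songs VALUES (10, 'Imagine')")
cur.execute("INSERT INTO songplays VALUES (100, 1, 10, 1541106106)")

# Analytical query: join the fact table to its dimensions.
cur.execute("""
    SELECT u.name, s.title
    FROM songplays sp
    JOIN users u ON sp.user_id = u.user_id
    JOIN songs s ON sp.song_id = s.song_id
""")
row = cur.fetchone()
```

The payoff of the star layout is that analytical questions ("who played what?") become simple joins from one fact table out to its dimensions.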

The second project takes a different approach: students model the app's database using Apache Cassandra, a NoSQL database.
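Cassandra modeling is query-first: you design one denormalized table per query, choosing a partition key so each query reads a single partition. A plain-Python sketch of that idea, where a dict key stands in for the partition key (table and field names here are illustrative, not the project's exact schema):

```python
# Raw events, one per song play in a session
events = [
    {"session_id": 338, "item": 4, "artist": "Faithless", "song": "Music Matters"},
    {"session_id": 338, "item": 5, "artist": "Kaiser Chiefs", "song": "Ruby"},
]

# "Table" built specifically for the query:
# look up song info by (session_id, item_in_session).
song_by_session = {}
for e in events:
    song_by_session[(e["session_id"], e["item"])] = (e["artist"], e["song"])

result = song_by_session[(338, 4)]
```

Unlike the relational design, there are no joins: if a second query pattern is needed, the same data is duplicated into a second table keyed for that query.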

Section 2 - Cloud Data Warehouses with AWS

Description

Section 3 - Data Lakes with Spark

Description

Section 4 - Data Pipelines with Airflow

Description

Section 5 - Capstone Project