
Azure End-to-End Data Engineering Project using the AdventureWorks 2019 DB

On-premises DB to Azure Cloud pipeline with Data Factory, Data Lake Storage, Spark, Databricks, Synapse, and Power BI

📝 Table of Contents

  1. Project Overview
  2. Project Architecture
    2.1. Data Ingestion
    2.2. Data Transformation
    2.3. Data Loading
    2.4. Data Reporting
  3. Technologies Used
  4. Credits
  5. Contact

🔬 Project Overview

This project is an end-to-end data engineering pipeline built on Azure. Azure Data Factory ingests raw data from an on-premises SQL Server database into the bronze layer of Azure Data Lake Storage. Azure Databricks then transforms the data with Spark, writing refined data to the silver layer and cleansed, business-ready data to the gold layer. The gold layer is loaded into a Synapse serverless SQL database, whose views feed a Power BI report. Azure Active Directory (AAD) and Azure Key Vault provide authentication, secrets management, and governance across the pipeline.

📝 Project Architecture

You can find the detailed information on the diagram below:

1_project_str

📤 Data Ingestion

  • Connected the on-premises SQL Server to Azure Data Factory using a self-hosted integration runtime (Microsoft Integration Runtime).

2_IR

  • Set up the resource group with the needed services (Key Vault, Storage Account, Data Factory, Databricks, Synapse Analytics).

3_resource_group

  • Migrated the tables from the on-premises SQL Server to Azure Data Lake Storage Gen2.

4_containers

5_pipeline_1
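The ingestion pipeline can be sketched in Python as follows: a Lookup activity lists the source tables, and a ForEach loop with a Copy activity lands each one in the bronze layer. The schema name, container, and path layout below are illustrative assumptions, not values taken from the repository.

```python
# Sketch of the dynamic copy logic implemented in the Data Factory pipeline.
# A Lookup activity runs a query like LOOKUP_QUERY against the on-premises
# SQL Server; a ForEach/Copy activity then writes each table to ADLS Gen2.
# All concrete names here (SalesLT schema, bronze container) are assumptions.

LOOKUP_QUERY = """
SELECT s.name AS schema_name, t.name AS table_name
FROM sys.tables t
JOIN sys.schemas s ON t.schema_id = s.schema_id
WHERE s.name = 'SalesLT'
"""

def bronze_path(schema_name: str, table_name: str) -> str:
    """Build the sink path the Copy activity uses for one table."""
    return f"bronze/{schema_name}/{table_name}/{table_name}.parquet"
```

For example, the `SalesLT.Address` table would land at `bronze/SalesLT/Address/Address.parquet`, giving each table its own folder in the bronze container.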

⚙️ Data Transformation

  • Mounted the Azure Data Lake Storage containers to Databricks to retrieve the raw data.
  • Used a Spark cluster in Azure Databricks to clean and refine the raw data.
  • Saved the cleaned data in Delta format, optimized for further analysis.
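The two notebooks can be sketched as follows, assuming the storage is mounted under `/mnt/`. The mount paths, the `ModifiedDate` normalization, and the CamelCase-to-snake_case renaming are assumptions based on typical AdventureWorks transformations, not code copied from the notebooks.

```python
import re

def camel_to_snake(name: str) -> str:
    """Silver-to-gold step: rename AdventureWorks-style CamelCase columns
    (e.g. 'ModifiedDate') to snake_case ('modified_date')."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()

def bronze_to_silver(table: str) -> None:
    """Bronze-to-silver step: read a raw parquet table, normalize the
    timestamp column, and write the result in Delta format.
    Paths and column choices are illustrative assumptions."""
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, date_format

    spark = SparkSession.builder.getOrCreate()
    df = spark.read.parquet(f"/mnt/bronze/SalesLT/{table}/{table}.parquet")
    if "ModifiedDate" in df.columns:
        df = df.withColumn(
            "ModifiedDate", date_format(col("ModifiedDate"), "yyyy-MM-dd")
        )
    df.write.format("delta").mode("overwrite") \
        .save(f"/mnt/silver/SalesLT/{table}")
```

Writing in Delta format keeps a transaction log alongside the parquet files, which allows the gold layer (and later Synapse) to read consistent snapshots of each table.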

6_databricks_bronze_to_silver

7_databricks_silver_to_gold

📥 Data Loading

  • Used an Azure Synapse Analytics serverless SQL pool to load the refined data efficiently.
  • Created a serverless SQL database and connected it to the gold layer of the data lake.
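The Synapse pipeline creates one serverless SQL view per gold-layer table, typically via a stored procedure called in a ForEach loop. A minimal sketch of the generated statement is below; the storage account name and path layout are assumptions for illustration.

```python
# Sketch of the CREATE VIEW statement the Synapse stored procedure builds
# for each gold-layer table. The storage account and paths are assumed,
# not the repository's actual values.

def build_view_sql(table_name: str,
                   storage_account: str = "examplestorage") -> str:
    """Generate a serverless SQL view over one Delta table in the gold layer."""
    path = (f"https://{storage_account}.dfs.core.windows.net/"
            f"gold/SalesLT/{table_name}/")
    return (
        f"CREATE OR ALTER VIEW gold.{table_name} AS\n"
        f"SELECT * FROM OPENROWSET(\n"
        f"    BULK '{path}',\n"
        f"    FORMAT = 'DELTA'\n"
        f") AS [result]"
    )
```

Because the views read the Delta files directly with `OPENROWSET`, no data is copied into Synapse: the serverless pool queries the lake in place, and Power BI sees ordinary SQL views.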

8_synapse_pipeline

9_gold_db_views

📊 Data Reporting

  • Connected Microsoft Power BI to Azure Synapse and used the database views to create interactive, insightful visualizations.

10_powerbi_report

🛠️ Technologies Used

  • Data Source: SQL Server (AdventureWorks 2019)
  • Orchestration and Ingestion: Azure Data Factory
  • Storage: Azure Data Lake Storage Gen2
  • Transformation: Azure Databricks (Spark)
  • Data Warehouse: Azure Synapse Analytics
  • Authentication and Secrets Management: Azure Active Directory and Azure Key Vault
  • Data Visualization: Power BI

📋 Credits

📨 Contact Me

LinkedIn Website Gmail