
AirflowDataPipeline is a daily data collection project that extracts JSON data from a website's API and loads it into a SQL database using Airflow. This project offers an automated and reliable solution for managing data pipelines.


5.AirflowDataPipeline

AirflowDataPipeline is a data collection project that automates extracting data from a website and storing it in a SQLite database using Airflow. The pipeline runs on a daily schedule, so the latest data is always available in the database. On each run, data is collected from the website's API in JSON format, then transformed and loaded into the database, providing a scalable and reliable solution for data collection and management.
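The extract–transform–load flow described above can be sketched with standard-library tools. This is an illustrative outline only: the API URL, the `id`/`value` field names, and the table layout are assumptions, not the project's actual endpoint or schema.

```python
import json
import sqlite3
import urllib.request

API_URL = "https://example.com/api/items"  # placeholder; the real endpoint differs


def extract(url=API_URL):
    """Download the raw JSON payload from the website's API."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode())


def transform(records):
    """Keep only the fields we store; 'id' and 'value' are assumed names."""
    return [(r["id"], r["value"]) for r in records]


def load(rows, db_path="pipeline.db"):
    """Insert the transformed rows into a SQLite table; returns rows written."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, value TEXT)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO items (id, value) VALUES (?, ?)", rows
            )
        return conn.total_changes
    finally:
        conn.close()
```

In a DAG, each of these functions would typically be wrapped in its own `PythonOperator` task, with `extract >> transform >> load` ordering and a `@daily` schedule.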

Installation

1. If you have Docker Desktop

  • download the repository
  • open a terminal in your favorite IDE or on the command line
  • in the terminal:
  • `cd docker_demo`
  • `docker build -t image_name .`
  • `docker run -p 8080:8080 image_name`
  • in your favorite browser, open http://localhost:8080


Log in with username `admin`; the password is in the file `standalone_admin_password.txt`, which you can find in the container's file system via Docker Desktop.

  • Run dag_1 > Trigger DAG
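Besides the Trigger DAG button in the UI, a DAG can be triggered over Airflow 2's stable REST API, provided the basic-auth API backend is enabled (it is in `airflow standalone` setups by default in recent versions). A minimal sketch that builds the trigger request, with hypothetical credentials:

```python
import base64
import json
import urllib.request


def build_trigger_request(dag_id, username, password, host="http://localhost:8080"):
    """Build a POST request for Airflow's stable REST API dagRuns endpoint."""
    url = f"{host}/api/v1/dags/{dag_id}/dagRuns"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps({"conf": {}}).encode(),  # empty run configuration
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# With the server running, send it with:
#   urllib.request.urlopen(build_trigger_request("dag_1", "admin", "<password>"))
```

The password here is the same one from `standalone_admin_password.txt`.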

2. Without Docker (Linux)

Open a Linux command line:

  • `mkdir /workspace`
  • `apt update`
  • `apt upgrade`
  • `apt install python3-pip`
  • `apt install python3-venv`
  • `python3 -m venv /workspace/venv`
  • `source /workspace/venv/bin/activate`
  • `pip install pandas`
  • `pip install apache-airflow`
  • `export AIRFLOW_HOME=/workspace/airflow`
  • `airflow version`
  • add your DAGs in `/workspace/airflow/dags`
  • `airflow standalone`
  • in your favorite browser, open http://localhost:8080

Log in with username `admin`; the password is in `/workspace/airflow/standalone_admin_password.txt`, and is also printed in the `airflow standalone` command-line output.

  • Run dag_1 > Trigger DAG
