Utilizes Airflow's built-in functionality to create a reusable ETL pipeline. Source data resides in an S3 bucket, the pipeline includes data quality checks, and the data is processed within AWS Redshift.
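A minimal Airflow 2.x sketch of such a pipeline, assuming a hypothetical `raw_events` table, an illustrative bucket/key layout, and the default `redshift_default`/`aws_default` connection IDs; the quality check simply fails the run if the load produced no rows:

```python
# Hypothetical sketch: table, bucket, and connection IDs are assumptions,
# not taken from the repo.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook

def check_row_count(**_):
    """Fail the DAG run if the target table loaded no rows."""
    hook = PostgresHook(postgres_conn_id="redshift_default")
    rows = hook.get_first("SELECT COUNT(*) FROM public.raw_events")
    if not rows or rows[0] == 0:
        raise ValueError("Data quality check failed: public.raw_events is empty")

with DAG(
    dag_id="s3_to_redshift_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    copy_to_redshift = S3ToRedshiftOperator(
        task_id="copy_to_redshift",
        schema="public",
        table="raw_events",
        s3_bucket="my-source-bucket",          # placeholder bucket
        s3_key="events/{{ ds }}/",             # placeholder key layout
        copy_options=["FORMAT AS JSON 'auto'"],
        redshift_conn_id="redshift_default",
        aws_conn_id="aws_default",
    )
    quality_check = PythonOperator(
        task_id="quality_check",
        python_callable=check_row_count,
    )
    copy_to_redshift >> quality_check
```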
An example repo that aggregates Apache Airflow DAGs from multiple Apache Maven repositories into a single Git branch, which can then be used with git-sync in the Airflow Helm Chart (User Community).
Uses Apache Airflow and a weather API to clean data and automatically save the results into separate folders. Specifically, the weather data of Los Angeles (LA) is used.
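A hedged sketch of what such a DAG could look like, assuming the OpenWeatherMap current-weather endpoint and an illustrative `raw/<date>/` and `clean/<date>/` folder layout (neither is confirmed by the repo):

```python
# Illustrative sketch; endpoint choice and folder layout are assumptions.
import json
import os
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

LA_COORDS = {"lat": 34.05, "lon": -118.24}

def extract_weather(ds, **_):
    # Fetch current LA weather and save the raw JSON under raw/<date>/.
    resp = requests.get(
        "https://api.openweathermap.org/data/2.5/weather",
        params={**LA_COORDS, "appid": os.environ["OWM_API_KEY"], "units": "metric"},
        timeout=10,
    )
    resp.raise_for_status()
    os.makedirs(f"raw/{ds}", exist_ok=True)
    with open(f"raw/{ds}/la_weather.json", "w") as f:
        json.dump(resp.json(), f)

def clean_weather(ds, **_):
    # Keep only the fields of interest and save under clean/<date>/.
    with open(f"raw/{ds}/la_weather.json") as f:
        data = json.load(f)
    cleaned = {
        "date": ds,
        "temp_c": data["main"]["temp"],
        "humidity": data["main"]["humidity"],
        "conditions": data["weather"][0]["main"],
    }
    os.makedirs(f"clean/{ds}", exist_ok=True)
    with open(f"clean/{ds}/la_weather.json", "w") as f:
        json.dump(cleaned, f)

with DAG("la_weather_cleaning", start_date=datetime(2024, 1, 1),
         schedule="@daily", catchup=False) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_weather)
    clean = PythonOperator(task_id="clean", python_callable=clean_weather)
    extract >> clean
```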
This project focuses on utilizing Apache Airflow to orchestrate an ETL (Extract, Transform, Load) process using data from the Stack Overflow API. The primary objective is to determine the most prominent tags on Stack Overflow for the current month.
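For illustration, the tag aggregation could run against the Stack Exchange `/2.3/questions` endpoint (which is real); the month windowing, page count, and rate-limit handling below are simplified assumptions, not the repo's code:

```python
# Sketch of counting tags on questions asked since the start of the month.
import collections
import time
from datetime import datetime

import requests

def top_tags_this_month(pages=5):
    now = datetime.utcnow()
    fromdate = int(datetime(now.year, now.month, 1).timestamp())
    counts = collections.Counter()
    for page in range(1, pages + 1):
        resp = requests.get(
            "https://api.stackexchange.com/2.3/questions",
            params={
                "site": "stackoverflow",
                "fromdate": fromdate,
                "pagesize": 100,
                "page": page,
            },
            timeout=10,
        )
        resp.raise_for_status()
        payload = resp.json()
        for item in payload.get("items", []):
            counts.update(item.get("tags", []))
        if not payload.get("has_more"):
            break
        time.sleep(1)  # crude pause to stay under the API's rate limit
    return counts.most_common(10)
```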
AirflowDataPipeline is a daily data collection project that extracts JSON data from a website's API and loads it into a SQL database using Airflow. This project offers an automated and reliable solution for managing data pipelines.
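A minimal sketch of a daily collect-and-load task, with a placeholder API URL and SQLite standing in for whichever SQL database the project actually targets:

```python
# Placeholder URL and database path; structure only, not the repo's code.
import json
import sqlite3
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

def fetch_and_load(ds, **_):
    records = requests.get("https://example.com/api/data", timeout=10).json()
    conn = sqlite3.connect("/data/pipeline.db")
    conn.execute("CREATE TABLE IF NOT EXISTS daily_data (ds TEXT, payload TEXT)")
    conn.executemany(
        "INSERT INTO daily_data VALUES (?, ?)",
        [(ds, json.dumps(r)) for r in records],
    )
    conn.commit()
    conn.close()

with DAG("daily_json_to_sql", start_date=datetime(2024, 1, 1),
         schedule="@daily", catchup=False) as dag:
    PythonOperator(task_id="fetch_and_load", python_callable=fetch_and_load)
```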
ETL pipeline that extracts web-scraped forex data and Google News library data on a daily basis and stores them in a PostgreSQL database for market analysis and insights.
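A sketch of the forex leg of such a DAG, assuming a hypothetical FX-rates endpoint and an illustrative `fx_rates` table; `PostgresHook.insert_rows` comes from the Airflow Postgres provider:

```python
# Sketch only: the FX endpoint URL and table name are hypothetical.
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook

def load_forex(ds, **_):
    # Hypothetical endpoint returning {"rates": {"EUR": 0.92, ...}}.
    rates = requests.get(
        "https://fx.example.com/latest", params={"base": "USD"}, timeout=10
    ).json()["rates"]
    hook = PostgresHook(postgres_conn_id="postgres_default")
    hook.insert_rows(
        table="fx_rates",
        rows=[(ds, ccy, rate) for ccy, rate in rates.items()],
        target_fields=["ds", "currency", "rate"],
    )

with DAG("forex_news_etl", start_date=datetime(2024, 1, 1),
         schedule="@daily", catchup=False) as dag:
    PythonOperator(task_id="load_forex", python_callable=load_forex)
    # A second task would load Google News headlines into a news table
    # in the same way.
```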
This repository conducts weekly analyses of European Reddit data via a data pipeline orchestrated with Airflow.
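One plausible collection step, sketched with PRAW; the credentials, the choice of r/europe, and the handoff via the task's return value (XCom) are assumptions:

```python
# Hedged sketch; not the repo's code.
from datetime import datetime

import praw
from airflow import DAG
from airflow.operators.python import PythonOperator

def collect_top_posts(**_):
    reddit = praw.Reddit(
        client_id="...",            # supplied via Airflow connections/env in practice
        client_secret="...",
        user_agent="weekly-europe-analysis",
    )
    posts = [
        {"title": s.title, "score": s.score, "created_utc": s.created_utc}
        for s in reddit.subreddit("europe").top(time_filter="week", limit=100)
    ]
    return posts  # pushed to XCom for the downstream analysis task

with DAG("weekly_reddit_analysis", start_date=datetime(2024, 1, 1),
         schedule="@weekly", catchup=False) as dag:
    PythonOperator(task_id="collect_top_posts", python_callable=collect_top_posts)
```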
An end-to-end pipeline that ingests raw data from CSV files through Airflow DAGs into BigQuery. From there, it uses dbt to normalize and clean the data and then to apply the transformations that produce the relevant reports.
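A hedged sketch of that ingest-then-transform shape, with assumed dataset/table names, a local CSV path, and a dbt project location:

```python
# Names and paths are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from google.cloud import bigquery

def load_csv_to_bq(**_):
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    )
    with open("/data/raw/input.csv", "rb") as f:
        client.load_table_from_file(
            f, "my_project.raw.input_data", job_config=job_config
        ).result()  # block until the load job finishes

with DAG("csv_to_bigquery_dbt", start_date=datetime(2024, 1, 1),
         schedule="@daily", catchup=False) as dag:
    load = PythonOperator(task_id="load_csv_to_bq", python_callable=load_csv_to_bq)
    transform = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt_project && dbt run",
    )
    load >> transform
```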
This project presents a robust data pipeline using Apache Airflow for orchestration, Apache Kafka for real-time data streaming, and MongoDB for data storage. It automates the process of web scraping to collect large companies' data, transforms and processes this data, and then stores it efficiently.
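A minimal producer/consumer sketch using kafka-python and pymongo; the topic name, company list, and scraping target are illustrative placeholders:

```python
# Structure only: scrape -> Kafka topic -> MongoDB collection.
import json

import requests
from kafka import KafkaConsumer, KafkaProducer
from pymongo import MongoClient

TOPIC = "company_data"

def produce():
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    for company in ["acme", "globex"]:  # placeholder company list
        html = requests.get(
            f"https://example.com/companies/{company}", timeout=10
        ).text
        producer.send(TOPIC, {"company": company, "raw_html_len": len(html)})
    producer.flush()

def consume():
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        auto_offset_reset="earliest",
    )
    collection = MongoClient("mongodb://localhost:27017")["scraper"]["companies"]
    for message in consumer:
        collection.insert_one(message.value)  # store each processed record
```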