GitHub - moritzkoerber/covid-19-data-engineering-pipeline: A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.

This repo is my playground for trying out various data engineering tools. The services, tools, and design used are not always the best choice and are sometimes unnecessarily cumbersome; this just reflects me exploring different things. At the moment, the pipeline processes Covid-19 data. All infrastructure is templated in AWS CloudFormation or AWS CDK. All steps feature an alarm on failure. The stack can be deployed via GitHub Actions. I use Poetry to manage the dependencies and the virtual environment.
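To illustrate the "alarm on failure" idea, here is a minimal sketch of a CloudFormation alarm resource built as a plain Python dictionary. It is not taken from this repo: it assumes the step is a Lambda function, and the helper `failure_alarm`, the function name, and the SNS topic ARN are all hypothetical placeholders. The actual templates in `ops` may be structured quite differently.

```python
import json


def failure_alarm(step_name: str, function_name: str, topic_arn: str) -> dict:
    """Build a CloudFormation alarm resource for one pipeline step.

    Assumption: the step runs as a Lambda function, so the alarm watches
    the AWS/Lambda "Errors" metric for that function.
    """
    return {
        "Type": "AWS::CloudWatch::Alarm",
        "Properties": {
            "AlarmName": f"{step_name}-failure",
            "Namespace": "AWS/Lambda",
            "MetricName": "Errors",
            "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
            "Statistic": "Sum",
            "Period": 300,  # seconds per evaluation window
            "EvaluationPeriods": 1,
            "Threshold": 1,  # fire on the first error in a window
            "ComparisonOperator": "GreaterThanOrEqualToThreshold",
            "TreatMissingData": "notBreaching",  # no invocations is not a failure
            "AlarmActions": [topic_arn],  # e.g. an SNS topic that sends a notification
        },
    }


# Hypothetical names and ARN, for illustration only.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "IngestFailureAlarm": failure_alarm(
            "ingest",
            "hypothetical-ingest-function",
            "arn:aws:sns:eu-central-1:123456789012:hypothetical-alerts",
        )
    },
}

print(json.dumps(template, indent=2))
```

Generating one such resource per step keeps the alarm settings uniform; steps that run on Glue or Airflow would need a different metric or event source than the one assumed here.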

Folders and files (history: 125 commits):

.github/workflows
ops
src
.gitignore
.hadolint.yaml
.pre-commit-config.yaml
.yamllint
LICENSE
README.md
poetry.lock
pyproject.toml
