Self-hosted, version-controlled data engineering platform for AWS

sam-goodwin/packyak

Note

Still in active development.

PackYak

PackYak is an open-source platform for building modern data engineering projects with Python in your own AWS account. It provides everything you need to iterate quickly on massive-scale data science, build reliable and flexible production pipelines, version-control your data sets, and expose interactive data applications and reports.

PackYak uses the AWS CDK to automatically deploy the following software into your own AWS account:

  • Kubernetes - fully automated AWS EKS setup of a Kubernetes cluster with pre-installed services
  • Coder - run developer environments in the cloud, automated with Terraform
  • Ray - run heterogeneous clusters consisting of CPU-, GPU- and memory-optimized EC2 instances
  • Dask, Daft, Spark, etc. - run any compute framework supported by Ray
  • Dagster - orchestrate the production of interconnected "Software-defined Assets"
  • Nessie & Iceberg - version your data catalog with Git-like branching, tags and commits
  • Streamlit - build interactive reports over your data with simple Python scripts
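
To make "Git-like branching, tags and commits" for a data catalog more concrete, here is a minimal toy sketch in plain Python. It is not the Nessie or Iceberg API and not PackYak code: the `ToyCatalog` class, its methods, and the `s3://example-bucket/...` locations are all hypothetical, invented only to illustrate the idea that a branch is a movable pointer to an immutable catalog snapshot.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    """An immutable catalog snapshot: table name -> data location."""
    tables: Dict[str, str]
    message: str
    parent: Optional["Commit"] = None


class ToyCatalog:
    """Hypothetical in-memory model; NOT the Nessie or Iceberg API."""

    def __init__(self) -> None:
        root = Commit(tables={}, message="initial empty catalog")
        self.branches: Dict[str, Commit] = {"main": root}
        self.tags: Dict[str, Commit] = {}

    def commit(self, branch: str, table: str, location: str, message: str) -> Commit:
        # A commit copies the previous snapshot and moves the branch pointer.
        head = self.branches[branch]
        new_head = Commit({**head.tables, table: location}, message, parent=head)
        self.branches[branch] = new_head
        return new_head

    def create_branch(self, name: str, source: str = "main") -> None:
        # A new branch starts at the same commit as its source branch.
        self.branches[name] = self.branches[source]

    def tag(self, name: str, branch: str = "main") -> None:
        # A tag is a fixed pointer; it does not move with later commits.
        self.tags[name] = self.branches[branch]

    def merge(self, source: str, target: str = "main") -> None:
        # Fast-forward only: the target head must be an ancestor of the source head.
        src, tgt = self.branches[source], self.branches[target]
        node: Optional[Commit] = src
        while node is not None and node is not tgt:
            node = node.parent
        if node is None:
            raise ValueError("target has diverged; this toy model only fast-forwards")
        self.branches[target] = src


catalog = ToyCatalog()
catalog.commit("main", "orders", "s3://example-bucket/orders/v1", "add orders")
catalog.tag("release-1")
catalog.create_branch("etl-experiment")
catalog.commit("etl-experiment", "orders", "s3://example-bucket/orders/v2", "rewrite orders")

main_before = catalog.branches["main"].tables["orders"]  # main is isolated from the branch
catalog.merge("etl-experiment")
main_after = catalog.branches["main"].tables["orders"]   # main now sees the branch's change
tagged = catalog.tags["release-1"].tables["orders"]      # the tag still points at the old snapshot
```

In this sketch, work on `etl-experiment` does not affect `main` until the merge, and the `release-1` tag keeps pointing at the earlier snapshot, which is the behavior the Git analogy is meant to convey.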

Get Started

Note

Coming Soon.
