Note
Still in active development.
PackYak is an open-source platform for building modern data engineering projects with Python in your own AWS account. It provides everything you need to iterate quickly on massive-scale data science, build reliable and flexible production pipelines, version control your data sets, and expose interactive data applications and reports.
PackYak uses the AWS CDK to automatically deploy the following software into your own AWS account:
- Kubernetes - fully automated AWS EKS setup of a Kubernetes cluster with pre-installed services
- Coder - run developer environments in the cloud with Terraform-automated environments
- Ray - run heterogeneous clusters consisting of CPU-, GPU- and memory-optimized EC2 instances
- Dask, Daft, Spark, etc. - run any compute framework supported by Ray
- Dagster - orchestrate the production of inter-connected "Software-defined Assets"
- Nessie & Iceberg - version your data catalog with Git-like branching, tags and commits
- Streamlit - build interactive reports over your data with simple Python scripts
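To make the idea of inter-connected assets in the list above more concrete, here is a minimal sketch that uses only the Python standard library. It is not Dagster's API and not PackYak code: the `ASSETS` table, the asset names, and the `materialize_all` helper are all hypothetical, chosen for illustration. It shows the underlying pattern, where each data asset declares the assets it depends on and everything is computed in dependency order.

```python
from graphlib import TopologicalSorter

# Hypothetical asset table (illustration only, not Dagster's API):
# asset name -> (names of upstream assets, function that computes the asset).
ASSETS = {
    "raw_orders": ((), lambda: [12.5, 7.0, 30.5]),
    "order_total": (("raw_orders",), lambda raw_orders: sum(raw_orders)),
    "report": (("order_total",), lambda order_total: f"Total revenue: {order_total:.2f}"),
}


def materialize_all(assets):
    """Compute every asset once, with upstream dependencies computed first."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(deps) for name, (deps, _) in assets.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        deps, compute = assets[name]
        # Pass the already-computed upstream values to the compute function.
        results[name] = compute(*(results[dep] for dep in deps))
    return results


results = materialize_all(ASSETS)
print(results["report"])
```

An orchestrator such as Dagster layers scheduling, retries, and lineage tracking on top of this kind of dependency graph, so the sketch should be read as a picture of the concept rather than of how any of the tools above are implemented.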
Note
Coming Soon.