ExampleHome

Versatile Data Kit (VDK) is a data framework that enables Data Engineers to

🧑‍💻 develop,
▶️ run,
📊 and manage data workloads, aka data jobs

What Problem Does Versatile Data Kit Solve?

Ingest data from different sources.
Use Python/SQL and VDK templates to transform data.
Package, version, and deploy data applications while dealing with credentials, retries, and reconnects.
Provide built-in monitoring and smart notification capabilities.
Track code and data modifications for quicker troubleshooting and version rollback.
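
As a rough illustration of the first two points, a Python step in a data job can be sketched as below. This is a minimal sketch, not the project's reference example: the table names, the payload, and the SQL statement are hypothetical, and the step assumes VDK passes in a job_input object exposing send_object_for_ingestion and execute_query.

```python
from typing import Any


def run(job_input: Any) -> None:
    """Entry point of a Python step; VDK is assumed to call it with a job input object."""
    # Hypothetical record and table name, used only for illustration.
    payload = {"id": 1, "country": "BG"}

    # Ingest: hand one record to the configured ingestion target.
    job_input.send_object_for_ingestion(
        payload=payload,
        destination_table="visits",
    )

    # Transform: run SQL through the job's managed database connection.
    # The statement is a hypothetical aggregation over the table above.
    job_input.execute_query(
        "CREATE TABLE IF NOT EXISTS visits_by_country AS "
        "SELECT country, COUNT(*) AS visits FROM visits GROUP BY country"
    )
```

In a real job the SQL would more likely live in its own, later step file, so that it runs only after the ingested data has reached the destination.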

Quickstart

Getting started with VDK SDK

All getting-started examples work in Google Colab (link) or in any installation of VDK. If you want to run the examples locally, install quickstart-vdk:
pip install quickstart-vdk
This installs the core vdk packages and the vdk command line interface. You can use them to run jobs in your local shell environment. Then you can run
vdk dev-studio --start
to start a local notebook server and follow the instructions there.


Extract data with VDK Ingester
Process data with SQL with VDK Managed Connection
Create a star schema with VDK Templates
Extract data incrementally with VDK Ingester and Properties
Trace your SQL provenance by installing the VDK lineage plugin
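
To give a feel for what incremental extraction with properties involves, here is a minimal sketch under stated assumptions. The fetch_rows_since helper, the last_id property name, and the example_table destination are all hypothetical. The step assumes a job_input object exposing get_all_properties, set_all_properties, and send_object_for_ingestion.

```python
from typing import Any, Dict, List


def fetch_rows_since(last_id: int) -> List[Dict[str, Any]]:
    """Hypothetical source; stands in for a real database or API query."""
    rows = [{"id": i, "value": f"row-{i}"} for i in range(1, 6)]
    return [row for row in rows if row["id"] > last_id]


def run(job_input: Any) -> None:
    # Read the high-water mark saved by a previous run (0 on the first run).
    props = dict(job_input.get_all_properties())
    last_id = int(props.get("last_id", 0))

    # Send only the rows that are newer than the stored mark.
    new_rows = fetch_rows_since(last_id)
    for row in new_rows:
        job_input.send_object_for_ingestion(
            payload=row,
            destination_table="example_table",  # hypothetical table name
        )

    # Persist the new mark so the next run starts where this one stopped.
    if new_rows:
        props["last_id"] = max(row["id"] for row in new_rows)
        job_input.set_all_properties(props)
```

Storing the mark only after the rows have been sent means a failed run is retried from the old position instead of silently skipping data.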

Getting started with VDK Control Service


Install VDK Server
Deploy Job
Roll back Job to the latest stable version
Schedule Job
Monitor Job with Operations UI

Installation

➡️ See Installation for more details.

Create and run data jobs locally

pip install quickstart-vdk

This installs the core vdk packages and the vdk command line interface. You can use them to run jobs in your local shell environment.

See also the Getting Started section of the wiki.

Main Concepts

➡️ See Interfaces for more details.

