data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
Updated May 30, 2024 - Python
lakeFS - Data version control for your data lake | Git for data
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Wren Engine is the backbone of the semantic layer - The semantic engine for LLMs, bringing business context to AI agents.
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
🦖 Efficiently evolve your old fixed-length data files into more modern file formats, fully parallelized!
A Fast, Declarative, and Extensible ETL Framework for Graph Databases.
A Git-like version control file system for data lineage & data collaboration.
Data API Framework for AI Agents and Data Apps
ETL pipeline using Pulumi, AWS services, and Snowflake for automated data flow.
Data Lake on the Edge
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Portfolio of projects and studies conducted in data engineering.
Wrapper for multiple LinkML storage engines (alpha software)
Data Engineering - Metropolitan Transportation Authority (MTA) Subway Data Analysis
BtrBlocks: Efficient Columnar Compression for Data Lakes (SIGMOD 2023 Paper)
📒(GitBook) A curated list of awesome Data Engineering resources