lakeFS - Data version control for your data lake | Git for data
Updated May 29, 2024 - Go
OpenMetadata is a unified platform for discovery, observability, and governance powered by a central metadata repository, in-depth lineage, and seamless team collaboration.
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
The open-source tool for building high-quality datasets and computer vision models
Home of the Open Data Contract Standard (ODCS).
The Open Source Feature Store for Machine Learning
Always know what to expect from your data.
Source-available data quality tool
DataOps TestGen is part of DataKitchen's Open Source Data Observability. It delivers simple, fast data quality test generation and execution through data profiling, new dataset screening and hygiene review, algorithmic generation of data quality validation tests, ongoing testing of new data refreshes, and continuous data anomaly monitoring.
Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.
Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML files, and let DQOps run them daily to detect data quality issues.
An automated pipeline that preprocesses tabular data for anomaly detection methods.
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Data quality estimations for OpenStreetMap
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation
Tool for automatic determination of data quality (accuracy and precision) of wearable eye tracker recordings
Possibly the fastest DataFrame-agnostic quality check library in town.
FeatHub - A stream-batch unified feature store for real-time machine learning