⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
Updated May 30, 2024 - Python
re_data - fix data issues before your users and CEO discover them 😊
Great Expectations Airflow operator
Various files useful for manual testing and test automation etc.
Code review for data in dbt
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Test data management tool for any data source, batch or real-time
Documentation for Data Caterer
This library is inspired by Great Expectations. It makes the various expectations found in Great Expectations available through Python's built-in unittest assertions.
Example API implementation for Data Caterer
Dynamic data testing engine based on PySpark
Software Testing in Open Source and Data Science: A talk delivered at the Data Umbrella speaker series
Simple DB Fixtures for Sails.js v1 (fake data for testing).
Data generation and validation tool for any data source
I'm learning how to use dbt with BigQuery so I can apply that knowledge wherever we end up working. It seems like a good data warehouse (DWH) interface tool to know for data transformation and testing, and it lets me solidify testing concepts in DataOps.
Develop a data science project using historical sales data to build a regression model that accurately predicts future sales. Preprocess the dataset, conduct exploratory analysis, select relevant features, and employ regression algorithms for model development. Evaluate model performance, optimize hyperparameters, and provide actionable insights.
DataOps TestGen is part of DataKitchen's Open Source Data Observability. It delivers simple, fast data quality test generation and execution through data profiling, new dataset screening and hygiene review, algorithmic generation of data quality validation tests, ongoing testing of new data refreshes, and continuous data anomaly monitoring.
Data and pipeline testing with and for SQL