Incorporating Six Sigma Methodology for Data Quality Control in Great Expectations #9674

Open
vlasvlasvlas opened this issue Mar 28, 2024 · 0 comments

vlasvlasvlas commented Mar 28, 2024

Is your feature request related to a problem? Please describe.
Currently, there is no explicit support or mention of using Six Sigma methodology within Great Expectations for quality assurance purposes. This makes it challenging for users who wish to apply Six Sigma principles to their data quality control processes.

Describe the solution you'd like
I would like to see built-in support or documentation in Great Expectations for implementing Six Sigma methodology to assess and monitor data quality. This could include guidance on defining expectations, calculating defect rates, and interpreting results in terms of Six Sigma levels.

Describe alternatives you've considered
One alternative would be to manually implement Six Sigma calculations outside of Great Expectations, but this would be less integrated and less automated.

Additional context
By incorporating Six Sigma support into Great Expectations, users would have a comprehensive toolset for managing data quality, aligned with industry-standard quality control practices. This would enhance the utility and versatility of Great Expectations for a wider range of users and use cases.

Example
For instance, let's say we have a dataset representing customer orders in an e-commerce platform. We define expectations within Great Expectations to ensure that order timestamps are within a reasonable range, order amounts are non-negative, and customer addresses are valid. After running these expectations, we calculate a Six Sigma value based on the defect rates found in the data.

Suppose the resulting Six Sigma value (sigma level) is 3.5. Under the conventional 1.5-sigma shift, this corresponds to approximately 22,750 defects per million opportunities (DPMO), a defect rate of about 2.3%. For comparison, a sigma level of 5 corresponds to about 233 DPMO and a level of 6 to about 3.4 DPMO. A value of 3.5 therefore indicates acceptable data quality with clear room for improvement. As we refine our data pipelines, we aim to see the sigma level increase, indicating fewer defects. Monitoring this value regularly lets us track the effectiveness of our data quality improvement efforts and confirm that our data processes meet the desired quality standards.
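A minimal sketch of the calculation described in this example, using only the Python standard library. It does not call any Great Expectations API: the `failed_checks` tallies, the order count, and the helper names `dpmo` and `sigma_level` are hypothetical stand-ins for counts that would be collected from validation results. It assumes the conventional 1.5-sigma shift and treats each check on each order as one defect opportunity.

```python
from statistics import NormalDist


def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities."""
    total_opportunities = units * opportunities_per_unit
    if total_opportunities <= 0:
        raise ValueError("there must be at least one defect opportunity")
    return defects / total_opportunities * 1_000_000


def sigma_level(dpmo_value: float, shift: float = 1.5) -> float:
    """Convert DPMO to a sigma level, assuming the conventional 1.5-sigma shift."""
    if not 0 < dpmo_value < 1_000_000:
        raise ValueError("DPMO must be strictly between 0 and 1,000,000")
    yield_fraction = 1 - dpmo_value / 1_000_000
    # Inverse standard normal CDF of the yield, plus the long-term shift.
    return NormalDist().inv_cdf(yield_fraction) + shift


# Hypothetical tallies: number of orders failing each check (illustrative data only).
failed_checks = {
    "timestamp_in_range": 540,
    "amount_non_negative": 75,
    "address_valid": 750,
}
orders = 20_000

defects = sum(failed_checks.values())
rate = dpmo(defects, orders, opportunities_per_unit=len(failed_checks))
level = sigma_level(rate)

print(f"DPMO: {rate:,.0f}, sigma level: {level:.2f}")
# prints "DPMO: 22,750, sigma level: 3.50"
```

With the same function, `sigma_level(233)` comes out at roughly 5.0, which is where the 233 DPMO figure sits on the standard conversion table.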

Related links:
https://docs.oracle.com/cd/B31080_01/doc/owb.102/b28223/concept_data_quality.htm

@vlasvlasvlas vlasvlasvlas changed the title six sigma tool for data QA Incorporating Six Sigma Methodology for Data Quality Control in Great Expectations Mar 28, 2024