Unexpected missing-label error with false header_case #1635

Open
amelie-rondot opened this issue Feb 5, 2024 · 0 comments · May be fixed by #1641


Overview

During the migration from v4 to v5 of frictionless-py in validata.fr, we encountered an unexpected missing-label error when validating tabular data with the header_case=False dialect option, where a column label in the data differs only in case from the corresponding field name in the schema (for example, lower case in the data and upper case in the schema).

For example:

data = [["aa", "BB"], ["a", "b"]]
schema = {
    "$schema": "https://frictionlessdata.io/schemas/table-schema.json",
    "fields": [
        {"name": "AA", "constraints": {"required": True}},
        {"name": "bb", "constraints": {"required": True}},
    ],
}

Using Python, the validation report is invalid and contains two missing-label errors:

import frictionless

if __name__ == "__main__":
    # Build the resource from the inline data and schema descriptor defined above
    report = frictionless.validate(frictionless.Resource(
        source=data,
        schema=frictionless.Schema.from_descriptor(schema),
        dialect=frictionless.Dialect(header_case=False),
        detector=frictionless.Detector(schema_sync=True)
    ))

    # Expect valid report
    print(report)

Output:

{'valid': False,
 'stats': {'tasks': 1, 'errors': 2, 'warnings': 0, 'seconds': 0.004},
 'warnings': [],
 'errors': [],
 'tasks': [{'name': 'memory',
            'type': 'table',
            'valid': False,
            'place': '<memory>',
            'labels': ['aa', 'BB'],
            'stats': {'errors': 2,
                      'warnings': 0,
                      'seconds': 0.004,
                      'fields': 4,
                      'rows': 1},
            'warnings': [],
            'errors': [{'type': 'missing-label',
                        'title': 'Missing Label',
                        'description': 'Based on the schema there should be a '
                                       "label that is missing in the data's "
                                       'header.',
                        'message': "There is a missing label in the header's "
                                   'field "AA" at position "3"',
                        'tags': ['#table', '#header', '#label'],
                        'note': '',
                        'labels': ['aa', 'BB'],
                        'rowNumbers': [1],
                        'label': '',
                        'fieldName': 'AA',
                        'fieldNumber': 3},
                       {'type': 'missing-label',
                        'title': 'Missing Label',
                        'description': 'Based on the schema there should be a '
                                       "label that is missing in the data's "
                                       'header.',
                        'message': "There is a missing label in the header's "
                                   'field "bb" at position "4"',
                        'tags': ['#table', '#header', '#label'],
                        'note': '',
                        'labels': ['aa', 'BB'],
                        'rowNumbers': [1],
                        'label': '',
                        'fieldName': 'bb',
                        'fieldNumber': 4}]}]}

Expected behaviour

According to the documentation of the HeaderCase Dialect parameter, I expected a valid report.
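
To make the expectation concrete, here is a minimal sketch of the label matching I would expect when header_case is False. It is independent of frictionless: the `missing_labels` helper is hypothetical and written only for illustration, and it is not how the library implements the check.

```python
def missing_labels(labels, field_names, header_case=True):
    """Return the schema field names that have no matching label in the header.

    Hypothetical helper for illustration only; not part of frictionless.
    """
    # With header_case=False, compare labels and field names case-insensitively.
    normalize = (lambda s: s) if header_case else str.lower
    seen = {normalize(label) for label in labels}
    return [name for name in field_names if normalize(name) not in seen]


labels = ["aa", "BB"]        # header row of the data above
field_names = ["AA", "bb"]   # field names of the schema above

print(missing_labels(labels, field_names, header_case=True))   # ['AA', 'bb']
print(missing_labels(labels, field_names, header_case=False))  # []
```

With case-sensitive matching both fields are reported as missing, which is what the report above shows. With case-insensitive matching no field is missing, which is the result I expected from header_case=False.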

Other details and experimentations

Used Frictionless version 5.16.1, last commit on main branch

I get the same result with command-line validation.
I enabled "schema-sync" to reproduce our use case more closely, but it does not seem to be related to the actual issue.
