[BUG] Expectation type expect_column_values_to_match_regex fails for databricks-sqlalchemy #9546

Open
satniks opened this issue Feb 28, 2024 · 6 comments
Labels
databricks-sql DatabricksSQL related

Comments

@satniks

satniks commented Feb 28, 2024

We are using the latest GX (v0.18.10) with Databricks through the Python packages sqlalchemy-databricks and databricks-sql-connector.

All our GX rules work fine with databricks/sqlalchemy except the rule for regular expressions. We get the following error in the validation result, indicating that regex rules are not supported for Databricks through SQLAlchemy.

"Regex is not supported for dialect {_dialect.dialect.name!s}
"\nAttributeError: 'DatabricksDialect' object has no attribute 'dialect'
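
One possible reading of the error above: the "Regex is not supported" message is itself built from `_dialect.dialect.name`, so when the object passed in has no nested `dialect` attribute, formatting the message raises the AttributeError. The sketch below only illustrates that reading; `StandInDatabricksDialect` and `describe_failure` are hypothetical names, not part of GX or the Databricks driver.

```python
class StandInDatabricksDialect:
    """Hypothetical stand-in: exposes .name but has no nested .dialect attribute."""

    name = "databricks"


def describe_failure(_dialect):
    """Try to build the 'not supported' message the same way the error text shows."""
    try:
        return f"Regex is not supported for dialect {_dialect.dialect.name!s}"
    except AttributeError as exc:
        # Formatting the message fails before it can be reported.
        return f"AttributeError: {exc}"


message = describe_failure(StandInDatabricksDialect())
print(message)
```

If this reading is right, the AttributeError is a secondary failure that hides the intended "not supported" message.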

@satniks
Author

satniks commented Feb 28, 2024

As the GX team is not accepting pull requests, I am providing the fix here. The code block is exactly the same as that of Snowflake. The check for Databricks uses dialect.name. There may be a better approach for this check that I am not aware of.

File Name: great_expectations/expectations/metrics/util.py
Function Name: get_dialect_regex_expression()
New code to be added (preferably after the Snowflake block):

try:
    # Databricks
    if dialect.name == 'databricks':
        if positive:
            return sqlalchemy.BinaryExpression(
                column, sqlalchemy.literal(regex), sqlalchemy.custom_op("REGEXP")
            )
        else:
            return sqlalchemy.BinaryExpression(
                column,
                sqlalchemy.literal(regex),
                sqlalchemy.custom_op("NOT REGEXP"),
            )
except (
    AttributeError,
    TypeError,
):  # TypeError can occur if the driver was not installed and so is None
    pass
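
The name-based check in the block above can be tried without a Databricks connection. The following is a minimal sketch under stated assumptions: `pick_regex_operator` is a hypothetical helper that mirrors only the branching (dialect name, positive or negative match, swallowed exceptions) and returns the operator string instead of a SQLAlchemy expression.

```python
from types import SimpleNamespace


def pick_regex_operator(dialect, positive=True):
    """Return the operator the proposed Databricks branch would use, or None.

    Hypothetical helper for illustration; the real function returns a
    SQLAlchemy expression rather than a string.
    """
    try:
        if dialect.name == "databricks":
            return "REGEXP" if positive else "NOT REGEXP"
    except (AttributeError, TypeError):
        # Same fall-through as the proposed block: ignore and try other dialects.
        pass
    return None


databricks = SimpleNamespace(name="databricks")  # stand-in for the dialect object
other = SimpleNamespace(name="sqlite")

print(pick_regex_operator(databricks, positive=True))
print(pick_regex_operator(databricks, positive=False))
print(pick_regex_operator(other))
print(pick_regex_operator(None))
```

Anything other than a dialect named "databricks", including `None`, falls through and returns `None`, which matches how the proposed block leaves other dialects to the remaining checks.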

Kilo59 added the databricks-sql (DatabricksSQL related) label Mar 5, 2024
@immerautumn

immerautumn commented Mar 19, 2024

One of my test developers also has this problem; we are using databricks extensively in our company, and hoped to use this to automate much of the testing. Although it's a small thing to do some polymorphism, overriding, or just good old fashioned works-on-my-machine changes, it would be best for us not to have to maintain additional code behind a specific version of the library.

@Kilo59 is there any possibility of revisiting an MR/PR for this? This has merit and certainly impacts test-driven organizations using Databricks as a primary hub for storage, access, and data lake aggregation. @satniks Thanks for the write-up!

@Kilo59
Member

Kilo59 commented Mar 19, 2024

@immerautumn @satniks
Where did you hear we aren't accepting Pull Requests?
We are preparing for a major v1.0 release, so we aren't doing weekly releases and aren't as focused on bug fixes like this.

However, as far as I know, we are still accepting pull requests.
I would expect a fix for this to be accepted.

@immerautumn

@Kilo59 I was just parroting what I read above. Worked out for all the good though! @satniks If you feel inclined, send them a PR; I know we would appreciate it highly.

satniks added a commit to satniks/great_expectations that referenced this issue Mar 20, 2024
…or databricks-sqlalchemy

Added support for databricks-sqlalchemy for the regular expression based expectation expect_column_values_to_match_regex

Please check the corresponding issue
great-expectations#9546
@satniks
Author

satniks commented Mar 20, 2024

@immerautumn, I have submitted the PR:
#9641

@Kilo59, the Contribute section of the README file says "We’re temporarily pausing the acceptance of new pull requests (PRs)." and therefore I did not create the PR.
https://github.com/great-expectations/great_expectations?tab=readme-ov-file#contribute

@satniks
Author

satniks commented Apr 25, 2024

@Kilo59, is there any plan to fix this issue (using this pull request or any other way) for v1.0?
