Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Better error messages for regex assertions #6659

Closed
wants to merge 4 commits into from

Conversation

kalekundert
Copy link
Contributor

This pull request implements a more helpful error message when assertions involving regular expressions fail. More specifically, the improved error message indicates the positions in the query string that would need to change in order for the regular expression to match.

This feature is analogous to the way pytest already provides more helpful messages for certain binary operators, e.g. == between two dicts. The difference is that there is no operator in this case; just a function call. So I modified the assertion rewriting code to handle function calls more like binary operators, in that the task of creating the error message is delegated depending on the specific function being called.

The code I've written so far is just a prototype. I'm hoping to see if there's any interest in this feature before going to all the effort to make the code actually ready to be merged, i.e. writing tests and documentation, etc. I'd also appreciate feedback on ways I could implement this feature better; the assertion rewriting code is pretty foreign to me, and I have the feeling I didn't really do things "right".

Example

Here is an simplified example from a real test I was working on the other day. The regular expression is meant to match a date, but it has a bug. It should be \d{1,2} to match one or two digits. \d{1-2} instead matches any number followed by a literal {1-2}:

    def test_date_regex():
>       assert re.fullmatch(r'\w+ \d{1-2}, \d{4}', 'February 2, 2020')
E       AssertionError: assert re.fullmatch('\w+ \d{…', 'Februar…', flags=0)
E         
E         pattern:       \w+ \d{1-2}, \d{4}
E         string:        February 2, 2020
E         insertions:            ^ ^^
E         substitutions:                ^^^^^^^^^^^^^

The error message indicates where insertions or substitutions would have to be made in order for the string to match the pattern. Deletions can be indicated too, but this example doesn't have any. The hints aren't perfect, but they indicate that the problem starts around the day number, which would hopefully get us on the right track.

Caveats

  • This code adds a dependency on the regex package, which provides the "fuzzy matching" necessary to make the error message.

  • Only the regular expression functions that either return a Match or None are supported.
    These are: re.match(), re.search(), and re.fullmatch().

@kalekundert kalekundert added the type: enhancement new feature or API change, should be merged into features branch label Feb 2, 2020
@blueyed
Copy link
Contributor

blueyed commented Feb 2, 2020

@kalekundert
Great idea, and thanks for working on it already.
I have not looked at the code yet, but it reminded me of https://github.com/asottile/re-assert, which we you might want to check out.

@blueyed blueyed added the topic: rewrite related to the assertion rewrite mechanism label Feb 2, 2020
@kalekundert
Copy link
Contributor Author

Thanks for the link to re-assert. I hadn't heard of it before (and if I had, I probably would've just used it instead of making this PR). It seems that there are two meaningful differences between re-assert and what I have here:

  • The obvious one is that this approach works directly with the standard library re functions. That's really nice because it allows users to benefit without having to know or do anything. Even tests that have already been written and are now just being maintained will benefit. That said, re-assert offers a nice == syntax that is probably more readable than the plain standard library functions.

  • re-assert works by bisecting the original string until it finds the largest substring that can be matched. The next character, then, is the first one that didn't match. In contrast, this PR uses fuzzy matching to have the regular expression engine figure out what would need to change about the string (e.g. insertions, deletions, substitutions) in order for the match to succeed. I'm not really sure which approach is better. The "first mismatch" approach may be easier to interpret, but the "fuzzy matching" approach might be more informative. Maybe the best thing would be to just do both. It's worth noting that both approaches rely of the regex library, so that would be a dependency either way.

It seems like this is a promising idea, so I'll try to make this a more complete PR over the next week or so...

@blueyed
Copy link
Contributor

blueyed commented Feb 8, 2020

/cc @asottile

@bluetech

This comment has been minimized.

@nicoddemus nicoddemus changed the title Better error messages for regex assertions [WIP] Better error messages for regex assertions Feb 14, 2020
@nicoddemus
Copy link
Member

Thanks @kalekundert, it does seem interesting.

This introduces regex as a new dependency, correct? It seems it is not a pure Python library, so I'm not sure it is a good fit for having it as a standard dependency in pytest.

Perhaps we can have this improved representation only if regex is already installed on the system, instead of making it an explicit dependency? We did this recently with #6658, although there we plan to eventually change Pygments into a standard dependency.

@bluetech bluetech changed the base branch from features to master February 16, 2020 15:18
@bluetech
Copy link
Member

We've recently stopped using the features branch

(I've changed the base on my own now).

@kalekundert
Copy link
Contributor Author

Thanks for rebasing for me.

Definitely it's possible to only provide the improved message only if regex is installed separately, and I think that's probably the best way to go.

@nicoddemus
Copy link
Member

Hi @kalekundert!

Recently we decided to close PRs older than a month since last update with the intent to clear up our PR queue, which is quite long.

In light of that, should we close this now? You are free to reopen at a later date of course when you decide to pick this up again.

Cheers!

@kalekundert
Copy link
Contributor Author

Yes, I'm still meaning to get back to this, but I can open a new PR when I do.

@nicoddemus
Copy link
Member

Sure thing, feel free to reopen this PR then, whichever you prefer. 😁

Thanks again!

@nicoddemus nicoddemus closed this Jun 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: rewrite related to the assertion rewrite mechanism type: enhancement new feature or API change, should be merged into features branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants