Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore nitpick warnings with regular expressions using nitpick_ignore_regex #9131

Merged
merged 5 commits into from May 5, 2021
Merged

Ignore nitpick warnings with regular expressions using nitpick_ignore_regex #9131

merged 5 commits into from May 5, 2021

Conversation

RuRo
Copy link
Contributor

@RuRo RuRo commented Apr 24, 2021

Subject: Allow regex patterns in nitpick_ignore.

Edit: this PR originally proposed adding glob-like pattern matching, but after some discussion, it was changed to use regex patterns instead.

Feature or Bugfix

  • Feature

Purpose

Allow the user to specify regex patterns in the nitpick_ignore. Currently, the user is forced to enumerate all the nitpicks that should be ignored, even if they follow a predictable pattern. So after some time, the nitpick_ignore array might end up looking something like this:

nitpick_ignore = [
    ('py:mod', 'foo'),
    ('py:class', 'foo.Something'),
    ('py:mod', 'foo.bar'),
    ('py:class', 'foo.bar.Baz'),
    ('py:func', 'foo.bar.boo.'),
    # ... etc ...
]

With this PR, the user can instead define nitpick_ignore_regex = [(r'py:.*', r'foo\..*')].

Compatibility

I reuse the sphinx.util.matching mechanism to match glob patterns. To be completely honest, I am not 100% sure if it is okay to match the type and target of the nitpick warnings like paths (i.e. are there any cases where type or target may contain the symbols /?[]^* and not be a path). Based on the use cases of nitpick_ignore, that I've personally seen, I don't expect this to be an issue, but this is technically a breaking change if somebody is currently matching something like [woops]!.

Edit: no longer applicable, see further discussion in the comments.

Tests

I didn't find any tests for the current behaviour of nitpick_ignore, so I've implemented the tests both for the current behaviour and the new regex feature. I did this by finding existing tests, which already use nitpicky=True and assert that there should be some errors. I copied these tests, added the nitpick_ignore_regex option and asserted that the errors are gone in that case.

@tk0miya tk0miya added internals:config type:proposal a feature suggestion labels Apr 24, 2021
@tk0miya tk0miya added this to the 4.1.0 milestone Apr 24, 2021
@tk0miya
Copy link
Member

tk0miya commented Apr 24, 2021

I reuse the sphinx.util.matching mechanism to match glob patterns. To be completely honest, I am not 100% sure if it is okay to match the type and target of the nitpick warnings like paths (i.e. are there any cases where type or target may contain the symbols /?[]^* and not be a path).

Good point. sphinx.util.matching is an extended fnmatch module. So it's main purpose is to match filename and pattern. So it's not good to compare nitpick_ignore and warning types. Some languages would contains many kinds of symbols as you mentioned (like a pointer (int*) in C/C++).

So I think using regexp is better for this case. What do you think?

tests/test_domain_c.py Outdated Show resolved Hide resolved
@RuRo
Copy link
Contributor Author

RuRo commented Apr 26, 2021

Some languages would contains many kinds of symbols as you mentioned (like a pointer (int*) in C/C++).

Nice catch, I totally forgot about pointer/array types in C-like languages.

So I think using regexp is better for this case. What do you think?

Hmm, I've thought about this a bit more, and it seems that even if we use regex for this (which we probably should), it would still break some existing uses of nitpick_ignore. For example, int* will no longer match the literal int*, but instead will match in, int, intt, inttt etc. Which is probably too much of a breaking change.

I see 3 possible solutions for this:

  1. Use a separate configuration variable
    nitpick_ignore = [
        ('old behaviour', 'int*'),
        ...
    ]
    nitpick_ignore_regex = [
        (r'py:.*', r'foo\..*'),
        ...
    ]
  2. Use nitpick_ignore like originally proposed, but allow the user to toggle this behaviour
    nitpick_ignore = [
        (r'new behaviour', r'int\*'),
        (r'py:.*', r'foo\..*'),
        ...
    ]
    nitpick_ignore_regex = True
  3. Use nitpick_ignore, instead of regex strings, make the user re.compile the regular expressions on their own and check isinstance(v, re.Pattern) in the implementation
    nitpick_ignore = [
        ('old behaviour', 'int*'),
        (re.compile(r'py:.*'), re.compile(r'foo\..*')),
        ...
    ]

@RuRo RuRo changed the title Allow glob patterns in nitpick_ignore Allow regex patterns in nitpick_ignore Apr 26, 2021
@RuRo
Copy link
Contributor Author

RuRo commented May 1, 2021

@tk0miya ping? Do you have a preference for any of the 3 options?

@jakobandersen
Copy link
Contributor

For my self I have an ad-hoc extension where I define different kind of patterns to ignore. Getting rid of it and use build-in ignore functionality would be nice, and I think this PR might be enough.
I think I like option (1) best, it allows for straight-forward incremental adoption.
(2) might be annoying if you need a single pattern, but have a lot of exact strings that need regex escaping.
(3) looks a bit clunky, but is in another sense more elegant than (1), so I'm not against it as such. It still allows for incremental adoption.

One thing which is not explicitly clear, but should be in the documentation in the end: is the regex matched directly, or with begin/end anchors? E.g., for a pattern a.*b, would it match sabt? From your example it looks like each regex will be used as if ^ and $ are prepended and appended. E.g., the pattern a.*b means the target string must match ^a.*b$.

@tk0miya
Copy link
Member

tk0miya commented May 1, 2021

+1 for the 1st idea.

Additionally, @shimizukawa also votes for the 1st idea. I talked with him at the Hack-a-thon event today.

One thing which is not explicitly clear, but should be in the documentation in the end: is the regex matched directly, or with begin/end anchors? E.g., for a pattern a.*b, would it match sabt? From your example it looks like each regex will be used as if ^ and $ are prepended and appended. E.g., the pattern a.*b means the target string must match ^a.*b$.

I think it's better to match to the beginning of the types as re.match() does.

@jakobandersen
Copy link
Contributor

I think it's better to match to the beginning of the types as re.match() does.

So that would mean that ns:: and ns::.* are equivalent and both means ^ns::.*$, right? That is, the user-specified regex is conceptually put between ^ and .*$ for fully matching the string,
If so, agreed, that sounds good to me.

@tk0miya
Copy link
Member

tk0miya commented May 1, 2021

Wait, it's not good to compare like re.match() does.

nitpick_ignore_regex = [(r'py::class', r'foo')]

This should not match to subtype=foobar. It should exactly match to subtype=foo. So it's better to evaluate it as ^foo$ as @jakobandersen said. Good point!

@jakobandersen
Copy link
Contributor

Wait, it's not good to compare like re.match() does.

nitpick_ignore_regex = [(r'py::class', r'foo')]

This should not match to subtype=foobar. It should exactly match to subtype=foo. So it's better to evaluate it as ^foo$ as @jakobandersen said. Good point!

Having it symmetric with respect to start and end anchors is probably easier to understand for those not used to the Python re.match(), but just for good measure: in this case one could be the entry in the ordinary nitpick_ignore and prevent matching foobar.
As an argument against putting any anchors in implicitly: grep foo and sed s/foo/f/ will match foo anywhere in the line, so if the user is used to those tools, it might be surprising that the regex pattern is not interpreted as is. By that I suggest taking the regex as is. So foo would match foobar. The user can then insert anchors for start, end, word boundary, etc. as they wish.
For non-Unix-like command line users, what would be least surprising?
Do we have other configuration variables with regexes? and what is done there?

@RuRo RuRo changed the title Allow regex patterns in nitpick_ignore Ignore nitpick warnings with regular expressions using nitpick_ignore_regex May 2, 2021
@RuRo
Copy link
Contributor Author

RuRo commented May 2, 2021

Thank you everyone for the feedback. I've implemented option 1 - nitpick_ignore_regex.

Regarding how should the regular expression be matched, if I understood correctly, the 3 options were

  • re.match - matches "glued" to the beginning of the string
  • re.search - matches anywhere in the string
  • re.fullmatch - matches the whole string exactly (beginning to end)

My personal opinion is that the behaviour of re.match is indeed not very intuitive and doesn't follow the principle of least astonishment in this case. So for me, the choice was between re.search and re.fullmatch. In the end, I've implemented re.fullmatch, but I am open to further bikeshedding discussion of this topic.

My motivation for choosing fullmatch instead of search is:

  1. A plain string (with no special regex characters) would "match" the same way as in nitpick_ignore. This makes it easier to switch to nitpick_ignore_regex for all nitpicks instead of keeping 2 separate config variables.

  2. I expect, that some users might need the "advanced" regex matching only for type or only for target. This might lead them to write a pattern like ('.*', 'target') or ('type', '.*'). In both cases re.search would unexpectedly (imo) treat the "plain" strings as .*target.* and .*type.*.

  3. fullmatch is stricter (it matches a subset of strings, compared to search).

    • If the user expected the regular expression to work like re.search, but the implementation uses re.fullmatch, the nitpicky warning won't disappear, so it would be immediately obvious to the user, that he did something wrong.

    • If the user expected the regular expression to work like re.fullmatch, but the implementation uses re.search, then the nitpicky warning that he tried to silence will disappear, but the user would also have accidentally silenced all the warnings that contain the expression he used as a substring, which is not what the user intended.

Just in case, I've added a description of this behaviour to the documentation and a separate unittest for it.

@jakobandersen
Copy link
Contributor

Good point about fullmatch vs. search, and the added documentation makes it clear.

Copy link
Member

@tk0miya tk0miya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with nits.

As I commented above, I prefer the re.fullmatch() approach in this case. +1 for this as updated.

tests/test_domain_c.py Outdated Show resolved Hide resolved
sphinx/transforms/post_transforms/__init__.py Outdated Show resolved Hide resolved
@RuRo
Copy link
Contributor Author

RuRo commented May 5, 2021

@tk0miya have I addressed all of your comments?

Copy link
Member

@tk0miya tk0miya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work. LGTM!

@tk0miya tk0miya merged commit d82c8a7 into sphinx-doc:4.x May 5, 2021
@tk0miya
Copy link
Member

tk0miya commented May 5, 2021

Merged!

tk0miya added a commit to tk0miya/sphinx that referenced this pull request May 5, 2021
tk0miya added a commit that referenced this pull request May 5, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 11, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants