Ignore nitpick warnings with regular expressions using `nitpick_ignore_regex` #9131

RuRo · 2021-04-24T00:33:03Z

Subject: Allow regex patterns in nitpick_ignore.

Edit: this PR originally proposed adding glob-like pattern matching, but after some discussion, it was changed to use regex patterns instead.

Feature or Bugfix

Feature

Purpose

Allow the user to specify regex patterns in the nitpick_ignore. Currently, the user is forced to enumerate all the nitpicks that should be ignored, even if they follow a predictable pattern. So after some time, the nitpick_ignore array might end up looking something like this:

nitpick_ignore = [
    ('py:mod', 'foo'),
    ('py:class', 'foo.Something'),
    ('py:mod', 'foo.bar'),
    ('py:class', 'foo.bar.Baz'),
    ('py:func', 'foo.bar.boo.'),
    # ... etc ...
]

With this PR, the user can instead define nitpick_ignore_regex = [(r'py:.*', r'foo\..*')].

Compatibility

I reuse the sphinx.util.matching mechanism to match glob patterns. To be completely honest, I am not 100% sure if it is okay to match the type and target of the nitpick warnings like paths (i.e. are there any cases where type or target may contain the symbols /?[]^* and not be a path). Based on the use cases of nitpick_ignore, that I've personally seen, I don't expect this to be an issue, but this is technically a breaking change if somebody is currently matching something like [woops]!.

Edit: no longer applicable, see further discussion in the comments.

Tests

I didn't find any tests for the current behaviour of nitpick_ignore, so I've implemented the tests both for the current behaviour and the new regex feature. I did this by finding existing tests, which already use nitpicky=True and assert that there should be some errors. I copied these tests, added the nitpick_ignore_regex option and asserted that the errors are gone in that case.

tk0miya · 2021-04-24T15:23:40Z

I reuse the sphinx.util.matching mechanism to match glob patterns. To be completely honest, I am not 100% sure if it is okay to match the type and target of the nitpick warnings like paths (i.e. are there any cases where type or target may contain the symbols /?[]^* and not be a path).

Good point. sphinx.util.matching is an extended fnmatch module. So it's main purpose is to match filename and pattern. So it's not good to compare nitpick_ignore and warning types. Some languages would contains many kinds of symbols as you mentioned (like a pointer (int*) in C/C++).

So I think using regexp is better for this case. What do you think?

tests/test_domain_c.py

RuRo · 2021-04-26T11:36:20Z

Some languages would contains many kinds of symbols as you mentioned (like a pointer (int*) in C/C++).

Nice catch, I totally forgot about pointer/array types in C-like languages.

So I think using regexp is better for this case. What do you think?

Hmm, I've thought about this a bit more, and it seems that even if we use regex for this (which we probably should), it would still break some existing uses of nitpick_ignore. For example, int* will no longer match the literal int*, but instead will match in, int, intt, inttt etc. Which is probably too much of a breaking change.

I see 3 possible solutions for this:

Use a separate configuration variable

nitpick_ignore = [
    ('old behaviour', 'int*'),
    ...
]
nitpick_ignore_regex = [
    (r'py:.*', r'foo\..*'),
    ...
]

Use nitpick_ignore like originally proposed, but allow the user to toggle this behaviour

nitpick_ignore = [
    (r'new behaviour', r'int\*'),
    (r'py:.*', r'foo\..*'),
    ...
]
nitpick_ignore_regex = True

Use nitpick_ignore, instead of regex strings, make the user re.compile the regular expressions on their own and check isinstance(v, re.Pattern) in the implementation
```
nitpick_ignore = [
    ('old behaviour', 'int*'),
    (re.compile(r'py:.*'), re.compile(r'foo\..*')),
    ...
]
```

RuRo · 2021-05-01T02:57:10Z

@tk0miya ping? Do you have a preference for any of the 3 options?

jakobandersen · 2021-05-01T09:43:33Z

For my self I have an ad-hoc extension where I define different kind of patterns to ignore. Getting rid of it and use build-in ignore functionality would be nice, and I think this PR might be enough.
I think I like option (1) best, it allows for straight-forward incremental adoption.
(2) might be annoying if you need a single pattern, but have a lot of exact strings that need regex escaping.
(3) looks a bit clunky, but is in another sense more elegant than (1), so I'm not against it as such. It still allows for incremental adoption.

One thing which is not explicitly clear, but should be in the documentation in the end: is the regex matched directly, or with begin/end anchors? E.g., for a pattern a.*b, would it match sabt? From your example it looks like each regex will be used as if ^ and $ are prepended and appended. E.g., the pattern a.*b means the target string must match ^a.*b$.

tk0miya · 2021-05-01T10:31:55Z

+1 for the 1st idea.

1. Simple.
1. Difficult to switch normal one to regexp one (escaping needed).
1. python knowledge is needed to configure it. It's not matched to the static configuration file: Add static configuration (Sphinx.toml) #9040

Additionally, @shimizukawa also votes for the 1st idea. I talked with him at the Hack-a-thon event today.

One thing which is not explicitly clear, but should be in the documentation in the end: is the regex matched directly, or with begin/end anchors? E.g., for a pattern a.*b, would it match sabt? From your example it looks like each regex will be used as if ^ and $ are prepended and appended. E.g., the pattern a.*b means the target string must match ^a.*b$.

I think it's better to match to the beginning of the types as re.match() does.

jakobandersen · 2021-05-01T10:42:18Z

I think it's better to match to the beginning of the types as re.match() does.

So that would mean that ns:: and ns::.* are equivalent and both means ^ns::.*$, right? That is, the user-specified regex is conceptually put between ^ and .*$ for fully matching the string,
If so, agreed, that sounds good to me.

tk0miya · 2021-05-01T10:49:01Z

Wait, it's not good to compare like re.match() does.

nitpick_ignore_regex = [(r'py::class', r'foo')]

This should not match to subtype=foobar. It should exactly match to subtype=foo. So it's better to evaluate it as ^foo$ as @jakobandersen said. Good point!

jakobandersen · 2021-05-01T11:02:03Z

Wait, it's not good to compare like re.match() does.
nitpick_ignore_regex = [(r'py::class', r'foo')]
This should not match to subtype=foobar. It should exactly match to subtype=foo. So it's better to evaluate it as ^foo$ as @jakobandersen said. Good point!

Having it symmetric with respect to start and end anchors is probably easier to understand for those not used to the Python re.match(), but just for good measure: in this case one could be the entry in the ordinary nitpick_ignore and prevent matching foobar.
As an argument against putting any anchors in implicitly: grep foo and sed s/foo/f/ will match foo anywhere in the line, so if the user is used to those tools, it might be surprising that the regex pattern is not interpreted as is. By that I suggest taking the regex as is. So foo would match foobar. The user can then insert anchors for start, end, word boundary, etc. as they wish.
For non-Unix-like command line users, what would be least surprising?
Do we have other configuration variables with regexes? and what is done there?

RuRo · 2021-05-02T01:23:39Z

Thank you everyone for the feedback. I've implemented option 1 - nitpick_ignore_regex.

Regarding how should the regular expression be matched, if I understood correctly, the 3 options were

re.match - matches "glued" to the beginning of the string
re.search - matches anywhere in the string
re.fullmatch - matches the whole string exactly (beginning to end)

My personal opinion is that the behaviour of re.match is indeed not very intuitive and doesn't follow the principle of least astonishment in this case. So for me, the choice was between re.search and re.fullmatch. In the end, I've implemented re.fullmatch, but I am open to further ~~bikeshedding~~ discussion of this topic.

My motivation for choosing fullmatch instead of search is:

A plain string (with no special regex characters) would "match" the same way as in nitpick_ignore. This makes it easier to switch to nitpick_ignore_regex for all nitpicks instead of keeping 2 separate config variables.
I expect, that some users might need the "advanced" regex matching only for type or only for target. This might lead them to write a pattern like ('.*', 'target') or ('type', '.*'). In both cases re.search would unexpectedly (imo) treat the "plain" strings as .*target.* and .*type.*.
fullmatch is stricter (it matches a subset of strings, compared to search).
- If the user expected the regular expression to work like re.search, but the implementation uses re.fullmatch, the nitpicky warning won't disappear, so it would be immediately obvious to the user, that he did something wrong.
- If the user expected the regular expression to work like re.fullmatch, but the implementation uses re.search, then the nitpicky warning that he tried to silence will disappear, but the user would also have accidentally silenced all the warnings that contain the expression he used as a substring, which is not what the user intended.

Just in case, I've added a description of this behaviour to the documentation and a separate unittest for it.

jakobandersen · 2021-05-02T08:13:51Z

Good point about fullmatch vs. search, and the added documentation makes it clear.

tk0miya

LGTM with nits.

As I commented above, I prefer the re.fullmatch() approach in this case. +1 for this as updated.

tests/test_domain_c.py

sphinx/transforms/post_transforms/__init__.py

RuRo · 2021-05-05T12:04:28Z

@tk0miya have I addressed all of your comments?

tk0miya

Thank you for your work. LGTM!

tk0miya · 2021-05-05T14:43:50Z

Merged!

tk0miya added internals:config type:proposal a feature suggestion labels Apr 24, 2021

tk0miya added this to the 4.1.0 milestone Apr 24, 2021

tk0miya reviewed Apr 24, 2021

View reviewed changes

tests/test_domain_c.py Outdated Show resolved Hide resolved

RuRo changed the title ~~Allow glob patterns in nitpick_ignore~~ Allow regex patterns in nitpick_ignore Apr 26, 2021

RuRo changed the title ~~Allow regex patterns in nitpick_ignore~~ Ignore nitpick warnings with regular expressions using nitpick_ignore_regex May 2, 2021

tk0miya requested changes May 2, 2021

View reviewed changes

tests/test_domain_c.py Outdated Show resolved Hide resolved

sphinx/transforms/post_transforms/__init__.py Outdated Show resolved Hide resolved

RuRo added 5 commits May 2, 2021 13:31

implement nitpick_ignore_regex

00bc346

describe nitpick_ignore_regex in docs

104d0a3

add nitpicky-warnings test root

bf2e69a

add tests for regular nitpick_ignore

0aea6d3

add tests for nitpick_ignore_regex

afa80df

tk0miya approved these changes May 5, 2021

View reviewed changes

tk0miya merged commit d82c8a7 into sphinx-doc:4.x May 5, 2021

tk0miya added a commit to tk0miya/sphinx that referenced this pull request May 5, 2021

Update CHANGES for PR sphinx-doc#9131

2d9b9c0

tk0miya added a commit that referenced this pull request May 5, 2021

Update CHANGES for PR #9131

5127435

github-actions bot locked as resolved and limited conversation to collaborators Jul 11, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ignore nitpick warnings with regular expressions using `nitpick_ignore_regex` #9131

Ignore nitpick warnings with regular expressions using `nitpick_ignore_regex` #9131

RuRo commented Apr 24, 2021 •

edited

tk0miya commented Apr 24, 2021

RuRo commented Apr 26, 2021

RuRo commented May 1, 2021

jakobandersen commented May 1, 2021

tk0miya commented May 1, 2021

jakobandersen commented May 1, 2021

tk0miya commented May 1, 2021

jakobandersen commented May 1, 2021

RuRo commented May 2, 2021

jakobandersen commented May 2, 2021

tk0miya left a comment

RuRo commented May 5, 2021

tk0miya left a comment

tk0miya commented May 5, 2021

Ignore nitpick warnings with regular expressions using nitpick_ignore_regex #9131

Ignore nitpick warnings with regular expressions using nitpick_ignore_regex #9131

Conversation

RuRo commented Apr 24, 2021 • edited

Feature or Bugfix

Purpose

Compatibility

Tests

tk0miya commented Apr 24, 2021

RuRo commented Apr 26, 2021

RuRo commented May 1, 2021

jakobandersen commented May 1, 2021

tk0miya commented May 1, 2021

jakobandersen commented May 1, 2021

tk0miya commented May 1, 2021

jakobandersen commented May 1, 2021

RuRo commented May 2, 2021

jakobandersen commented May 2, 2021

tk0miya left a comment

Choose a reason for hiding this comment

RuRo commented May 5, 2021

tk0miya left a comment

Choose a reason for hiding this comment

tk0miya commented May 5, 2021

Ignore nitpick warnings with regular expressions using `nitpick_ignore_regex` #9131

Ignore nitpick warnings with regular expressions using `nitpick_ignore_regex` #9131

RuRo commented Apr 24, 2021 •

edited