Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I believe URL_REGEX is missing a | #269

Open
EFord36 opened this issue Oct 30, 2023 · 1 comment
Open

I believe URL_REGEX is missing a | #269

EFord36 opened this issue Oct 30, 2023 · 1 comment
Labels
C: style Relates to docstring format style (e.g., Google, NumPy, Sphinx) P: enhancement Feature that is outside the scope of PEP 257 U: low A relatively low urgency issue

Comments

@EFord36
Copy link
Contributor

EFord36 commented Oct 30, 2023

comparing the 'URL_REGEX explanation comment':

# (__ |`{{2}}|`\w[\w. :\n]*|\.\. _?[\w. :]+|')? is used to find in-line links that

to the actual first line of the regex:

    rf"(__ |`{{2}}|`\w[\w :#\n]*[.|\.\. _?[\w. :]+|')?<?"

the actual regex seems to be 'missing' a | between:

`\w[\w :#\n]*

and

[.|\.\. _?[\w. :]+

This seems to have been removed in #213 , which doesn't seem intentional to me, as it doesn't seem obviously related to the change that PR was trying to fix (although it's possible I'm misunderstanding here, sorry if so!).

I think this does affect the wrapping behaviour - links in the format starting with .. seem to not get wrapped (although I'm not 100% sure I've understand the expected behaviour). A file with these contents:

def some_func():
    """This is a func.

    A description line.

    This is a very very very very very very very very very very very long line .. _a link reference:  https://domain.invalid/
    """
    pass

def another_func():
    """This is a func.

    A description line.

    .. _a link reference that is very very very very very very very very very very very very very ver longy:  https://domain.invalid/
    """
    pass

def some_func_other_link():
    """This is another func.

    This is a very very very very very very very very very very very long line `A link reference <https://domain.invalid/>`_
    """
    pass

produces this output when docformatter is run over it, only reformatting the link with a different format:

--- before/test_url_regex.py
+++ after/test_url_regex.py
@@ -19,7 +19,9 @@
 def some_func_other_link():
     """This is another func.

-    This is a very very very very very very very very very very very long line `A link reference <https://domain.invalid/>`_
+    This is a very very very very very very very very very very very
+    long line
+    `A link reference <https://domain.invalid/>`_
     """
     pass

I think maybe this isn't obvious/impactful because this kind of link is 'explicit markup' that according to ReST can only start a line of ReST, so it's probably infrequent that you get a link and reference long enough that this would reformat it anyway.

@github-actions github-actions bot added the fresh This is a new issue label Oct 30, 2023
@EFord36
Copy link
Contributor Author

EFord36 commented Oct 30, 2023

By the way, I noticed this because I was looking at the big comment explaining URL_REGEX, and though "maybe this is a good use case for re.verbose", which I think is a very useful but seemingly not especially well known feature of the re module that helps document complex regular expressions - I thought maybe it would help stop the comment explaining the different bits of the regex from falling out of sync with the actual regex - only to find this issue, where this seems to have actually happened.

If (after this issue is resolved) I submitted a PR to try writing a 'verbose' version of this regex with explanation 'inline', is this something you would be interested in?
could probably be 'compiled' with re.compile to prevent re-parsing of the

@weibullguy weibullguy added P: enhancement Feature that is outside the scope of PEP 257 C: style Relates to docstring format style (e.g., Google, NumPy, Sphinx) and removed fresh This is a new issue labels Nov 21, 2023
@github-actions github-actions bot added the U: low A relatively low urgency issue label Nov 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: style Relates to docstring format style (e.g., Google, NumPy, Sphinx) P: enhancement Feature that is outside the scope of PEP 257 U: low A relatively low urgency issue
Projects
None yet
Development

No branches or pull requests

2 participants