Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[console] enhance detection and elimination of ANSI sequences #12216

Merged
merged 12 commits into from Apr 4, 2024

Conversation

picnixz
Copy link
Member

@picnixz picnixz commented Mar 31, 2024

While I was implementing the line matcher object for testing Sphinx's output, I needed some 'cleaning' step based on colors. So I used sphinx.util.console.blue for instance; however since the function is dynamically created, it is not picked up by mypy.

So, I added the dynamic definitions of the functions in a TYPE_CHECKING block. Now, I also needed some better regex for stripping colors because if we were to release that line matcher, then we don't really know whether the output could also contain other control sequences (that are currently not picked up by our regex).

In addition, we never had any test for ANSI esc. seq. so I added some. I also think that people would be interested in cleaning up other ANSI esc. seq. that could be there (not just those related to colors) but maybe they want to only remove colors and keep non-colors ones (e.g., those supported by vt100 terminals or xterms). I know the list is not complete (per https://en.wikipedia.org/wiki/ANSI_escape_code) but I don't know if I should add more (it is very important that the regex is not too generic otherwise you end up removing possible legit text).

Remarks

  • I explicitly disabled the ruff formatter for some portion of the code. The rationale behind is that I clearly don't want to have empty lines due to definitions inside TYPE_CHECKING blocks and I wanted to group them by 'action' (with the formatter, every function would have the same number amounts of empty lines inbetween and I didn't want to increase the size of the file's top section).
  • Thanks to those new definitions, we can also remove some # type: ignore from mypy (which is a good thing).
  • This PR should be merged before the line matcher feature since the latter is based on that one.

@chrisjsewell
Copy link
Member

chrisjsewell commented Mar 31, 2024

So, I added the dynamic definitions of the functions in a TYPE_CHECKING block.

this is definitely good thanks, perhaps though move this to a separate PR, because it feels like you are trying to do multiple things at once here?

Now, I also needed some better regex for stripping colors

Related to #12134, I feel this is somewhat treating the symptom and not the cause; what people actually want to do, is just to run the test without colored output in the first place, not retroactively have complex regexes to remove them

@picnixz

This comment has been minimized.

@chrisjsewell

This comment was marked as resolved.

@chrisjsewell
Copy link
Member

but since I cannot know which ones can be part of Sphinx output (if you are considering extensions)

I'm a bit confused by what you mean here, extensions use exactly the same mechanism for adding color as sphinx (at least they should be), e.g. from sphinx.util.console import red, ...
If you turn off color logging for sphinx, you turn off color logging for extensions

@picnixz

This comment has been minimized.

@picnixz

This comment was marked as resolved.

@picnixz
Copy link
Member Author

picnixz commented Apr 2, 2024

OK, the regex actually had issues. I think I'll go with what Chris suggested, namely, I will keep the current behaviour and only add tests.

By the way, I'm not sure if people want strip_colors or strip_escape_sequences. In general, if you have escape sequences, you just want to get rid of all of them, including others such as \x1b[2K.

Also, I observed that our function does not work if you have something like:

'a' + blue('b') + '\x1b[2K' + 'c' + blue('d')

The strip_colors ends up removing '\x1b[2Kc' all together so... I think we should rather fix the current regex.

Copy link
Member

@chrisjsewell chrisjsewell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all good thanks

@picnixz
Copy link
Member Author

picnixz commented Apr 2, 2024

So, what should I do with the bad strip_colors? should I just write a comment for the doc or.. ? (stripping everything appears to be fine but the steip_colors function is a bit.. wrong).

@chrisjsewell
Copy link
Member

So, what should I do with the bad strip_colors? should I just write a comment for the doc or.. ? (stripping everything appears to be fine but the steip_colors function is a bit.. wrong).

yeh I think that sounds a good idea 👍

Could maybe mention that its use case is more envisaged for testing purposes (e.g. for checking warning strings), rather than any "production" code. Or at least that's my thinking on it

@picnixz
Copy link
Member Author

picnixz commented Apr 2, 2024

Could maybe mention that its use case is more envisaged for testing purposes (e.g. for checking warning strings), rather than any "production" code. Or at least that's my thinking on it

My best suggestion is to move it out of sphinx.util.console. In production, we want to remove every ANSI codes (#11624). I'm not sure if we actually use strip_colors. And if we do so, I think we should just remove strip_colors entirely and just expose the strip_escape_sequences (possibly renaming it strip_ansi?). I actually don't know why we have two stripper functions...

The thing is... we are using strip_colors in tests (and I deprecated the one in sphinx.testing.util in #12186 because you might not need the Sphinx plugin (IIRC, sphinx.testing can be selected not to be installed)) but I don't think we are using it anywhere else...

TL;DR: I suggest having a single stripper function that removes both ANSI colors and erase in line.

@picnixz
Copy link
Member Author

picnixz commented Apr 3, 2024

@chrisjsewell I know you accepted the PR, but here is my final solution (together with the caution comment).

I think the tests capture most of the non-pathological cases (and I easily fixed the previous bug). By the way, to avoid namespace pollution, I renamed some module loop invariants that could shadow names if incorrectly imported.

sphinx/util/console.py Outdated Show resolved Hide resolved
@chrisjsewell chrisjsewell self-requested a review April 4, 2024 10:05
Copy link
Member

@chrisjsewell chrisjsewell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeh I think still all good thanks.

I guess making strip_escape_sequences public is a thinking point,
but it looks like you have created a good range of tests for it, so that should be fine 👍

@picnixz
Copy link
Member Author

picnixz commented Apr 4, 2024

I guess making strip_escape_sequences public is a thinking point,

That's why I also marked the feature as experimental (at least, if it doesn't work I can hide behind that justification :))

@picnixz
Copy link
Member Author

picnixz commented Apr 4, 2024

I've just added a missing coverage step (just to check that we indeed ignore the unknown codes correctly). I'll merge this one when CI passes.

@picnixz picnixz merged commit f7a1397 into sphinx-doc:master Apr 4, 2024
22 checks passed
@picnixz picnixz deleted the fix/ansi-functions branch April 4, 2024 10:35
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 5, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants