[LaTex] code-block printed out of margin #8849
Comments
Thanks for the report. The mechanism for wrapping long code lines does not work here. This mechanism uses distinct techniques:
For digits 0123456789 and letters ABCDEF, although it is possible in small TeX files to imitate what is done in the last two items, in real life this is simply a no-go. (For example, digits 0 to 9 appear in color specifications in the Pygments mark-up; if we start making them behave specially we immediately break the [...]) So the only solution I see is for the Pygments library itself to identify hex strings and add LaTeX dummy mark-up for each digit and letter within them. Then we can achieve the desired result. Something like having [...]
Short of that, I currently see no way. In a pure LaTeX workflow, where you write the document manually, it is enough to have a single macro [...] I suggest you open a ticket at their repo, https://github.com/pygments/pygments, linking to this issue.
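To make the idea of per-character dummy mark-up concrete, here is a minimal Python sketch of what such a pre-processing step might look like. It is not something Pygments or Sphinx provides: the macro name `\HexDigit`, the helper `mark_hex_runs`, and the 16-character threshold are all assumptions made up for illustration, and the LaTeX side would still have to define the macro so that a break is allowed after each wrapped character.

```python
import re

# Assumption: a "long hex string" is any run of 16 or more hex digits.
HEX_RUN = re.compile(r"[0-9A-Fa-f]{16,}")


def mark_hex_runs(code, macro=r"\HexDigit"):
    """Wrap every character of each long hex run in a hypothetical LaTeX macro."""

    def wrap(match):
        # A replacement function avoids backslash-escape handling in re.sub.
        return "".join(f"{macro}{{{ch}}}" for ch in match.group())

    return HEX_RUN.sub(wrap, code)


print(mark_hex_runs("key=0123456789abcdef"))
```

The heuristic is crude: a long word made only of the letters a to f would also match, which is part of why a lexer is better placed to make this decision than a regular expression.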
Thanks for your quick response. I think the problem is more general than hex strings. A code block may contain arbitrary strings, some of them very long, for example a base64-encoded message.
Yes, you are right that the problem is more general. I also suspect it is unlikely that we will get this feature from Pygments; here the "shell" lexer is used, and what counts as a "string" for it is not obvious. You don't want to cut in the middle of some shell command; if there are a great many of them, "simply" breaking the line could work, but it will not be simple! To give some context, here is how your input is converted:
You see the mark-up macros. They aren't too numerous in this case. Counting characters is not straightforward because of their presence, and we don't want to cut right inside their names. On the other hand, because we tell TeX that backslash and [...]

For standard code lines which are long for reasons other than "strings", the current approach, which lets TeX itself do the line breaks via its paragraph-building algorithm, works fine in general. There is #8686. If we apply the "simply pre-cut the line" approach, this will not completely escape #8686 because, as I explained above, some syntax highlighting macro may fetch the whole thing anyhow. It will then be rendered in one "horizontal box", only with twice the normal vertical height; it will not be two stacked horizontal boxes, so it can't allow a page break.

Currently the main issue I see is that we don't know in advance what the linewidth will be. The user can change the font, so we cannot know the character width for sure. (It can be determined dynamically, but I am talking here about the Python side of things; I am focusing on some Python parsing, because doing it entirely on the LaTeX side could be feasible but would be complex.) Code-blocks can appear in indented contexts, even in table cells, in narrow columns. The Sphinx LaTeX builder has no means to know in advance how many characters make a line in the output.

Even if we knew it, say the target width is 66 characters, it would require a bit of work to identify, in the output of Pygments inclusive of its LaTeX mark-up, where to legitimately break. Ideally we should also count how deeply we are inside braces at the cut location. We should add as many closing braces as are needed, then re-insert the nested formatting macros at the start of the second part of what was split. If we do that, then the successive partial parts will each occupy a "horizontal box" of its own, and page breaks will work. Basically we need a parser of Pygments LaTeX output... something could be done.
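As a rough illustration of the brace-counting idea, here is a minimal Python sketch. It assumes the mark-up has the shape `\PYG{style}{content}` with escapes such as `\PYGZdq{}` each standing for one visible character; the function name `split_pygments_line` is hypothetical and this is not the code used by Sphinx. It cuts after a fixed number of visible characters, closes every open group at the cut, and re-opens the same groups at the start of the next piece. It makes no attempt to avoid cutting inside a keyword.

```python
import re

# Assumed token shapes in a line of Pygments LaTeX output.
TOKEN = re.compile(
    r"(?P<open>\\PYG\{[^{}]*\}\{)"   # \PYG{style}{ opens a styled group
    r"|(?P<esc>\\PYGZ[a-z]+\{\})"    # \PYGZdq{} and friends: one visible character
    r"|(?P<close>\})"                # closes the innermost styled group
    r"|(?P<char>.)",                 # any other single visible character
    re.DOTALL,
)


def split_pygments_line(line, width):
    """Split marked-up code into pieces of at most `width` visible characters."""
    pieces, current, stack, visible = [], [], [], 0
    for match in TOKEN.finditer(line):
        kind, text = match.lastgroup, match.group()
        if kind == "open":
            stack.append(text)
            current.append(text)
        elif kind == "close":
            if stack:
                stack.pop()
            current.append(text)
        else:
            if visible == width:
                # Close all open groups, then restart them on the next piece.
                current.append("}" * len(stack))
                pieces.append("".join(current))
                current = list(stack)
                visible = 0
            current.append(text)
            visible += 1
    pieces.append("".join(current))
    return pieces


sample = r"\PYG{n+nb}{echo} \PYG{l+s+s2}{\PYGZdq{}0123456789ABCDEF\PYGZdq{}}"
for piece in split_pygments_line(sample, 10):
    print(piece)
```

Each piece is brace-balanced on its own, which is the property needed for every part to be typeset as a separate horizontal box. The fixed `width` argument is exactly the quantity that is hard to know on the Python side.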
The most satisfying result would be to let this parsing be done by LaTeX itself. No easy task, but it is the only way to adapt well to the linewidth. Whether in Python or LaTeX, though, it will be very difficult not to break in the middle of a keyword. How do we distinguish things we can't break from things we can? The current approach allows breaks only at some punctuation characters and other special characters, and we can tell TeX to break preferentially before or after them, at the cost of some space left at the end of the line.

edit: I see a way to tell the LaTeX code that the TeX native process could not find a good break point; then it would be possible to go to the "cut at any cost" approach. This could try the ambitious method of counting braces and adding them, as well as the syntax highlighting macros, at suitable points, or perhaps work with some TeX box manipulation. I will think about it.

The problem you raise is to allow more breakpoints. This is not feasible by handing the whole thing over to the TeX paragraph builder, because it is simply impossible to let digits and A..F become "active"; the only way would be a pre-analysis of the line. So the breaking must be done either on the Python side, or on the LaTeX side via either some pre-parsing or some multi-pass approach (LaTeX does not have a [...]).

Ideally I wish I could transfer the hard work to Pygments: its lexers can know unbreakable keywords, and they could arrange that a break never falls in the middle of a keyword and that all nested formatting is closed at the break and restarted on the next line. The user would tell Pygments: my target width is 80 characters.

(I have edited to be a bit clearer, trying also to be less verbose. It is hard because verbosity is my usual way to prepare to solve the problem... eventually, perhaps. It is true that #8686 is quite related to this, because my latest thoughts on how to solve #8686 would be to write all the necessary LaTeX code myself rather than being dependent on [...])
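The two-stage strategy discussed here (prefer a break at punctuation, fall back to "cut at any cost" when none is available) can be modelled on plain text in a few lines of Python. This is only a sketch of the decision logic, not the LaTeX implementation: the function name `wrap_code_line`, the default set of break characters, and breaking only after such a character are all assumptions chosen for illustration.

```python
def wrap_code_line(text, width, break_after=" .,;:/=-_"):
    """Greedy wrap: break after the last allowed character, else cut hard."""
    lines = []
    while len(text) > width:
        window = text[:width]
        # Position just after the right-most allowed break character, or 0.
        cut = max(window.rfind(ch) for ch in break_after) + 1
        if cut <= 0:
            # No good break point in this window: cut at any cost.
            cut = width
        lines.append(text[:cut])
        text = text[cut:]
    lines.append(text)
    return lines


for part in wrap_code_line("tool --key=0123456789ABCDEF0123456789ABCDEF", 16):
    print(part)
```

With a width of 16 the first line ends after `=`, leaving some space unused, and the 32 hex digits that follow are cut into two lines of 16 because they contain no allowed break point.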
Turns out I may have a working solution. It will require some testing. See #8854.
@jfbu
this is not handled correctly currently: long hex strings
code:
link to rst file: examples_cli.rst
Originally posted by @sebastien-riou in #8686 (comment)