[LaTeX] Text can fall out of code-block at end of page and leave artifact on next page #8686

Closed
jessetan opened this issue Jan 12, 2021 · 10 comments · Fixed by #10577

@jessetan
Contributor

jessetan commented Jan 12, 2021

Describe the bug

When a code block is at the end of a page and its contents make the block too large to fit on that page, the block is broken into two sections. In particular, when the last line on the page is a wrapped line, the content spills out of the code block, and the code block on the next page remains empty.

To Reproduce

  1. Start a new Sphinx project with default settings (I used version 3.4.0) and use the attached index.txt (rename to rst because GitHub does not recognise rst extension).
  2. Run make latexpdf
  3. View the output

Expected behavior
The code block is split across two pages, and the contents of the code-block directive are spread between the two pages.

Actual behavior
Screenshot 2021-01-12 at 16 57 36

Environment info

  • OS: macOS 11.1
  • Python version: 3.8.2
  • Sphinx version: 3.4.0
  • Sphinx extensions: none
  • Extra tools: MacTeX 2020
@jfbu
Contributor

jfbu commented Jan 14, 2021

Thanks for reporting. Unfortunately I think this falls under the caveat from the following code comments in sphinx.sty:

% - The wrapped material will not break across pages, it is impossible
% to achieve this without extensive rewrite of fancyvrb.

I guess fix for that issue can only come from a complete replacement of Sphinx usage of fancyvrb by something else...

@jfbu
Contributor

jfbu commented Jan 14, 2021

However, perhaps I should dig into how to avoid the empty, small-height frame at the top of the next page. Here another old LaTeX package is at play: framed. I am not optimistic about this issue... but do ping me if it goes stale. (Any advice is welcome, of course.)

@jfbu
Contributor

jfbu commented Jan 18, 2021

@jessetan I looked at this issue again. I checked the fancyvrb code again, and it is not possible to allow pagebreaks inside wrapped lines without some rather deep hacks into it. In brief, Sphinx manages to allow linewraps by modifying some of fancyvrb's line handling, but this happens inside an unbreakable box anyhow; to change that we would have to intervene before the material is enclosed in such a box, and this is not easy. It would take quite some effort, because the code of fancyvrb is not documented and one has to invest the time to understand all its details. Notice that the very same thing happens in the pure LaTeX world with the minted package, which, very much like Sphinx, uses Pygments (pygmentize) for syntax-highlighting the code and then hands it over to fancyvrb for the TeX aspects: neutralizing special characters except those we need, using a monospace font, numbering lines, etc. Here is a LaTeX file showing the problem:

\documentclass[a4paper]{article}
% compile with shell-escape flag
\usepackage{geometry}
\usepackage{minted}

\begin{document}
\vspace*{17.3cm}% all on page 1
%\vspace*{1mm}% all goes to page 2 if uncommented here
\begin{minted}[breaklines]{python}
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus aliquet sagittis sagittis. Phasellus elementum, felis sed fringilla maximus, nibh magna vestibulum erat, ac tincidunt turpis felis vel massa. Quisque vel lacus odio. Aenean velit felis, tincidunt vitae ante id, porttitor consequat magna. Nullam in venenatis nibh, in hendrerit risus. Etiam libero justo, tempor nec quam ut, ultrices pretium erat. Mauris at ullamcorper velit.
\end{minted}
\end{document}

Thinking about this, one might think it could be moved upstream to Pygments as a feature request: if the line-breaking were done there, the problem would go away. But on second thought, no, this can't work, because Pygments has no idea what the target line width is supposed to be. If the feature were implemented in Pygments, we would have to pass it that information. But this in itself is not obvious, because the width depends on how deeply nested the code-block directive is. This seems to add great difficulties on the Sphinx side as well.
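To make the width problem concrete, here is a minimal Python sketch. It is not Pygments or Sphinx code: the helper name prewrap, the character-based widths and the indentation per nesting level are all assumptions made for illustration. It only shows that a pre-wrapping step gives different results depending on a width that is known on the LaTeX side alone.

```python
import textwrap


def prewrap(line, text_width, nesting_level, indent_per_level=4):
    """Wrap one source line for a code-block nested `nesting_level` deep.

    Hypothetical helper: `text_width` and `indent_per_level` are counted in
    monospace characters, values a highlighter would have to be told by the
    document builder.
    """
    available = text_width - nesting_level * indent_per_level
    if available < 1:
        raise ValueError("no room left for code at this nesting level")
    # Only break at whitespace (or inside over-long tokens), not at hyphens.
    return textwrap.wrap(line, width=available, break_on_hyphens=False) or [""]


line = ("result = compute_checksum(first_argument, second_argument, "
        "third_argument, verbose=True)")

# The same source line, once at top level and once nested three levels deep.
top_level = prewrap(line, text_width=40, nesting_level=0)
nested = prewrap(line, text_width=40, nesting_level=3)

for row in nested:
    print(row)
```

The nested block has fewer characters per row, so the same source line wraps at different places. A highlighter working ahead of LaTeX cannot know which of the two layouts applies.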

I am afraid the only way out is either

  • deep hack into fancyvrb,
  • find some pre-existing LaTeX package for verbatim printing which would allow solving this,
  • or invest the time to understand why exactly we use fancyvrb and if it is not possible to actually write some custom latex package with exactly what Sphinx needs. Doing this might break user projects who customized SphinxVerbatim.

As for the problem of an empty box at the top of the continuation page, this is but a small aesthetic problem compared to the tragic overflow occurring at the bottom of the page if a very long source line got wrapped to multiple lines in the output. Besides, it is probably time Sphinx dropped framed and started using tcolorbox; perhaps with tcolorbox it will be easier to make sure multi-line unbreakable material moves to the next page rather than overflowing the bottom of the page.

This is borderline to wont-fix :-(. I am afraid if I engage into hacking too much into fancyvrb that this will create various unexpected problems. The more reasonable would be if I rather invest time into the tcolorbox thing, so that even if I can't fix everything, at least Sphinx will then provide very nice, highly customizable boxes in PDF output.

@jfbu
Contributor

jfbu commented Jan 18, 2021

This is borderline to wont-fix :-(. I am afraid if I engage into hacking too much into fancyvrb that this will create various unexpected problems. The more reasonable would be if I rather invest time into the tcolorbox thing,

Hmm. Actually, it looks like transitioning to tcolorbox would cost at least as much effort/time as hacking fancyvrb. Alas...

@jfbu
Contributor

jfbu commented Jan 31, 2021

I have been thinking about this issue. Currently Sphinx renders literal blocks using

  • package fancyvrb
  • package framed
  • additional code

This is a legacy situation, and the additional code is for, among others:

I see currently two main issues with this:

The present issue is intrinsic to package fancyvrb; there is no way around it without deeply modifying its processing. The second issue is due to the use of framed and intrinsic limitations on the size of TeX boxes. It may be alleviated a bit by tcolorbox (#3790) but, iirc, not solved completely.

The best solution, I believe, is to drop entirely the usage of fancyvrb and framed (for literal blocks only; we will keep it for topic and contents boxes and for warning-type admonitions). A Verbatim-enhancer like fancyvrb is not a difficult LaTeX task (I mean, it is well within my reach). The framed package, on the other hand, requires expertise at LaTeX team level, but fortunately the case of literal code-blocks with Pygments mark-up is much simpler to render, as we only have to handle a pile of lines (with a little enhancement to wrap long code lines) and there are no lists, no tables, and no footnotes inside the contents. I am sure I can code this, and that it will solve the size limitation problem, because I won't have to gather all contents before shipping them out to the LaTeX page builder.

And I imagine it will be possible on this occasion to add an inner hook wrapper for people who want to do the framing via tcolorbox (which has too many options for me to study seriously enough to provide a digested front-end for non-LaTeX people, who are 99% of Sphinx users).

Thus this is my intention, perhaps in time for the 4.0 release.

@jfbu jfbu added this to the 4.0.0 milestone Jan 31, 2021
@sebastien-riou

@jfbu
Another test case, not handled correctly currently: long hex strings
Screenshot from 2021-02-07 12-37-17

code:

DryGASCON128k56:

.. code-block:: shell

   $ python3 -m drysponge.drygascon128_aead e 000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F3031323334353637 000102030405060708090A0B0C0D0E0F "" ""
   28830FE67DE9772201D254ABE4C9788D

link to rst file: examples_cli.rst

@jfbu
Contributor

jfbu commented Feb 7, 2021

@sebastien-riou Thanks for the example. This is another, unrelated issue. I thought we already had a ticket for this, but I can't find it. Can you please open a ticket? I will comment over there.

@jfbu
Contributor

jfbu commented Feb 12, 2021

I have made up my mind on this issue, and the fix is that we drop fancyvrb usage in future. This will also be an occasion to further unify code-blocks and parsed-literals. However, I have to write up the needed LaTeX package.

@tk0miya tk0miya modified the milestones: 4.0.0, 4.1.0 Apr 17, 2021
@tk0miya tk0miya removed this from the 4.1.0 milestone Jul 10, 2021
jfbu added a commit to jfbu/sphinx that referenced this issue Jun 18, 2022
@jfbu
Contributor

jfbu commented Jun 18, 2022

@jessetan can you take a look at #10577? Do you have an opinion about what I should do with the line number?

jfbu added a commit to jfbu/sphinx that referenced this issue Jun 18, 2022
jfbu added a commit to jfbu/sphinx that referenced this issue Jun 19, 2022
…en wrapping

This maintains existing behavior.
@jfbu
Contributor

jfbu commented Jun 19, 2022

I guess fix for that issue can only come from a complete replacement of Sphinx usage of fancyvrb by something else...

I have finally solved this issue with limited extra code, but I needed to take a deep breath and plunge into the fancyvrb.sty internals to understand them fully. At long last I am relieved of this... The difficulty in all of this is also not to break possible customized usage of fancyvrb by Sphinx users.

In truth, rendering of Pygmentized code-blocks does not need a "Verbatim"-like approach, because Pygmentize has already escaped all special characters; mainly, all we need to do is ensure that a suitable font is used and spaces are obeyed. But it is a legacy situation that the Pygments library is configured to produce its output in a "Verbatim" environment (allowing LaTeX macros...), by default using fancyvrb, and at this stage the effort of removing all this and starting afresh would be too great if we were also to try to support options people may have used via the 'fvset' key of latex_elements. The Sphinx support for line emphasizing was likewise added in a way maximally compatible with the original fancyvrb, as was the framing of code-blocks in a way that allows page breaks and adds continuation hints.
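For readers who have not used it, the 'fvset' key mentioned above is set in conf.py through latex_elements. A minimal sketch follows; the particular option chosen (a smaller font size) is only an assumed example of the kind of user customization that has to keep working.

```python
# conf.py (fragment)
latex_elements = {
    # This string goes into the LaTeX preamble and configures fancyvrb.
    # Settings like this one only keep working as long as Sphinx still
    # renders code-blocks through fancyvrb.
    'fvset': r'\fvset{fontsize=\small}',
}
```

Any project carrying such a setting relies on fancyvrb still being in the rendering path, which is why replacing it outright was judged too costly.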

jfbu added a commit that referenced this issue Jun 26, 2022
When wrapping long code lines, recover the TeX "hbox"es and trick fancyvrb into considering each as an input code line.  This way, pagebreaks are allowed.  No change to existing output (in particular, codeline number is printed only once) when the wrapped line had place on current page.
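The idea in the commit message above can be pictured with a toy model. This is plain Python, not TeX and not the Sphinx implementation: the function names, the greedy page filling and the row counts are assumptions for illustration. A wrapped code line is handed to the page builder either as one unbreakable unit or as one unit per wrapped row.

```python
def to_units(rows_per_line, split_wrapped):
    """Heights (in rows) of the units handed to the page builder."""
    units = []
    for rows in rows_per_line:
        if split_wrapped:
            units.extend([1] * rows)  # each wrapped row becomes its own unit
        else:
            units.append(rows)        # the wrapped line is one unbreakable unit
    return units


def fill_pages(units, page_rows):
    """Greedy page filling: a unit never straddles a page boundary."""
    pages, current, used = [], [], 0
    for height in units:
        if current and used + height > page_rows:
            pages.append(current)
            current, used = [], 0
        current.append(height)
        used += height
    if current:
        pages.append(current)
    return pages


# Three short lines, then one long line wrapped to four rows; five rows per page.
rows_per_line = [1, 1, 1, 4]
unbreakable = fill_pages(to_units(rows_per_line, split_wrapped=False), page_rows=5)
breakable = fill_pages(to_units(rows_per_line, split_wrapped=True), page_rows=5)
print(unbreakable)
print(breakable)
```

In this model the unbreakable unit is simply pushed to the next page. In the behaviour reported in this issue it overflowed the bottom of the page instead. Either way, the only permitted break points are between units, which is what treating each wrapped row as its own line changes.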
jfbu added a commit to jfbu/sphinx that referenced this issue Jun 29, 2022
This does not fix entirely sphinx-doc#10610 but it does sufficiently for it not to
require reverting sphinx-doc#10577 which tried to solve sphinx-doc#8686 conundrum.  In
extreme cases, the sphinx-doc#8686 problem meant that some contents disappeared
at page bottom, so it is probably better to maintain sphinx-doc#10577 which
will avoid any such overflow of code beyond its frame, even though in
some specific cases (a colored entity such as a long string is partly on
both pages), some syntax highlighting gets lost.

There are anyhow other issues with colors for wrapped code lines, even
with no pagebreaks involved, such as sphinx-doc#10615.  This patch does not change
the situation there.