Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible Bug in Pygments macro definition that prevents PDF compile #2172

Closed
etpalmer63 opened this issue Jul 1, 2022 · 10 comments
Closed

Comments

@etpalmer63
Copy link

I encountered this bug when I was trying to render a Sphinx document as a pdf. For syntax highlighting that uses Pygments, the pygment macro causes an error when its included in figure text. The Sphinx team produced this snippet that recreates the error (after the rst has been converted to latex):

import os
from pathlib import Path
import shutil
import subprocess

import pygments.formatters

def write(filename, text):
    Path(filename).parent.mkdir(parents=True, exist_ok=True)
    Path(filename).write_text(text, encoding="utf-8")


shutil.rmtree("_build", ignore_errors=True)

os.mkdir("_build")
shutil.copy("dummy_image.png", "_build/dummy_image.png")

write("_build/pygmentshighlight.tex",
      pygments.formatters.LatexFormatter(commandprefix="PYG").get_style_defs())


write("_build/test_pyg.tex", r'''
\documentclass{report}
\usepackage{graphicx}
\include{pygmentshighlight}

\begin{document}
\begin{figure}
\includegraphics[width=\textwidth]{dummy_image}
\caption{
\texttt{
\PYG{n}{spam} \PYG{o}{=} \PYG{l+m+mi}{1}%
}
}
\end{figure}
\end{document}
''')

os.chdir("_build")
subprocess.call(("latexmk", "-pdf", "-dvi-", "-ps-", "test_durole.tex"))
subprocess.call(("latexmk", "-pdf", "-dvi-", "-ps-", "test_pyg.tex"))

For more information, and better insight from the Sphinx team themselves, here is the original issue: sphinx-doc/sphinx#10506

Please let me know if any additional information would be helpful.

Thanks for checking this out,

Erik

@birkenfeld
Copy link
Member

cc @jfbu

@jfbu
Copy link
Contributor

jfbu commented Jul 1, 2022

@birkenfeld The problem (cf sphinx-doc/sphinx#10506 (comment)) is that the \PY macro cannot be used in so-called "LaTeX moving arguments". Such situation became possible recently in Sphinx with activation of syntax highlighting via Pygments for inline code and not only as formerly for display level "code-blocks".

We fixed it at Sphinx by redoing the \PY definition with \protected\def rather than \def (cf sphinx-doc/sphinx@a7ef63a). The \protected is part of e-TeX (1999) which started I think be used by LaTeX when available around 2003-2005. According to TeXLive doc, year 2004:

pdfetex is now the default engine for all formats except (plain) tex itself.

If you want to make a change upstream here at Pygments the diff would be:

diff --git a/pygments/formatters/latex.py b/pygments/formatters/latex.py
index d33b686d..bc6ea96a 100644
--- a/pygments/formatters/latex.py
+++ b/pygments/formatters/latex.py
@@ -105,7 +105,7 @@ STYLE_TEMPLATE = r'''
     \%(cp)s@tok{#1}\expandafter\%(cp)s@toks\fi}
 \def\%(cp)s@do#1{\%(cp)s@bc{\%(cp)s@tc{\%(cp)s@ul{%%
     \%(cp)s@it{\%(cp)s@bf{\%(cp)s@ff{#1}}}}}}}
-\def\%(cp)s#1#2{\%(cp)s@reset\%(cp)s@toks#1+\relax+\%(cp)s@do{#2}}
+\protected\def\%(cp)s#1#2{\%(cp)s@reset\%(cp)s@toks#1+\relax+\%(cp)s@do{#2}}
 
 %(styles)s
 

On looking at this tonight I noticed related other potential issues of a subtle sort, with things such as \def\PYGZbs{\char`\\} (with PYG command prefix). Initially I wanted to check what happened with \PYG in a section title, but at Sphinx, literals in titles are not Pygmentized in the LaTeX build (they are for HTML). Now if a user inserts some inline code in a figure caption in place of a a sectioning title, AND if this Sphinx user adds a \listoffigures in addition to the \sphinxtableofcontents then a build failure arises.

The reason is that as \PYGZbss written to external file expanded, i.e. replaced by \char`\\ (as \\ is itself \protected with (recent) LaTeX, it causes no problem at this stage). Then at \listoffigures the file is read, but Sphinx has also added an overall \sphinxcode wrapper. (where can I find a double-danger sign in GitHub interface to warn reader this is becoming hairy). And this \sphinxcode does some \scantokens after having rendered some things "active" to prevent ligatures etc... as done in LaTeX verbatim. Now the lefttick is "active" and this then causes \char to throw the dreaded Missing number, treated as zero.

Nota bene: this problem has not YET been reported at Sphinx. I discovered it theoretically now...

I am not sure if you will want to consider this also a problem at Pygments level.

ping @gpoore, @muzimuzhi is this relevant to "minted inline" usage?

I initially thought about using \protected here too, but with it we will still have problems with hyperref, once Pygments is used for inline code in section titles (which it is NOT at this stage by Sphinx). (I am skipping problems such as the fact that chapter titles may get uppercased... at least in the page headers... which is very bad with code...).

Package hyperref Warning: Token not allowed in a PDF string (Unicode):
(hyperref)                removing `\PYG' on input line 91.


Package hyperref Warning: Token not allowed in a PDF string (Unicode):
(hyperref)                removing `\PYGZdq' on input line 91.

(in the above manual test the macros were \protected)

For \PYG the standard fix would be to use either \pdfstringdefDisableCommands usage (see hyperref documentation or \texorpdfstring. Perhaps for the former

\pdfstringdefDisableCommands{\def\PYG#1#2{#2}}

Problem of this is that this is conditional on hyperref being loaded, and where are we going to execute it?

For the \PYGZdq and friends one thinks of using the LaTeX escaping rather than the core TeX \char which is not allowed to go into PDF bookmarks. Unfortunately this brings some issues with LaTeX font encoding. The \textquotedbl mark-up breaks documents using only OT1 font-encoding. What a mess. (for some LaTeX control sequences cf the file https://github.com/sphinx-doc/sphinx/blob/5.x/sphinx/util/texescape.py)

So perhaps in the end

  • adding \protected also to all those \PGZbs, \PYGZdq etc...
  • and provide suitable substitute definitions in the style \pdfstringdefDisableCommands{\def\PYGZdq{"}}

is the way to go. But very annoying to have to go into \pdfstringdefDisableCommands which exists only with hyperref...

Well, I was verbose as usual (and spend actually quite some time on this now) and as explained above some of these problems are only virtual ones at Sphinx.

But perhaps they exist already for https://github.com/gpoore/minted ? Nothing forbids a minted user to try to use it in titles? (I did not check documentation).

@jeanas
Copy link
Contributor

jeanas commented Jul 2, 2022

What a mess.

Just a tiny little bit 😄

@jfbu
Copy link
Contributor

jfbu commented Jul 2, 2022

Apologies for my longish earlier comment. I should have paused and take time to investigate rather than doing some "on the fly live reporting" (which took quite some time on my side...).

  • regarding https://github.com/gpoore/minted, their \mintinline is a \protected macro and there is thus no real issue in sectioning titles and captions; this is very different from Sphinx where the Pygmentize mark-up is there already in the produced .tex file. With https://github.com/gpoore/minted, the Pygmentize mark-up is in some auxiliary file, which gets input. There is only a minor issue with \mintinline usage in sectioning titles and document has \tableofcontents and uses hyperref .
Package hyperref Warning: Token not allowed in a PDF string (Unicode):
(hyperref)                removing `\mintinline' on input line 9.

This is unrelated to whatever happens here.

  • as per Sphinx we can add ourselves the \protected to the things such as \PYGZbs. Without it, we have the complicated issue which I have described in my longish earlier comment which originates in \sphinxupquote wrapper and its \scantokens. The hyperref related things are currently not of relevance to Sphinx, as it inhibits completely in LaTeX output the inclusion of Pygments mark-up in sectioning titles. If and when this inhibition is lifted, Sphinx will handle the collaterals.

To sum up and try to dispel the discomfort that my too long and too TeXnical comment may have created:

It would probably be better if Pygments defined \PYG (if command prefix is "PYG") as well as things such as \PYGZbs to be \protected, else these macros may cause trouble inside captions or section titles.

However https://github.com/gpoore/minted is a priori not exposed to the reported issue, and Sphinx can, and has already started to handle addition of \protected, and will consider additional measures

Thus, there is probably nothing to do here. Except if someone reports an issue from direct usage of the Pygmentize library.

@gpoore
Copy link

gpoore commented Jul 2, 2022

@jfbu Your summary of \mintinline is correct. It generally works inside other commands except in a few edge cases involving things like # and %, but it doesn't work correctly for hyperref bookmarks. If you ever need to consider the bookmarks case, you might be interested in the reimplemented \Verb from https://github.com/gpoore/fvextra. This functions inside other commands (except for a few edge cases, which can be handled by \EscVerb) and generally works with hyperref bookmarks. I expect that a future version of \mintinline can probably build on \Verb to get bookmarks working in at least most cases.

jfbu added a commit to jfbu/sphinx that referenced this issue Jul 2, 2022
@jfbu
Copy link
Contributor

jfbu commented Jul 2, 2022

@gpoore thanks for info! At Sphinx I have pushed replacements to the Pygments escapes at sphinx-doc/sphinx@213c29b. These replacements are math mode and bookmarks compatible.

@jfbu
Copy link
Contributor

jfbu commented Jul 2, 2022

@birkenfeld As I was anlyzing this "live" my opinion evolved. I don't recommend now to add \protected to the Pygments escape, except for the master \PYG (if command prefix is PYG). Indeed making a macro \protected also means it has to be let known to hyperref for bookmarks in a way or another (else hyperref will remove it with a warning). At sphinx-doc/sphinx@213c29b I have done definitions without \protected in terms of things bookmarks via hyperref will accept correctly.

@jeanas
Copy link
Contributor

jeanas commented Jul 2, 2022

OK, so you recommend to do nothing on the Pygments side, right?

@jfbu
Copy link
Contributor

jfbu commented Jul 2, 2022

OK, so you recommend to do nothing on the Pygments side, right?

Well, I am looking at this from Sphinx, where the meaning of the LaTeX macros involved in the Pygments LaTeX Formatter output can be modified, so there is no urgency for us. The issue arose at Sphinx 5.0.0 as it has allowed Pygmentize for inline code snippets also for LaTeX output (removing the \begin{Verbatim}...\end{Verbatim} encapsulation) and this exposes that the Pygments macros are "fragile" and can not be used in argument of \caption{} or \section{...} LaTeX commands.

I feel I have a pretty good set of alternative definitions at sphinx-doc/sphinx@213c29b, with a slight problem with the " still (due to maintaining compatibility with the old TeX font encoding OT1 despite the fact that Sphinx for many years has T1 as default).

Alternative definitions such as the ones in my Sphinx patch are necessary if the Pygments LaTeX escapes for special characters are one day to be used in sectioning titles and hyperref attempts to create bookmarks entries out of them. The compatibility with hyperref is a bit annoying because it means special approach \texorpdfstring or \pdfstringdefDisableCommands, I don't see very well how to add that to Pygments itself without complications related to testing if hyperref is used or not used in user LaTeX document. In my redefinitions I was relieved that I could find things which hyperref knows how to handle out-of-the-box. So only a special step is needed for the main \PY (or \PYG at Sphinx) macro. And there were already a few "\protected" macros which we let know to hyperref. And hyperref is a requirement for Sphinx LaTeX.

In the absence of Pygments users complaining about the current state of affairs I can not push for you to backport my changes... as explained by @etpalmer63 , the issue arose from a Sphinx context, although a minimal example was provided which did not use it, but notice that the \PYG there are manually created, Pygments is only used to produce its LaTeX style sheet. If this stylesheet was a package contributed to CTAN, then yes, I would push for changes like those I have done at Sphinx.

As this is not the case and as I don't have necessarily the knowledge of how Pygments LaTeX output is used "in the wild" I can only remain cautious and not insist on things changing here.

@birkenfeld
Copy link
Member

Thanks for your very thorough analysis! Closing for now - can be reopened if the problem appears elsewhere.

@birkenfeld birkenfeld closed this as not planned Won't fix, can't repro, duplicate, stale Jul 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants