markup/goldmark: Enable pass-through of raw content blocks #10894

Closed
janhuenermann opened this issue Apr 12, 2023 · 44 comments · Fixed by #11866

Comments

@janhuenermann

janhuenermann commented Apr 12, 2023

Continuing the discussion from #10888 and the forum.

Hey everyone,

Following up on the PR, I wanted to open an issue to discuss an improvement to LaTeX support in Hugo. Support for standard LaTeX in Markdown has been requested by the community for a while, especially by the scientific blogging community. However, Hugo only supports a special syntax that requires modifying existing Markdown files to work with Hugo. Platforms like GitHub already support the standard dollar-sign syntax; for example, $x^2$ is an inline expression.

Here's the proposal:

  • Add a Goldmark extension to Hugo that finds all inline and block LaTeX in a Markdown file and emits it as plain text. This can be enabled using a config flag and is disabled by default.
  • With this extension, users can then include the KaTeX client-side scripts and stylesheets, enabling them to render LaTeX without modifying their existing Markdown files.
  • This proposal doesn't require any additional dependencies and doesn't involve server-side rendering of the LaTeX expressions (which needs a JavaScript engine and is potentially slow).

This is already implemented here: janhuenermann@ad38246

Looking forward to hearing your thoughts on this!

@bep
Member

bep commented Apr 12, 2023

> LaTeX in Markdown file

What is "LaTeX in Markdown"? If this is about adding yet another delimiter for this particular syntax, why not use a shortcode (for inline latex) and code fences for block latex?

@janhuenermann
Author

> > LaTeX in Markdown file
>
> What is "LaTeX in Markdown"? If this is about adding yet another delimiter for this particular syntax, why not use a shortcode (for inline latex) and code fences for block latex?

Hey @bep,

Thanks for your reply! When I mentioned "LaTeX in Markdown," I was referring to the ability to include standard LaTeX expressions directly within Markdown files using the familiar dollar-sign syntax (single $ for inline and double $$ for block expressions). This is the standard syntax in Jekyll, Jupyter notebooks, the VSCode Markdown extension, platforms like GitHub, and more, making it a widely recognized and expected feature for many users. Shortcodes, however, require changing Markdown files specifically for Hugo, which is not only inconvenient but also prevents users from reusing the same Markdown files across projects.

There's an in-depth discussion about this topic in the forum as well: https://discourse.gohugo.io/t/katex-in-hugo/43274

@jmooring
Member

jmooring commented Apr 13, 2023

Miscellaneous thoughts...

Display mode (blocks)

This syntax is supported by GitHub:

```math
\sqrt{3}  # with or without delimiters
```

This syntax is supported by GitLab:

```math
\sqrt{3}  # must not include delimiters
```

The fenced code block syntax is consistent with Mermaid diagram support by both services:

```mermaid
xxx
```

Alternate delimiters

KaTeX and MathJax support alternate delimiters:

\\[ \\] # display mode
\\( \\) # inline

These are not supported by either GitHub or GitLab. Markdown in the wild using this syntax is not portable to either service.

Inline mode

With Hugo you must use a shortcode. The resulting markdown is not portable.

Minor advantage of fenced code blocks and shortcodes

JS and CSS can be loaded as needed (per page) without requiring a front matter flag (i.e., .HasShortcode, .Page.Store).

Alternative (opinion)

For diagrams, I prefer to use the free Kroki service because it embeds an SVG---no client side rendering.

For math, I prefer to use the free Math API service because it embeds an SVG---no client side rendering. The fenced code block syntax is portable; the shortcode syntax is not portable.

@bep bep modified the milestones: v0.112.0, v0.113.0 Apr 15, 2023
@bep bep modified the milestones: v0.113.0, v0.114.0, v0.115.0 Jun 8, 2023
@dudung

dudung commented Jun 25, 2023

I'm sorry if I misunderstand the proposal. Does Hugo already support $ for inline math and $$ for block math using KaTeX? I believe I have used it for a while, and it is already very convenient. The only thing missing is equation referencing, but that is unsupported by KaTeX, not Hugo.

@bep bep modified the milestones: v0.115.0, v0.116.0 Jun 30, 2023
@sigeryang

sigeryang commented Jul 7, 2023

> I'm sorry if I misunderstand the proposal. Does Hugo already support $ for inline math and $$ for block math using KaTeX?

Hugo itself does not recognize math delimited by $ and $$. More specifically, it does not register $ and $$ AST nodes with the underlying Markdown parser (e.g. Goldmark), as goldmark-mathjax and #7435 do.

> I believe I have used it for a while, and it is already very convenient. The only thing missing is equation referencing, but that is unsupported by KaTeX, not Hugo.

Themes that ship with math rendering support (MathJax, KaTeX, or something else) will probably display $ and $$ math correctly.

But because of what I said above, take $a^*=x-b^*$ as an example:

  • it should be rendered as $a^\star=x-b^\star$ if $a^*=x-b^*$ reaches the HTML unchanged and is captured by a JavaScript math engine
  • but Hugo produces $a^<em>=x-b^</em>$, which math engines won't capture, because the paired asterisks are parsed as Markdown emphasis

This also causes problems similar to those mentioned in #6694, #6864, and #7249. These issues will be solved cleanly once $ and $$ blocks are recognized by Hugo.
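
To illustrate the mangling described above, here is a minimal Python sketch. It is not Hugo or Goldmark code: `render_naive` is a hypothetical toy renderer that only understands asterisk emphasis, and `render_with_passthrough` is an equally hypothetical variant that copies `$...$` spans through untouched, delimiters included.

```python
import re

# Toy stand-in for a Markdown renderer: it only knows *emphasis*.
# (Assumption: real CommonMark emphasis rules are far more involved.)
EMPHASIS = re.compile(r"\*(.+?)\*")
# Inline math delimited by single dollar signs on one line.
INLINE_MATH = re.compile(r"\$[^$\n]+\$")


def render_naive(text: str) -> str:
    """Apply emphasis everywhere, including inside math."""
    return EMPHASIS.sub(r"<em>\1</em>", text)


def render_with_passthrough(text: str) -> str:
    """Apply emphasis only outside $...$ spans; math is copied verbatim."""
    out, pos = [], 0
    for match in INLINE_MATH.finditer(text):
        out.append(render_naive(text[pos:match.start()]))
        out.append(match.group(0))  # delimiters are kept, not swallowed
        pos = match.end()
    out.append(render_naive(text[pos:]))
    return "".join(out)


source = "Take $a^*=x-b^*$ as an *example*."
print(render_naive(source))
print(render_with_passthrough(source))
```

The toy regex ignores the real emphasis rules; it only shows why math spans have to be recognized before emphasis is parsed.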

@sigeryang

Here is a brief survey on the status quo of different math block support among popular Markdown editors:

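| Editor / Platform | ```math (display) | $...$ (inline) | $$...$$ (display) | \( (inline) | \[ (display) |
|---|---|---|---|---|---|
| GitHub | Yes | Yes (bugged, see Note 1) | Yes (bugged) | No | No |
| GitLab | Yes | No | No | No | No |
| VSCode (w/o plugins) | No | Yes | Yes | No | No |
| Typora | No | Yes (disabled by default) | Yes | No | No |
| StackEdit | No | Yes | Yes | No | No |
| MarkText | No | Yes | Yes | No | No |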

Note 1: GitHub's Markdown preview is also buggy: it does not treat the contents of $ and $$ as a block, so expressions like $\{a\}$ do not work either.

At first glance, $ and $$ interfere with Markdown quite a bit, but the table shows that dollar signs are now the de facto delimiters for math in popular Markdown editors. If Hugo supported dollar signs (or even made the delimiters configurable), interoperability would improve considerably.

@pilgrimlyieu

pilgrimlyieu commented Jul 17, 2023

I think it would be better if there were a way to configure the delimiters, since \(...\) and \[...\] seem better than $...$ and $$...$$.

Reference

@bep bep modified the milestones: v0.116.0, v0.117.0 Aug 1, 2023
@bep bep modified the milestones: v0.117.0, v0.118.0 Aug 30, 2023
@aurelienpierre

I don't understand this issue. I have LaTeX working properly in Hugo by loading the MathJax JS.

In a script.html partial, placed at the end of the page, I have:

{{ if .Params.latex}}
  <script>
    MathJax = {
        packages: {'[+]': ['autoload', 'require']},
        tex: {
          tags: 'all',
          inlineMath: [ ['$','$'] ],
          displayMath: [ ['$$','$$'] ],
          processEscapes: true,
          processEnvironments: true,
          processRefs: true,
        },
        svg: {
          mtextInheritFont: true,
          merrorInheritFont: true,
          mathmlSpacing: false,
          skipAttributes: {},
          exFactor: .5,
          displayAlign: 'center',
          displayIndent: '0',
          fontCache: 'global',
          localID: null,
          internalSpeechTitles: true,
          titleID: 0
        },
        options: {
          ignoreHtmlClass: 'no_math',//  class that marks tags not to search
          processHtmlClass: 'math',  //  class that marks tags that should be searched
        }
    };
  </script>
  <script type="text/javascript" id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js"></script>
{{ end }}

Note that the config script lets you decide what delimiters you want. Then, on the .md files, I write typical LaTeX equations, using the dollar syntax. In the frontmatter of pages using maths, I set latex: true to avoid loading the heavy MathJax lib everywhere if it's not needed. After that, equations are rendered client-side with no issue.

@jmooring
Member

@aurelienpierre

This will fail:

$$
\begin{array} {lcl}
  L(p,w_i) &=& \dfrac{1}{N}\Sigma_{i=1}^N(\underbrace{f_r(x_2
  \rightarrow x_1
  \rightarrow x_0)G(x_1
  \longleftrightarrow x_2)f_r(x_3
  \rightarrow x_2
  \rightarrow x_1)}_{sample\, radiance\, evaluation\, in\, stage2}
  \\\\\\ &=&
  \prod_{i=3}^{k-1}(\underbrace{\dfrac{f_r(x_{i+1}
  \rightarrow x_i
  \rightarrow x_{i-1})G(x_i
  \longleftrightarrow x_{i-1})}{p_a(x_{i-1})}}_{stored\,in\,vertex\, during\,light\, path\, tracing\, in\, stage1})\dfrac{G(x_k
  \longleftrightarrow x_{k-1})L_e(x_k
  \rightarrow x_{k-1})}{p_a(x_{k-1})p_a(x_k)})
\end{array}
$$

Ampersands are converted to &amp; and paired underscores create em elements. And that is the correct rendering behavior when converting markdown to HTML.

@aurelienpierre

Right, now that you mention it, I remember that I had to extend PHPMarkdown to discard LaTeX in order to handle those corner cases for WordPress.

@bep bep modified the milestones: v0.118.0, v0.119.0 Sep 15, 2023
@bep bep removed this from the v0.119.0 milestone Oct 5, 2023
@jmooring
Member

@erincatto You cannot enable arbitrary Goldmark extensions; only those that have been integrated.

@bep
Member

bep commented Jan 10, 2024

This:

$$
\begin{array} {lcl}
  L(p,w_i) &=& \dfrac{1}{N}\Sigma_{i=1}^N(\underbrace{f_r(x_2
  \rightarrow x_1
  \rightarrow x_0)G(x_1
  \longleftrightarrow x_2)f_r(x_3
  \rightarrow x_2
  \rightarrow x_1)}_{sample\, radiance\, evaluation\, in\, stage2}
  \\\\\\ &=&
  \prod_{i=3}^{k-1}(\underbrace{\dfrac{f_r(x_{i+1}
  \rightarrow x_i
  \rightarrow x_{i-1})G(x_i
  \longleftrightarrow x_{i-1})}{p_a(x_{i-1})}}_{stored\,in\,vertex\, during\,light\, path\, tracing\, in\, stage1})\dfrac{G(x_k
  \longleftrightarrow x_{k-1})L_e(x_k
  \rightarrow x_{k-1})}{p_a(x_{k-1})p_a(x_k)})
\end{array}
$$

Can be rewritten to:

{{< raw >}}
\begin{array} {lcl}
  L(p,w_i) &=& \dfrac{1}{N}\Sigma_{i=1}^N(\underbrace{f_r(x_2
  \rightarrow x_1
  \rightarrow x_0)G(x_1
  \longleftrightarrow x_2)f_r(x_3
  \rightarrow x_2
  \rightarrow x_1)}_{sample\, radiance\, evaluation\, in\, stage2}
  \\\\\\ &=&
  \prod_{i=3}^{k-1}(\underbrace{\dfrac{f_r(x_{i+1}
  \rightarrow x_i
  \rightarrow x_{i-1})G(x_i
  \longleftrightarrow x_{i-1})}{p_a(x_{i-1})}}_{stored\,in\,vertex\, during\,light\, path\, tracing\, in\, stage1})\dfrac{G(x_k
  \longleftrightarrow x_{k-1})L_e(x_k
  \rightarrow x_{k-1})}{p_a(x_{k-1})p_a(x_k)})
\end{array}
{{< /raw >}}

Assuming there is a raw shortcode that just prints .Inner.

Am I missing something?

@jmooring
Member

Yes, you can use shortcodes and render hooks to pass .Inner through as-is, but that approach is not portable (import or export), standard, or easy to use. This enhancement is a good idea, enabling a capability that has been requested by many users for many years.

https://discourse.gohugo.io/tag/typesetting
https://github.com/gohugoio/hugo/issues?q=label%3A%22Feature%3A+Typesetting%22

I don't see any downside.

@j2kun
Contributor

j2kun commented Jan 10, 2024

@bep the objection is that math prose often uses dozens of inline math blocks in a paragraph of text, many of whose .Inner is just a few characters, and in this case such long shortcodes are tedious to type and read. It also adds a compatibility obstacle for importing and exporting to other systems that understand TeX-style math mode fences. In other words, the objection is to the verbosity of the shortcode fence for math-heavy text (tedious) and its difference from standard math fences (compatibility).

@bep
Member

bep commented Jan 10, 2024

> Yes, you can use shortcodes and render hooks to pass .Inner through as-is, but that approach is not portable (import or export), standard, or easy to use.

How portable/standard is this?

$$
\begin{array} {lcl}
  L(p,w_i) &=& \dfrac{1}{N}\Sigma_{i=1}^N(\underbrace{f_r(x_2
  \rightarrow x_1
  \rightarrow x_0)G(x_1
  \longleftrightarrow x_2)f_r(x_3
  \rightarrow x_2
  \rightarrow x_1)}_{sample\, radiance\, evaluation\, in\, stage2}
  \\\\\\ &=&
  \prod_{i=3}^{k-1}(\underbrace{\dfrac{f_r(x_{i+1}
  \rightarrow x_i
  \rightarrow x_{i-1})G(x_i
  \longleftrightarrow x_{i-1})}{p_a(x_{i-1})}}_{stored\,in\,vertex\, during\,light\, path\, tracing\, in\, stage1})\dfrac{G(x_k
  \longleftrightarrow x_{k-1})L_e(x_k
  \rightarrow x_{k-1})}{p_a(x_{k-1})p_a(x_k)})
\end{array}
$$

@j2kun
Contributor

j2kun commented Jan 10, 2024

> > Yes, you can use shortcodes and render hooks to pass .Inner through as-is, but that approach is not portable (import or export), standard, or easy to use.
>
> How portable/standard is this?
>
> $$
> \begin{array} {lcl}
>   L(p,w_i) &=& \dfrac{1}{N}\Sigma_{i=1}^N(\underbrace{f_r(x_2
>   \rightarrow x_1
>   \rightarrow x_0)G(x_1
>   \longleftrightarrow x_2)f_r(x_3
>   \rightarrow x_2
>   \rightarrow x_1)}_{sample\, radiance\, evaluation\, in\, stage2}
>   \\\\\\ &=&
>   \prod_{i=3}^{k-1}(\underbrace{\dfrac{f_r(x_{i+1}
>   \rightarrow x_i
>   \rightarrow x_{i-1})G(x_i
>   \longleftrightarrow x_{i-1})}{p_a(x_{i-1})}}_{stored\,in\,vertex\, during\,light\, path\, tracing\, in\, stage1})\dfrac{G(x_k
>   \longleftrightarrow x_{k-1})L_e(x_k
>   \rightarrow x_{k-1})}{p_a(x_{k-1})p_a(x_k)})
> \end{array}
> $$

It may look silly, but this sub-language of LaTeX has been standard math typesetting for at least 30 years. It predates the web.

@jmooring
Member

jmooring commented Jan 10, 2024

GitHub markdown using $...$ and $$...$$ delimiters:
https://gist.github.com/jmooring/f649aae89a2047e44541de2e3001fb0b

GitLab markdown using $...$ and $$...$$ delimiters:
https://gitlab.com/-/snippets/3637801

Visual Studio Code markdown preview:

[screenshot not preserved]

The long multiline example above is something I created as a worst-case example, similar to the old browser "acid" test. It's portable to some systems but not to others.

When authoring in things like Obsidian or Typora, the $...$ and $$...$$ syntax is standard.

@erincatto

$$ is very portable. I'm able to paste it directly into a LaTeX document (there are some bugs in that particular sequence, though).

I think shortcodes are fine for separate, displayed blocks of math. They are quite bad for inline math. It is common in mathematical writing to refer to many single character math symbols within a single sentence. Adding lots of inline shortcodes hurts readability for the author.

@jmooring
Member

jmooring commented Jan 10, 2024

Also, a shortcode that just prints .Inner pokes a hole in our content security model:

This:

{{< math >}}<script>alert('pwned!')</script>{{< /math >}}

is rendered to this:

<script>alert('pwned!')</script>

With this proposal, the raw markdown between and including the delimiters is not cast to template.HTML, so Go's html/template package does its job. This:

$<script>alert('pwned');</script>$

is rendered to this:

<p>$&lt;script&gt;alert('pwned');&lt;/script&gt;$</p>

When using a shortcode for this, site and theme authors need to remember to do this for their typesetting/LaTeX/math shortcode and render hook.

{{ .Inner | htmlEscape }}
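
The difference between the two outputs above can be sketched in a few lines of Python. This is only an analogy: Hugo relies on Go's html/template, whereas this sketch uses Python's `html.escape` as a stand-in, and both function names are hypothetical.

```python
import html


def unsafe_shortcode(inner: str) -> str:
    """Mimics a shortcode that prints its inner content verbatim."""
    return inner


def escaped_shortcode(inner: str) -> str:
    """Mimics a shortcode that HTML-escapes its inner content first."""
    # quote=False leaves quote characters alone and escapes only &, <, and >.
    return html.escape(inner, quote=False)


payload = "<script>alert('pwned!')</script>"
print(unsafe_shortcode(payload))   # the browser would execute this
print(escaped_shortcode(payload))  # the browser would display this as text

# Math still works after escaping: the browser decodes the entities back
# into text before a client-side renderer reads the text node.
print(escaped_shortcode("$a < b$"))
```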

@bep
Member

bep commented Jan 10, 2024

OK, I didn't know that GitHub/GitLab actually supported this. I have not read the entire discussion, but I assume that no such extension already exists for Hugo, and that:

  • We (Hugo) want to maintain some kind of control over this (fixing bugs without too much ceremony, maybe adding some hooks?)
  • Other (Goldmark) projects would also want to use this

I suggest we

  1. Create a repo named hugo-goldmark-extensions
  2. Add a sub module hugo-goldmark-extensions/insert-name-here

If someone can help me with a name for this particular extension, I can create it and we can talk about who wants to implement it.

But the scope is _blocks_ à la the GitHub example, right?

@jmooring
Member

jmooring commented Jan 10, 2024

1) There are no existing extensions that do what we need. Those that do exist exceed the scope of this proposal, are opinionated, and in some cases impede performance.

2) Yes, I think the Hugo project should maintain control over this.

3) Other projects that rely on Goldmark may benefit from this. Its un-opinionated and generic implementation can be used with any JS package or renderer that needs to parse raw content... math, chemistry, physics, diagrams, etc.

4) The names of the core Goldmark extensions (maintained by yuin) are singular nouns. The best name that I have come up with so far is rawBlock, but others may have better ideas.

> But the scope is blocks à la the GitHub example, right?

I'm not sure exactly what you mean, but the "blocks" that will bypass markdown processing may be inline, block, single line, or multiline. An important note is that the delimiters themselves are not swallowed; they are part of the "block".

Finally, the proposed default configuration for this is:

markup:
  goldmark:
    extensions:
      rawBlock:     # or a better name
        enable: false
        delimiters:
          - ['$','$']    # inline equations
          - ['$$','$$']  # block equations
          - ['\(','\)']  # inline equations
          - ['\[','\]']  # block equations
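
As a rough illustration of what such a delimiter list implies, the following Python sketch splits text into segments to be processed as Markdown and segments to be passed through raw. It is not the extension's implementation: `split_passthrough` and the segment labels are hypothetical, and the only behavior taken from the discussion is that the delimiters stay part of the raw segment.

```python
from typing import List, Tuple

# Delimiter pairs mirroring the proposed default configuration.
DELIMITERS: List[Tuple[str, str]] = [
    ("$", "$"),
    ("$$", "$$"),
    (r"\(", r"\)"),
    (r"\[", r"\]"),
]


def split_passthrough(text: str, delimiters=DELIMITERS) -> List[Tuple[str, str]]:
    """Split text into ("markdown", s) and ("raw", s) segments.

    Raw segments include their delimiters. An opener without a matching
    closer is left in the surrounding markdown segment.
    """
    # Try longer openers first so "$$" is not mistaken for two "$".
    ordered = sorted(delimiters, key=lambda pair: len(pair[0]), reverse=True)
    segments: List[Tuple[str, str]] = []
    pending = 0  # start of the markdown text not yet emitted
    i = 0
    while i < len(text):
        for opener, closer in ordered:
            if not text.startswith(opener, i):
                continue
            end = text.find(closer, i + len(opener))
            if end == -1:
                continue  # unclosed opener: try the next pair
            if pending < i:
                segments.append(("markdown", text[pending:i]))
            stop = end + len(closer)
            segments.append(("raw", text[i:stop]))
            i = pending = stop
            break
        else:
            i += 1  # no pair matched at this position
    if pending < len(text):
        segments.append(("markdown", text[pending:]))
    return segments


for kind, segment in split_passthrough("If $a^*=x-b^*$ then $$y_1 = y_2$$ holds."):
    print(kind, repr(segment))
```

One limitation of this sketch is that two unrelated dollar signs in prose, such as two prices in one sentence, would be paired up as math. That hazard is a reason to keep the feature opt-in and the delimiters configurable.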

@bep
Member

bep commented Jan 10, 2024

> I'm not sure exactly what you mean,

It's the title of this issue, "raw content blocks". Is that the scope of this discussion?

$$
This is a block.
$$

This is $inline$.

@j2kun
Contributor

j2kun commented Jan 10, 2024

IMO inline would need to be in scope as well as blocks. Maybe rawContent would be sufficiently expressive?

@jmooring
Member

jmooring commented Jan 10, 2024

> Is that the scope of this discussion?

Yes, as well as the other common/standard [1] delimiting pairs as shown in the proposed default configuration above.

\[
This is a block.
\]

\[This is a block.\]

This is \(inline\).

Note that block equations may have the delimiters on the same line, or on the preceding and following lines. I have not seen the two mixed. For example, I don't think we need to worry about these:

$$This is a block
$$

This is $inline
$

Footnotes

  1. The bracket/parentheses delimiters are less common, but both are included in the KaTeX and MathJax documentation. While the body's open...

@bep
Member

bep commented Jan 10, 2024

OK, but then the name rawBlock is maybe not great.

What about hugo-goldmark-extensions/passthrough?

markup:
  goldmark:
    extensions:
      rawBlock:     # or a better name
        enable: false
        delimiters:
          - ['$','$']    # inline equations
          - ['$$','$$']  # block equations
          - ['\(','\)']  # inline equations
          - ['\[','\]']  # block equations

I suspect that the implementation needs to distinguish between block and inline delimiters, but time will tell.
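
One way to picture that distinction is a lookup from delimiter pair to mode, following the "inline equations" and "block equations" comments in the configuration above. The Python sketch below is purely illustrative: `MODES` and `classify` are hypothetical names, not part of any proposed API.

```python
from typing import Optional

# (opener, closer, mode); longer openers come first so "$$" wins over "$".
MODES = [
    ("$$", "$$", "block"),
    (r"\[", r"\]", "block"),
    (r"\(", r"\)", "inline"),
    ("$", "$", "inline"),
]


def classify(raw: str) -> Optional[str]:
    """Return "block" or "inline" for a delimited segment, else None."""
    for opener, closer, mode in MODES:
        long_enough = len(raw) >= len(opener) + len(closer)
        if long_enough and raw.startswith(opener) and raw.endswith(closer):
            return mode
    return None


print(classify("$$\nThis is a block.\n$$"))
print(classify(r"\[This is a block.\]"))
print(classify("$inline$"))
print(classify("plain text"))
```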

@jmooring
Member

> hugo-goldmark-extensions/passthrough

Perfect.

@bep
Member

bep commented Jan 10, 2024

OK, I have created https://github.com/gohugoio/hugo-goldmark-extensions -- I suggest we take implementation specific discussions somewhere inside that repo.


This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 17, 2024