Possibly better verse block handling? #48

ispringle · 2022-08-28T23:38:15Z

I was a bit surprised when my verse blocks were rendering as <pre>. I looked into the source and saw your annotations as to the why:

uniorg/packages/uniorg-rehype/src/org-to-hast.ts

Line 302 in c6fdf0d

case 'verse-block':

This could be /mostly/ resolved with CSS but there are going to be issues still. For example, superscript doesn't render correctly but instead results in ^superscript.

I looked into the rehype-minify that you mentioned and it's doing a fairly naive (arguably on purpose) replace, using a fairly basic definition of whitespace (https://github.com/rehypejs/rehype-minify/blob/1dc9280c341087a40dfaa332792c095f96d41686/packages/rehype-minify-whitespace/index.js#L286). In this case it's looking for literal spaces, tabs, newlines, and carriage returns. With regard to spaces, arguably the only whitespace that /really/ matters for our purposes, it's only looking for the literal space. Additionally, the definition of "whitespace" it searches for in the HAST is equally naive (again, likely purposefully so) and only matches the regular expression /[ \t\n\f\r]/g (https://github.com/syntax-tree/hast-util-whitespace/blob/3c765ef9b3fc561976649b97543498cfa7068760/index.js#L16). Thus, I think we can do exactly what org-publish does: wrap each verse block in its own <p class="verse">, replace spaces with the non-breaking space (&nbsp;), and replace newlines with <br />. I examined the rehype-minify plugin and it also will not remove &nbsp; when it comes before or after an element, ie the "&nbsp;" at the end of a string is preserved even though it comes right before an element.
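A quick way to sanity-check the whitespace claim, as a sketch only: the regex below is copied from the hast-util-whitespace link above rather than imported, and the helper name `isOnlyAsciiWhitespace` is made up for illustration.

```typescript
// Pattern quoted above from hast-util-whitespace (copied, not imported).
const asciiWhitespace = /[ \t\n\f\r]/g;

// Hypothetical helper: does the pattern treat every character of `value`
// as whitespace?
function isOnlyAsciiWhitespace(value: string): boolean {
  return value.replace(asciiWhitespace, "") === "";
}

const nbsp = "\u00A0"; // what &nbsp; / &#xa0; decode to

console.log(isOnlyAsciiWhitespace("  \t\n")); // true
console.log(isOnlyAsciiWhitespace(nbsp.repeat(4))); // false
```

Since U+00A0 is not in that character class, a run of no-break spaces does not count as whitespace under this definition.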

I believe this means we can perfectly replicate Org's own output for the verse block without fear of rehype-minify changing the output. Thoughts? If you agree with my understanding of these two rehype plugins, I would be willing to start working on a PR.


rasendubi · 2022-08-31T20:49:46Z

Thanks for digging in! Replacing \n with <br /> does indeed seem to be what org is doing. That should be enough to preserve newlines. Preserving inter-word spacing is less of an issue; users that find it important can either disable minification or replace p.verse with pre instead (with a unified plugin). (org-html-export-as-html does not replace spaces with &nbsp; so I would avoid doing that.)

ispringle · 2022-09-03T11:34:03Z

Perhaps the verse blocks you're looking at are not using whitespace? Given the following org file:

The following verse block has ample whitespace.
#+begin_verse
Here is a verse
          Where whitespace matters and
                                   Ought be beautiful
#+end_verse
This document was exported with the most basic settings and =org-html-export-as-html=.

I got this output (sans the inner style content, as it's verbose and irrelevant):

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<!-- 2022-09-03 Sat 06:30 -->
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>&lrm;</title>
<meta name="author" content="Ian S. Pringle" />
<meta name="generator" content="Org Mode" />
<style>
...
</style>
</head>
<body>
<div id="content" class="content">
<p>
The following verse block has ample whitespace.
</p>
<p class="verse">
Here is a verse<br />
&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;Where whitespace matters and<br />
&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;&#xa0;Ought be beautiful<br />
</p>
<p>
This document was exported with the most basic settings and <code>org-html-export-as-html</code>.</p>
</div>
<div id="postamble" class="status">
<p class="author">Author: Ian S. Pringle</p>
<p class="date">Created: 2022-09-03 Sat 06:30</p>
</div>
</body>
</html>

Now normally my org-export function uses the HTML encoded version of the no-break space, but here it's using the hex value. Not sure if that matters one way or the other, might be htmlize or something playing with things...

rasendubi · 2022-09-03T15:16:17Z

Oh, I see! So, the actual behavior is that verse blocks are first stripped of the common whitespace prefix, then newlines are converted to <br /> and leading whitespace is converted to nbsp.

You can see that from this sample:

#+begin_verse
  hello
    this is a verse   triple whitespace
#+end_verse

that converts to:

<p class="verse">
hello<br />
&#xa0;&#xa0;this is a verse   triple whitespace<br />
</p>

rasendubi · 2022-09-03T15:20:29Z

From ox-html.el:

;;;; Verse Block

(defun org-html-verse-block (_verse-block contents info)
  "Transcode a VERSE-BLOCK element from Org to HTML.
CONTENTS is verse block contents.  INFO is a plist holding
contextual information."
  (format "<p class=\"verse\">\n%s</p>"
	  ;; Replace leading white spaces with non-breaking spaces.
	  (replace-regexp-in-string
	   "^[ \t]+" (lambda (m) (org-html--make-string (length m) "&#xa0;"))
	   ;; Replace each newline character with line break.  Also
	   ;; remove any trailing "br" close-tag so as to avoid
	   ;; duplicates.
	   (let* ((br (org-html-close-tag "br" nil info))
		  (re (format "\\(?:%s\\)?[ \t]*\n" (regexp-quote br))))
	     (replace-regexp-in-string re (concat br "\n") contents)))))
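
For comparison, a rough TypeScript rendering of the same two replacements. This is a sketch, not uniorg code: `verseBlockToHtml` is a hypothetical name, it works on a plain string instead of a tree, and it assumes `contents` has already had the common indentation removed (the elisp function above does not strip it itself).

```typescript
// Hypothetical string-level port of org-html-verse-block.
// Assumption: `contents` is already dedented; `br` mirrors org-html-close-tag.
function verseBlockToHtml(contents: string, br: string = "<br />"): string {
  // Same job as regexp-quote: escape regex metacharacters in `br`.
  const quotedBr = br.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Each newline (plus trailing blanks and an optional existing br)
  // becomes a single br followed by a newline.
  const newline = new RegExp(`(?:${quotedBr})?[ \\t]*\\n`, "g");
  const withBreaks = contents.replace(newline, () => br + "\n");
  // Leading whitespace on every line becomes one &#xa0; per character.
  const withNbsp = withBreaks.replace(/^[ \t]+/gm, (m) => "&#xa0;".repeat(m.length));
  return `<p class="verse">\n${withNbsp}</p>`;
}

console.log(verseBlockToHtml("hello\n  this is a verse   triple whitespace\n"));
```

With the dedented sample from the previous comment as input, this should print the same `<p class="verse">` markup shown there.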

ispringle · 2022-09-16T11:54:16Z

This'll do the trick

      case "verse-block": {
        // Put `e` between every pair of neighbours: [a, b] -> [a, e, b].
        const interleave = (a, e) => a.flatMap((x) => [x, e]).slice(0, -1);
        // Assumes the block has a single text child (see the EDIT below).
        const verses = org.children[0].value.split("\n");
        const newChildren = interleave(
          verses
            // Drop empty lines and turn every space into a no-break space.
            .filter((v) => v !== "")
            .map((v) => ({ type: "text", value: v.replaceAll(" ", "\u00A0") })),
          h("", "br", {}, [])
        );
        return h(org, "p.verse", {}, toHast(newChildren));
      }

A little naive, as it replaces all spaces with nbsp's, but that's because I couldn't come up with a better solution due to replaceAll only accepting globally scoped regexp. Could declare something like const leadingSpaces = /^[ \t]+/ globally but that's up to you, I didn't see a precedent for globals so I just went with this for now.
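
One possible way to touch only the leading whitespace, sketched on the assumption that each line has already been split on "\n": `replaceAll` throws on a regex without the `g` flag, but plain `String.prototype.replace` accepts an anchored, non-global regex inline. The helper name `nbspLeading` is hypothetical.

```typescript
const NBSP = "\u00A0";

// Hypothetical helper: convert only the leading run of spaces/tabs on a
// single line into no-break spaces, leaving inner spacing alone.
function nbspLeading(line: string): string {
  return line.replace(/^[ \t]+/, (m) => NBSP.repeat(m.length));
}

console.log(JSON.stringify(nbspLeading("  this is a verse   triple whitespace")));
```

Each tab counts as a single no-break space here, which matches the `(length m)` call in the ox-html.el function quoted above.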

EDIT

Needs some more work, just realized there are instances of verse blocks that have n+1 children (ie verse blocks with superscript).

ispringle · 2022-09-16T13:47:48Z

This one properly interacts with all children of a verse block:

      case "verse-block": {
        // Put `e` between every pair of neighbours: [a, b] -> [a, e, b].
        const interleave = (a, e) => a.flatMap((x) => [x, e]).slice(0, -1);
        const verses = org.children.flatMap((n) =>
          // Non-text children (e.g. superscript) pass through untouched.
          n.type !== "text"
            ? n
            : interleave(
                n.value
                  .split("\n")
                  // Drop empty lines and turn every space into a no-break space.
                  .filter((v) => v !== "")
                  .map((v) => ({ type: "text", value: v.replaceAll(" ", "\u00A0") })),
                h("", "br", {}, [])
              )
        );
        return h(org, "p.verse", {}, toHast(verses));
      }

rasendubi · 2023-01-29T21:56:29Z

Just looked into this again and it seems to be even more complicated than that. @ispringle your last solution fails on nested children with newlines:

#+begin_verse
  some text *and
    bold* overflowing on the next line
#+end_verse

this should produce 2 nbsp's before "bold":

<p class="verse">
some text <b>and<br />
&#xa0;&#xa0;bold</b> overflowing on the next line<br />
</p>

In uniorg, that example currently parses as:

  - type: "verse-block"
    children:
      - type: "text"
        value: "  some text "
      - type: "bold"
        children:
          - type: "text"
            value: "and\\n    bold"
      - type: "text"
        value: " overflowing on the next line\\n"

So the processing probably needs to happen in two passes:

1. Traverse all children deeply, calculating the common whitespace prefix (2 in the example above). (This is complicated by the fact that the first child does not have a leading newline, and the last newline does not have whitespace after it.)
2. Traverse all children deeply, and:
   - strip the common prefix (in the first child and after every newline)
   - replace the whitespace prefix with nbsp
   - replace newlines with <br />
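
A minimal sketch of those two passes, under several assumptions: the node shapes (`VerseText`, `VerseParent`) are simplified stand-ins for uniorg's real types, `line-break` is a hypothetical marker node that would later be rendered as a br element, each tab counts as one column, and trailing whitespace before a newline is not trimmed.

```typescript
// Simplified, hypothetical node shapes (not uniorg's actual types).
interface VerseText { type: "text"; value: string }
interface VerseBreak { type: "line-break" }
interface VerseParent { type: string; children: VerseNode[] }
type VerseNode = VerseText | VerseBreak | VerseParent;

const NBSP = "\u00A0";
const isText = (n: VerseNode): n is VerseText => n.type === "text" && "value" in n;
const isParent = (n: VerseNode): n is VerseParent => "children" in n;

// Pass 1: flatten all text, then take the smallest indentation of any
// non-blank line. Flattening sidesteps the "first child has no leading
// newline" problem, since lines are only split after concatenation.
function flattenText(nodes: VerseNode[]): string {
  return nodes
    .map((n) => (isText(n) ? n.value : isParent(n) ? flattenText(n.children) : ""))
    .join("");
}

function commonPrefix(nodes: VerseNode[]): number {
  const indents = flattenText(nodes)
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => /^[ \t]*/.exec(line)![0].length);
  return indents.length > 0 ? Math.min(...indents) : 0;
}

// Pass 2: walk the tree again, remembering whether we are still inside the
// leading whitespace of the current line, even across node boundaries.
function transformVerse(nodes: VerseNode[]): VerseNode[] {
  const prefix = commonPrefix(nodes);
  let inLeading = true; // the first child starts a line
  let consumed = 0; // leading whitespace seen so far on this line

  const visit = (list: VerseNode[]): VerseNode[] =>
    list.flatMap((node): VerseNode[] => {
      if (isParent(node)) return [{ ...node, children: visit(node.children) }];
      if (!isText(node)) {
        inLeading = false;
        return [node];
      }
      const out: VerseNode[] = [];
      node.value.split("\n").forEach((segment, i) => {
        if (i > 0) {
          // Every "\n" becomes a break node and starts a new line.
          out.push({ type: "line-break" });
          inLeading = true;
          consumed = 0;
        }
        let value = segment;
        if (inLeading) {
          const ws = /^[ \t]*/.exec(segment)![0].length;
          // Strip what is left of the common prefix, nbsp the remainder.
          const strip = Math.min(ws, Math.max(0, prefix - consumed));
          consumed += ws;
          value = NBSP.repeat(ws - strip) + segment.slice(ws);
          if (segment.length > ws) inLeading = false;
        }
        if (value !== "") out.push({ type: "text", value });
      });
      return out;
    });

  return visit(nodes);
}
```

Fed the parsed tree from the example above (with real newlines in the text values), this should leave two no-break spaces before "bold" inside the bold node and a break node after each line. Lines that begin with a non-text leaf are not handled specially in this sketch.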
