[remark-parse] Ordered lists are not recognized if they both use leading zeroes and interrupt a block #1242

benblank · 2023-10-13T23:37:11Z

Initial checklist

I read the support docs
I read the contributing guide
I agree to follow the code of conduct
I searched issues and couldn’t find anything (or linked relevant results below)

Affected packages and versions

remark-parse@11.0.0

Link to runnable example

No response

Steps to reproduce

In a new folder, create a new Node module by running e.g. pnpm init.
Run pnpm install remark-parse@11.0.0.
Run pnpm install unified@11.0.3.

Save the code below as repro.mjs and run node repro.mjs. (I used Node v18.17.0.)

This will generate a JSON file containing the parsed AST (sans position properties, so that they can be easily diffed) for each of the Markdown snippets it contains.

repro.mjs

import { writeFile } from "node:fs/promises";
import remarkParse from "remark-parse";
import { unified } from "unified";

const parser = unified().use(remarkParse).freeze();

const documents = {
  noLeadingZeroesFollowing: `The preceeding paragraph.

1. one
2. two
`,

  leadingZeroesFollowing: `The preceeding paragraph.

01. one
02. two
`,

  noLeadingZeroesInterrupting: `The preceeding paragraph.
1. one
2. two
`,

  leadingZeroesInterrupting: `The preceeding paragraph.
01. one
02. two
`,
};

function stripPositions(node) {
  const { position, children, ...rest } = node;

  return { ...rest, children: children?.map(stripPositions) };
}

await Promise.all(
  Object.entries(documents).map(([name, text]) =>
    writeFile(
      name + ".json",
      JSON.stringify(stripPositions(parser.parse(text)), undefined, 2),
    ),
  ),
);
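
To see what stripPositions does in isolation, here is a small sketch that runs it on a hand-written node instead of real parser output. The sample tree and its position values are made up for illustration; only the helper itself comes from the script above.

```javascript
// Same helper as in repro.mjs: drop `position` from a node and recurse
// into its children.
function stripPositions(node) {
  const { position, children, ...rest } = node;

  return { ...rest, children: children?.map(stripPositions) };
}

// Hypothetical, hand-written tree (not actual remark-parse output).
const sample = {
  type: "paragraph",
  position: { start: { line: 1, column: 1 }, end: { line: 1, column: 4 } },
  children: [
    {
      type: "text",
      value: "one",
      position: { start: { line: 1, column: 1 }, end: { line: 1, column: 4 } },
    },
  ],
};

const stripped = stripPositions(sample);

// Leaf nodes end up with `children: undefined`, which JSON.stringify omits,
// so the serialized output only differs where the tree structure differs.
console.log(JSON.stringify(stripped));
// prints {"type":"paragraph","children":[{"type":"text","value":"one"}]}
```

Because the positions are gone, two documents that parse to the same structure serialize to identical JSON, which is what makes the diff further down readable.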

Observe that the files noLeadingZeroesFollowing.json, leadingZeroesFollowing.json, and noLeadingZeroesInterrupting.json are identical and that their root nodes contain both a paragraph node and a list node. However, the root node in leadingZeroesInterrupting.json instead contains only a single paragraph node. Diffing it against any of the other files will produce output similar to the following.

repro.diff

--- noLeadingZeroesFollowing.json	2023-10-13 16:11:26.261286672 -0700
+++ leadingZeroesInterrupting.json	2023-10-13 16:11:26.261286672 -0700
@@ -6,47 +6,7 @@
      "children": [
        {
          "type": "text",
-          "value": "The preceeding paragraph."
-        }
-      ]
-    },
-    {
-      "type": "list",
-      "ordered": true,
-      "start": 1,
-      "spread": false,
-      "children": [
-        {
-          "type": "listItem",
-          "spread": false,
-          "checked": null,
-          "children": [
-            {
-              "type": "paragraph",
-              "children": [
-                {
-                  "type": "text",
-                  "value": "one"
-                }
-              ]
-            }
-          ]
-        },
-        {
-          "type": "listItem",
-          "spread": false,
-          "checked": null,
-          "children": [
-            {
-              "type": "paragraph",
-              "children": [
-                {
-                  "type": "text",
-                  "value": "two"
-                }
-              ]
-            }
-          ]
+          "value": "The preceeding paragraph.\n01. one\n02. two"
        }
      ]
    }

Expected behavior

Ordered lists should be parsed consistently, regardless of whether their list markers have leading zeroes or the list interrupts a block.

Actual behavior

Ordered lists are recognized as such if their list markers have leading zeroes or they interrupt a block. However, ordered lists are not recognized as such if their list markers have leading zeroes and they interrupt a block.

Runtime

Other (please specify in steps to reproduce)

Package manager

pnpm

OS

Linux

Build and bundle tools

Other (please specify in steps to reproduce)


benblank · 2023-10-13T23:38:38Z

Apologies for not providing a runnable example, but I spent more time trying (and failing) to get codesandbox to do something useful than I did on the rest of the report. 😅

ChristianMurphy · 2023-10-14T02:38:28Z

Thanks @benblank!
Here is the repro in a sandbox https://stackblitz.com/edit/node-mneiet?file=index.js
I'm seeing the same behavior you describe when running remark 15.0.1

Checking the four examples in the CommonMark Dingus, it does indeed appear that all four should produce a list.

Tracing further.
I suspect the issue is down one level in micromark, I'm able to replicate the issue without having the AST generated https://stackblitz.com/edit/node-1ygk3h?file=index.js

benblank · 2023-10-14T04:16:04Z

I suspect the issue is down one level in micromark, I'm able to replicate the issue without having the AST generated

Ah! Dang. I'd traced it this far down from Prettier and thought I'd gotten to the bottom of it. 🙂

Thanks for all the helpful links!

wooorm · 2023-10-15T16:51:26Z

I do think the spec is unclear for this:

In order to solve the problem of unwanted lists in paragraphs with hard-wrapped numerals, we allow only lists starting with `1` to interrupt paragraphs. Thus,

(right above example 304).
As in, I followed those words here.

I think that the current behavior is in line with the reasoning there. Natural language phrases might include 1., but 2. or 01. are more unlikely.
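
The two possible readings of that sentence can be put side by side. The sketch below is not micromark's code. It is a simplified, hypothetical check (the function names and the regular expression are invented here) that looks only at the first line of a would-be ordered list and ignores bullet lists, nesting, and container blocks. The "literal" reading matches the behavior reported in this issue; the "numeric" reading would let `01.` interrupt a paragraph as well.

```javascript
// Ordered list marker: up to three spaces of indentation, 1-9 digits,
// "." or ")", then at least one space and some content (an empty list
// item cannot interrupt a paragraph).
const orderedMarker = /^ {0,3}(\d{1,9})[.)] +\S/;

// Reading 1: the marker's digits must be exactly "1".
function interruptsLiteral(line) {
  const match = orderedMarker.exec(line);
  return match !== null && match[1] === "1";
}

// Reading 2: the marker's numeric value must be 1, so "01" also counts.
function interruptsNumeric(line) {
  const match = orderedMarker.exec(line);
  return match !== null && Number(match[1]) === 1;
}

const lines = ["1. one", "01. one", "2. two", "14. The number of doors is 6."];

for (const line of lines) {
  console.log(JSON.stringify(line), interruptsLiteral(line), interruptsNumeric(line));
}
// "1. one" true true
// "01. one" false true
// "2. two" false false
// "14. The number of doors is 6." false false
```

Both readings reject the hard-wrapped `14.` case that the quoted passage is meant to protect against; they only disagree on markers such as `01.`.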

wooorm · 2023-10-15T17:03:54Z

If you care strongly about this, could you perhaps open an issue with commonmark/commonmark-spec to check what the idea is?

benblank · 2023-10-15T22:50:31Z

Actually, I missed that when I was reading through the spec. I'm not sure I 100% agree with the reasoning behind it, but those reasons do at least appear to be pretty clear.

I may indeed open up an issue with regards to the phrasing, though; I feel the section you quoted would be improved by calling out that it's only referring to ordered lists and to the markers 1. and 1) (not the character 1), even if there are examples demonstrating both cases. The emphasis on the principle of uniformity also suggests that the exception applies to nested lists as well, but I don't see text or an example calling that out.

I also have to admit to being a bit surprised to see "interrupting, not starting with 1" called out as not being valid, simply because when I was checking BabelMark, a large number of the parsers (including nine of the twelve marked as specifically targeting CommonMark) considered it valid.

On the one hand, it's a shame to "disagree" with so many other implementations, but the spec is clear as to what the Right Thing is, and it isn't what I was trying to do. I'll go ahead and close the issue.

Thanks for taking the time to look into this!

wooorm · 2023-10-16T16:22:20Z

There’s a wide variety of parsers that all do things differently.
CM likes to be ambiguous on all the edge cases. This also comes as a given when it’s mostly a test suite of input/output examples, and not an explanation of an algorithm (such as HTML).
I’d like a more formal spec. But I can see value in this too.
Anyway, feel free to PR to the spec another example of the 01 case. Then I (and others) will go with the one that’s decided for that!

github-actions bot added the 👋 phase/new (Post is being triaged automatically) and 🤞 phase/open (Post is being triaged manually) labels and removed the 👋 phase/new label on Oct 13, 2023

ChristianMurphy added the 🐛 type/bug (This is a problem), 🌊 blocked/upstream (This cannot progress before something external happens first), and 👍 phase/yes (Post is accepted and can be worked on) labels and removed the 🤞 phase/open label on Oct 14, 2023

benblank closed this as completed Oct 15, 2023

ChristianMurphy added the 🙅 no/wontfix (This is not (enough of) an issue for this project) label and removed the 🐛 type/bug, 🌊 blocked/upstream, and 👍 phase/yes labels on Oct 15, 2023

github-actions bot added the 👎 phase/no (Post cannot or will not be acted on) label on Oct 15, 2023
