Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[remark-parse] Ordered lists are not recognized if they both use leading zeroes and interrupt a block #1242

Closed
4 tasks done
benblank opened this issue Oct 13, 2023 · 11 comments
Labels
🙅 no/wontfix This is not (enough of) an issue for this project 👎 phase/no Post cannot or will not be acted on

Comments

@benblank
Copy link

Initial checklist

Affected packages and versions

remark-parse@11.0.0

Link to runnable example

No response

Steps to reproduce

  1. In a new folder, create a new Node module by running e.g. pnpm init.

  2. Run pnpm install remark-parse@11.0.0.

  3. Run pnpm install unified@11.0.3.

  4. Save the code below as repro.mjs file and run node repro.mjs. (I used Node v18.17.0.)

    This will generate a JSON file containing the parsed AST (sans position properties, so that they can be easily diffed) for each of the Markdown snippets it contains.

    repro.mjs
    import { writeFile } from "node:fs/promises";
    import remarkParse from "remark-parse";
    import { unified } from "unified";
    
    const parser = unified().use(remarkParse).freeze();
    
    const documents = {
      noLeadingZeroesFollowing: `The preceeding paragraph.
    
    1. one
    4. two
    `,
    
      leadingZeroesFollowing: `The preceeding paragraph.
    
    01. one
    02. two
    `,
    
      noLeadingZeroesInterrupting: `The preceeding paragraph.
    1. one
    2. two
    `,
    
      leadingZeroesInterrupting: `The preceeding paragraph.
    01. one
    02. two
    `,
    };
    
    function stripPositions(node) {
      const { position, children, ...rest } = node;
    
      return { ...rest, children: children?.map(stripPositions) };
    }
    
    await Promise.all(
      Object.entries(documents).map(([name, text]) =>
        writeFile(
          name + ".json",
          JSON.stringify(stripPositions(parser.parse(text)), undefined, 2),
        ),
      ),
    );
  5. Observe that the files noLeadingZeroesFollowing.json, leadingZeroesFollowing.json, and noLeadingZeroesInterrupting.json are identical and that their root nodes contain both a paragraph node and a list node. However, the root node in leadingZeroesInterrupting.json instead contains only a single paragraph node. Diffing it against any of the other files will produce output similar to the following.

    repro.diff
    --- noLeadingZeroesFollowing.json	2023-10-13 16:11:26.261286672 -0700
    +++ leadingZeroesInterrupting.json	2023-10-13 16:11:26.261286672 -0700
    @@ -6,47 +6,7 @@
          "children": [
            {
              "type": "text",
    -          "value": "The preceeding paragraph."
    -        }
    -      ]
    -    },
    -    {
    -      "type": "list",
    -      "ordered": true,
    -      "start": 1,
    -      "spread": false,
    -      "children": [
    -        {
    -          "type": "listItem",
    -          "spread": false,
    -          "checked": null,
    -          "children": [
    -            {
    -              "type": "paragraph",
    -              "children": [
    -                {
    -                  "type": "text",
    -                  "value": "one"
    -                }
    -              ]
    -            }
    -          ]
    -        },
    -        {
    -          "type": "listItem",
    -          "spread": false,
    -          "checked": null,
    -          "children": [
    -            {
    -              "type": "paragraph",
    -              "children": [
    -                {
    -                  "type": "text",
    -                  "value": "two"
    -                }
    -              ]
    -            }
    -          ]
    +          "value": "The preceeding paragraph.\n01. one\n02. two"
            }
          ]
        }

Expected behavior

Ordered lists should be parsed consistently, regardless of whether their list markers have leading zeroes or the list interrupts a block.

Actual behavior

Ordered lists are recognized as such if their list markers have leading zeroes or they interrupt a block. However, ordered lists are not recognized as such if their list markers have leading zeroes and they interrupt a block.

Runtime

Other (please specify in steps to reproduce)

Package manager

pnpm

OS

Linux

Build and bundle tools

Other (please specify in steps to reproduce)

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Oct 13, 2023
@benblank
Copy link
Author

Apologies for not providing a runnable example, but I spent more time trying (and failing) to get codesandbox to do something useful than I did on the rest of the report. 😅

@ChristianMurphy
Copy link
Member

Thanks @benblank!
Here is the repro in a sandbox https://stackblitz.com/edit/node-mneiet?file=index.js
I'm seeing the same behavior you describe when running remark 15.0.1

Checking the four examples in CommonMark Dingus

  1. https://spec.commonmark.org/dingus/?text=The%20preceeding%20paragraph.%0A%0A1.%20one%0A4.%20two
  2. https://spec.commonmark.org/dingus/?text=The%20preceeding%20paragraph.%0A%0A01.%20one%0A02.%20two
  3. https://spec.commonmark.org/dingus/?text=The%20preceeding%20paragraph.%0A1.%20one%0A2.%20two
  4. https://spec.commonmark.org/dingus/?text=The%20preceeding%20paragraph.%0A01.%20one%0A02.%20two

It does indeed appear all four should produce a list


Tracing further.
I suspect the issue is down one level in micromark, I'm able to replicate the issue without having the AST generated https://stackblitz.com/edit/node-1ygk3h?file=index.js

@ChristianMurphy ChristianMurphy added 🐛 type/bug This is a problem 🌊 blocked/upstream This cannot progress before something external happens first 👍 phase/yes Post is accepted and can be worked on and removed 🤞 phase/open Post is being triaged manually labels Oct 14, 2023
@github-actions

This comment has been minimized.

@benblank
Copy link
Author

I suspect the issue is down one level in micromark, I'm able to replicate the issue without having the AST generated

Ah! Dang. I'd traced it this far down from Prettier and thought I'd gotten to the bottom of it. 🙂

Thanks for all the helpful links!

@wooorm
Copy link
Member

wooorm commented Oct 15, 2023

I do think the spec is unclear for this:

In order to solve of unwanted lists in paragraphs with hard-wrapped numerals, we allow only lists starting with `1` to interrupt paragraphs. Thus,~

(right above example 304).
As in, I followed those words here.

I think that the current behavior is in line with the reasoning there. Natural language phrases might include 1., but 2. or 01. are more unlikely.

@wooorm
Copy link
Member

wooorm commented Oct 15, 2023

If you care strongly about this, could you perhaps open an issue with commonmark/commonmark-spec to check what the idea is?

@benblank
Copy link
Author

Actually, I missed that when I was reading through the spec. I'm not sure I 100% agree with the reasoning behind it, but those reasons do at least appear to be pretty clear.

I may indeed open up an issue with regards to the phasing, though; I feel the section you quoted would be improved by calling out that it's only referring to ordered lists and to the markers 1. and 1) (not the character 1), even if there are examples demonstrating both cases. The emphasis on the principle of uniformity also suggests that the exception applies to nested lists as well, but I don't see text or an example calling that out.

I also have to admit to being a bit surprised to see "interrupting, not starting with 1" called out as not being valid, simply because when I was checking BabelMark, a large number of the parsers (including nine of the twelve marked as specifically targeting CommonMark) considered it valid.

On the one hand, it's a shame to "disagree" with so many other implementations, but the spec is clear as to what the Right Thing is, and it isn't what I was trying to do. I'll go ahead and close the issue.

Thanks for taking the time to look into this!

@github-actions

This comment has been minimized.

@ChristianMurphy ChristianMurphy added 🙅 no/wontfix This is not (enough of) an issue for this project and removed 🐛 type/bug This is a problem 🌊 blocked/upstream This cannot progress before something external happens first 👍 phase/yes Post is accepted and can be worked on labels Oct 15, 2023
@github-actions

This comment was marked as resolved.

@github-actions

This comment has been minimized.

@github-actions github-actions bot added the 👎 phase/no Post cannot or will not be acted on label Oct 15, 2023
@wooorm
Copy link
Member

wooorm commented Oct 16, 2023

There’s a wide variety of parser that all do things differently.
CM likes to be ambiguous on all the edge cases. This also comes as a given when it’s mostly a test suite of input/output examples, and not an explanation of an algorithm (such as HTML).
I’d like a more formal spec. But I can see value in this too.
Anyway, feel free to PR to the spec another example of the 01 case. Then I (and others) will go with the one that’s decided for that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🙅 no/wontfix This is not (enough of) an issue for this project 👎 phase/no Post cannot or will not be acted on
Development

No branches or pull requests

3 participants