fix: Full Commonmark compliance for Lists #2112

calculuschild · 2021-06-18T18:32:37Z

Marked version: 2.1.1

Markdown flavor: CommonMark

Description

A reworking of the Block Lists tokenizer. Also adjusting some of the New unit tests because they are not accurate to the actual Commonmark spec and dingus results.

Fixes Commonmark Examples 232, 234, 243, 244, 248, 250, 254, 276, 277, 287, 288, 289. This now passes all Lists and all List Items Commonmark tests!
Probably fixes some GitHub issues too. Need to dig through and see which ones.

This seems to be about the same speed as Master or slightly slower, but it's always hard to tell.

Side note, should we move the New tests that feature the pedantic option into the Original folder to be with all the other pedantic tests?

Contributor

Test(s) exist to ensure functionality and minimize regression (if no tests added, list tests covering this PR); or,
no tests required for this PR.
If submitting new feature, it has been documented in the appropriate places.

Committer

In most cases, this should be a different person than the contributor.

CI is green (no forced merge required).
Squash and Merge PR following conventional commit guidelines.

vercel · 2021-06-18T18:32:41Z

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/markedjs/markedjs/4HWneXd69SCqp9wDXN2PgRHkLKHJ
✅ Preview: https://markedjs-git-fork-calculuschild-cleanuplists-markedjs.vercel.app

UziTech · 2021-06-20T21:37:22Z

I believe Original contains the actual specs for the original markdown.pl, and New contains basically any tests that are not part of any spec but should (for the most part) continue passing.

calculuschild · 2021-06-21T00:40:34Z

Ok that makes sense. I could see that the New specs for Pedantic do seem to match the Daring Fireball dingus so that's fine.

I could use some help figuring out the rule for New: list_align_pedantic though. There doesn't seem to be any logical pattern here:

- one
 - two
  - three
    - four
     - five
      - six
       - seven

becomes

<ul>
	<li>one
		<ul>
			<li>two</li>
			<li>three</li>
			<li>four
				<ul>
					<li>five</li>
					<li>six</li>
					<li>seven</li>
				</ul>
			</li>
		</ul>
	</li>
</ul>

Same with New : list_item_text :

  * item1

    * item2

  text

becomes

<ul><li><p>item1</p>  <ul><li>item2 </li></ul> <p>text</p> </li></ul>

calculuschild · 2021-06-21T00:43:49Z

For New : main, the output HTML does not match the commonmark dingus. How do we want to correct this test? Change the input Markdown so it matches the test HTML, or change the expected HTML so it matches the test Markdown?

Or... after making either of those changes, this test might be redundant since I think it overlaps with existing tests everywhere...

UziTech · 2021-06-21T05:03:52Z

New: list_align_pedantic fixes #1923

New: list_item_text fixes #1947

New: main should probably be marked as pedantic?

calculuschild · 2021-06-21T05:41:24Z

Ok, thanks for tracking that down. I have the logic from those issues working, but I still am having trouble with the alignment/indentation rules for those two test cases. Can you help me understand when a pedantic list considers something a sublist versus not? The daring fireball spec does not say anything about when to nest sublists that I can see.

`list_align_pedantic`

Is it something like... Every bullet with an indent between 1 and 4 spaces is nested at the same "level", and then 5-8 spaces is the next level?

Because the dingus gives REALLY strange results with stuff like this:

- one
     - teo
 - three

Becomes

<ul>
<li>one
<ul><li>teo
<ul><li>three</li></ul></li></ul></li>
</ul>

`list_item_text`

I'm not sure why "text" is part of the outer list because it is not indented enough relative to the top bullet point. The spec says

> List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by... 4 spaces

But maybe it's actually anywhere between 1-4 spaces and the indentation of the first bullet doesn't matter?

UziTech · 2021-06-21T06:09:31Z

> the dingus gives REALLY strange results with stuff

The original spec was very lax on details because it assumed well formatted markdown, so bad markdown should give unspecified results (garbage in, garbage out). Unfortunately people who write bad markdown still want consistency.

The easiest way to get around it is to have `if (options.pedantic)` blocks that correct for the daringfireball dingus results.

UziTech · 2021-06-21T06:12:38Z

> Every bullet with an indent between 1 and 4 spaces is nested at the same "level", and then 5-8 spaces is the next level

That seems to be the case.

> But maybe it's actually anywhere between 1-4 spaces and the indentation of the first bullet doesn't matter?

yes
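The indentation rule confirmed above (1-4 spaces is one level, 5-8 spaces is the next) can be sketched as a small function. This is only a minimal sketch of the guess made in this thread, not the code that ended up in the tokenizer; `pedanticLevel` and `levelsFor` are hypothetical names introduced for illustration.

```javascript
// Hypothetical helper, not part of marked: maps a bullet's leading-space
// count to a nesting level under the "1-4 spaces, then 5-8 spaces" guess.
function pedanticLevel(indent) {
  return Math.ceil(indent / 4);
}

// Returns the guessed nesting level for every bullet line in the input.
function levelsFor(markdown) {
  return markdown
    .split('\n')
    .filter(line => /^ *[-*+] /.test(line))
    .map(line => pedanticLevel(line.match(/^ */)[0].length));
}

// The list_align_pedantic input quoted earlier in this thread.
const sample = [
  '- one',
  ' - two',
  '  - three',
  '    - four',
  '     - five',
  '      - six',
  '       - seven'
].join('\n');

console.log(levelsFor(sample)); // [ 0, 1, 1, 1, 2, 2, 2 ]
```

The levels line up with the expected HTML for `list_align_pedantic` (two, three, four under one; five, six, seven under four). The sketch does not reproduce the dingus output for the "one / teo / three" input, where a shallower bullet ends up nested under a deeper one, so that case would still need separate handling.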

calculuschild · 2021-06-21T06:14:50Z

Right, right. So I think you're saying we should match the dingus as closely as possible even if it's garbage, right?

My question then was a request for help in reverse engineering the dingus logic so I can get the same garbage in, garbage out for consistency, as you say. Then I can work out where and what to put into the options.pedantic sections.

calculuschild · 2021-06-21T20:56:37Z

> New: main should probably be marked as pedantic?

@UziTech Unfortunately it doesn't follow Pedantic rules either. The problem area is this:

* List Item 2
  * New List Item 1
    Hi, this is a list item.
  * New List Item 2
    Another item
        Code goes here.
        Lots of it...      
  * New List Item 3
    The last item

The test wants it to become

<ul>
<li>New List Item 1 Hi, this is a list item.</li>
<li>New List Item 2 Another item 
<pre><code>Code goes here.
Lots of it...</code></pre>
</li>
<li>New List Item 3 The last item</li>
</ul>

However, an indented code block cannot interrupt a paragraph without a blank line before, both in CommonMark and in Pedantic.

So both Commonmark and Pedantic dingus give this:

<ul>
<li>New List Item 1 Hi, this is a list item.</li>
<li>New List Item 2 Another item Code goes here. Lots of it...</li>
<li>New List Item 3 The last item</li>
</ul>
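The rule that an indented code block cannot interrupt a paragraph without a blank line can be sketched with a toy line classifier. This is an illustration only, not marked's tokenizer: `classifyLines` is a hypothetical name, and the sketch ignores list-item content offsets by treating each line as if the list indentation had already been stripped.

```javascript
// Toy classifier (hypothetical, not marked's implementation): labels each
// line 'blank', 'code', or 'paragraph'. A line indented by 4+ spaces only
// starts a code block when it does not directly follow paragraph text.
function classifyLines(lines) {
  const kinds = [];
  let inParagraph = false;
  for (const line of lines) {
    if (line.trim() === '') {
      kinds.push('blank');
      inParagraph = false; // a blank line ends the paragraph
    } else if (/^ {4}/.test(line) && !inParagraph) {
      kinds.push('code');
    } else {
      kinds.push('paragraph'); // includes indented continuation lines
      inParagraph = true;
    }
  }
  return kinds;
}

const withoutBlank = classifyLines(['Another item', '    Code goes here.']);
const withBlank = classifyLines(['Another item', '', '    Code goes here.']);

console.log(withoutBlank); // [ 'paragraph', 'paragraph' ]
console.log(withBlank);    // [ 'paragraph', 'blank', 'code' ]
```

Without the blank line, "Code goes here." is folded into the preceding paragraph, which is the behavior both dinguses show for the New: main input.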

My vote is to just remove this test altogether since it is a large, unwieldy test that doesn't seem to cover anything that isn't already covered by smaller tests.

UziTech · 2021-06-21T21:01:28Z

Sounds good. It looks like it was created well before my time, so most likely it is out of date.

It would be better to keep the tests specific anyway.

calculuschild · 2021-06-21T21:08:33Z

Tadaa! Passing all spec tests now, plus 3 more Commonmark examples.

Unfortunately there are a bunch of Unit tests that are expecting tokens to look different now so that needs to be cleaned up. Essentially lists no longer consume blank newlines at the end, so you end up with space tokens instead (makes checking whether a list is loose or not more accurate as well). But the resulting HTML is the same.
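To make the token change described above concrete, here is a small sketch. The token arrays are hand-written assumptions about the general shape (a `list` token followed by a separate `space` token), not captured output from marked, and `joinRaw` is a hypothetical helper.

```javascript
const src = '- a\n- b\n\nparagraph';

// Assumed shape before this PR: the list token swallows the trailing blank line.
const before = [
  { type: 'list', raw: '- a\n- b\n\n' },
  { type: 'paragraph', raw: 'paragraph' }
];

// Assumed shape after this PR: the blank line becomes its own space token.
const after = [
  { type: 'list', raw: '- a\n- b' },
  { type: 'space', raw: '\n\n' },
  { type: 'paragraph', raw: 'paragraph' }
];

// Concatenates the raw text of every top-level token.
function joinRaw(tokens) {
  return tokens.map(token => token.raw).join('');
}

console.log(joinRaw(before) === src); // true
console.log(joinRaw(after) === src);  // true
console.log(before.length, after.length); // 2 3
```

Both shapes cover the same source text, which fits the observation that the HTML is unchanged, while a unit test that compares token arrays directly would see the extra `space` token and fail.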

calculuschild · 2021-06-21T21:25:18Z

And now unit tests are all passing. I'm going to see if I can knock out the last commonmark specs, but I think this is in a good spot to review.

I also have a couple ideas to speed this up slightly but I want to see if I can get the last examples working first.

styfle · 2021-07-05T14:49:52Z

src/Lexer.js

+          lastToken.raw += '\n' + token.raw;
+          lastToken.text += '\n' + token.raw;
+        } else {
+          if (!this.tokens.links[token.tag]) {


Nit pick: this could be changed to else if instead of nested else and if
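A sketch of the suggested shape, flattening the nested `else { if (...) }` into `else if (...)`. Only the two `lastToken` lines and the `links[token.tag]` check come from the diff above; the function name `handleDef`, the first condition, and the body of the `else if` branch are assumptions made so the example runs on its own.

```javascript
// Hypothetical stand-in for the surrounding lexer code.
function handleDef(token, lastToken, links) {
  if (lastToken && lastToken.type === 'paragraph') { // assumed condition
    lastToken.raw += '\n' + token.raw;
    lastToken.text += '\n' + token.raw;
  } else if (!links[token.tag]) {
    // Assumed body: remember the first definition seen for this tag.
    links[token.tag] = { href: token.href, title: token.title };
  }
}

const links = {};
const def = { tag: 'foo', raw: '[foo]: /url', href: '/url', title: null };
handleDef(def, undefined, links);
console.log(links.foo.href); // /url
```

The behavior is the same as the nested form; the change only removes one level of indentation.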

styfle

Great work, thanks!

calculuschild · 2021-08-02T04:06:36Z

Reminder to not merge this quite yet. Now that #2124 is approved, I want to look over this again because it needs to be converted so that the tokenizer lexes its own child tokens in that new format.

UziTech · 2021-08-02T19:14:18Z

I merged #2124 so you should be able to rebase this and make the changes for lexing the child tokens in the tokenizer.

calculuschild · 2021-08-06T04:31:57Z

@UziTech .....sigh.... It is done....

I don't know why but that was a nightmare.

# [3.0.0](v2.1.3...v3.0.0) (2021-08-16)

### Bug Fixes

* Add module field to package.json ([#2143](#2143)) ([edc2e6d](edc2e6d))
* drop node 10 support ([#2157](#2157)) ([433b16f](433b16f))
* Full Commonmark compliance for Lists ([#2112](#2112)) ([eb33d3b](eb33d3b))
* Refactor table tokens ([#2166](#2166)) ([bc400ac](bc400ac))

### BREAKING CHANGES

* - `table` tokens `header` property changed to contain an array of objects for each header cell with `text` and `tokens` properties.
  - `table` tokens `cells` property changed to `rows` and is an array of rows where each row contains an array of objects for each cell with `text` and `tokens` properties.

v2:

```json
{
  "type": "table",
  "align": [null, null],
  "raw": "| a | b |\n|---|---|\n| 1 | 2 |\n",
  "header": ["a", "b"],
  "cells": [["1", "2"]],
  "tokens": {
    "header": [
      [{ "type": "text", "raw": "a", "text": "a" }],
      [{ "type": "text", "raw": "b", "text": "b" }]
    ],
    "cells": [[
      [{ "type": "text", "raw": "1", "text": "1" }],
      [{ "type": "text", "raw": "2", "text": "2" }]
    ]]
  }
}
```

v3:

```json
{
  "type": "table",
  "align": [null, null],
  "raw": "| a | b |\n|---|---|\n| 1 | 2 |\n",
  "header": [
    { "text": "a", "tokens": [{ "type": "text", "raw": "a", "text": "a" }] },
    { "text": "b", "tokens": [{ "type": "text", "raw": "b", "text": "b" }] }
  ],
  "rows": [
    [
      { "text": "1", "tokens": [{ "type": "text", "raw": "1", "text": "1" }] },
      { "text": "2", "tokens": [{ "type": "text", "raw": "2", "text": "2" }] }
    ]
  ]
}
```

* Add module field to package.json
* drop node 10 support
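For code that still produces or consumes v2-shaped `table` tokens, the change described in the release notes can be sketched as a conversion. `tableTokenV2ToV3` is a hypothetical helper, not part of marked, and it follows the prose description (each row is an array of cell objects with `text` and `tokens`).

```javascript
// Hypothetical helper (not part of marked): converts a v2-shaped table token
// into the v3 shape described in the release notes.
function tableTokenV2ToV3(v2) {
  return {
    type: v2.type,
    align: v2.align,
    raw: v2.raw,
    // Each header cell pairs its text with its inline tokens.
    header: v2.header.map((text, i) => ({ text, tokens: v2.tokens.header[i] })),
    // `cells` becomes `rows`; each row is an array of { text, tokens } cells.
    rows: v2.cells.map((row, r) =>
      row.map((text, c) => ({ text, tokens: v2.tokens.cells[r][c] })))
  };
}

const v2Token = {
  type: 'table',
  align: [null, null],
  raw: '| a | b |\n|---|---|\n| 1 | 2 |\n',
  header: ['a', 'b'],
  cells: [['1', '2']],
  tokens: {
    header: [
      [{ type: 'text', raw: 'a', text: 'a' }],
      [{ type: 'text', raw: 'b', text: 'b' }]
    ],
    cells: [[
      [{ type: 'text', raw: '1', text: '1' }],
      [{ type: 'text', raw: '2', text: '2' }]
    ]]
  }
};

const v3Token = tableTokenV2ToV3(v2Token);
console.log(v3Token.header[1].text);  // b
console.log(v3Token.rows[0][1].text); // 2
```

The top-level `tokens` object and the `cells` property are dropped, since in the v3 shape every cell carries its own `tokens`.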

github-actions · 2021-08-16T03:11:02Z

🎉 This PR is included in version 3.0.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

calculuschild added 4 commits June 18, 2021 00:58

Rewrite List Tokenizer, Fix incorrect "new" spec tests

4482e0c

cleanup

ec216ae

more cleanup

5cfa5de

Merge remote-tracking branch 'upstream/master' into cleanUpLists

1697a2b

vercel bot deployed to Preview June 18, 2021 18:32 View deployment

Passing all Spec tests

26a56fa

vercel bot deployed to Preview June 21, 2021 21:07 View deployment

Fix some unit tests (lists no longer consume blank lines at end of list)

65907ee

vercel bot deployed to Preview June 21, 2021 21:14 View deployment

Fix more "lists consuming blank lines" unit tests.

3dd2aff

vercel bot deployed to Preview June 21, 2021 21:17 View deployment

All unit tests passing!

4fe6e7c

vercel bot deployed to Preview June 21, 2021 21:21 View deployment

Lint

41f9bde

vercel bot deployed to Preview June 21, 2021 21:27 View deployment

Two more commonmark examples fixed

bb8fe00

vercel bot deployed to Preview June 21, 2021 21:44 View deployment

calculuschild mentioned this pull request Jul 3, 2021

Remove nptable tokenizer #2126

Closed

5 tasks

UziTech added this to In Progress in vNext via automation Jul 3, 2021

UziTech approved these changes Jul 3, 2021

styfle reviewed Jul 5, 2021

styfle approved these changes Jul 5, 2021

calculuschild mentioned this pull request Jul 28, 2021

Italic/bold is treated as a list, When Italic/bold is written on the next line of the list #1980

Closed

UziTech mentioned this pull request Aug 2, 2021

drop node 10 support #2157

Merged

5 tasks

calculuschild added 3 commits August 6, 2021 00:22

Rebase onto markedjs#2124

7140744

Merge branch 'cleanUpLists' of https://github.com/calculuschild/marked …

26c58fd

…into cleanUpLists

Finish rebase

7d31421

vercel bot deployed to Preview August 6, 2021 04:24 View deployment

lint

823dccf

vercel bot deployed to Preview August 6, 2021 04:25 View deployment

calculuschild changed the base branch from master to dependabot/npm_and_yarn/babel/preset-env-7.14.9 August 6, 2021 04:27

calculuschild changed the base branch from dependabot/npm_and_yarn/babel/preset-env-7.14.9 to master August 6, 2021 04:27

update packaged lib

2dfda0e

vercel bot deployed to Preview August 6, 2021 04:29 View deployment

UziTech reviewed Aug 6, 2021

UziTech approved these changes Aug 6, 2021

UziTech changed the title ~~Full Commonmark compliance for Lists~~ fix: Full Commonmark compliance for Lists Aug 10, 2021

UziTech merged commit eb33d3b into markedjs:master Aug 10, 2021

vNext automation moved this from In Progress to Done Aug 10, 2021

calculuschild mentioned this pull request Aug 10, 2021

Inconsistent list-item sizing naturalcrit/homebrewery#1085

Closed

github-actions bot added the released label Aug 16, 2021

Snack-X mentioned this pull request Sep 14, 2022

smartLists option does not do anything #2582

Closed
