Lexer handling newlines incorrectly in some cases #3167

Synthetic-Dev · 2024-01-18T22:45:34Z

Marked version:
11.1.1

Describe the bug
When using the lexer, some block tokens keep trailing newlines at the end of their raw value instead of those newlines being tokenized separately.

To Reproduce
Input (hr):

console.log(lexer.lex("---------------------------------\n\nhi"))

Output (hr):

[
    {
        "type": "hr",
        "raw": "---------------------------------\n\n"
    },
    {
        "type": "paragraph",
        "raw": "hi",
        "text": "hi",
        "tokens": [
            {
                "type": "text",
                "raw": "hi",
                "text": "hi"
            }
        ]
    }
]

Input (blockquote):

console.log(lexer.lex("> blockquote\n\nhi"))

Output (blockquote):

[
    {
        "type": "blockquote",
        "raw": "> blockquote\n\n",
        "tokens": [
            {
                "type": "paragraph",
                "raw": "blockquote",
                "text": "blockquote",
                "tokens": [
                    {
                        "type": "text",
                        "raw": "blockquote",
                        "text": "blockquote"
                    }
                ]
            }
        ],
        "text": "blockquote"
    },
    {
        "type": "paragraph",
        "raw": "hi",
        "text": "hi",
        "tokens": [
            {
                "type": "text",
                "raw": "hi",
                "text": "hi"
            }
        ]
    }
]

In both examples the two newlines are not tokenized on their own; they end up at the end of the preceding token's raw value (the hr and the blockquote).
This is with gfm: true and breaks: true.
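
One way to confirm where the newlines went is to join the raw values of the top-level tokens and compare the result with the input. The sketch below does not call marked; it only reuses the hr output reported above, and the variable names (hrLine, input, tokens, rebuilt) are illustrative assumptions.

```javascript
// Top-level tokens copied from the hr output reported above.
const hrLine = "---------------------------------";
const input = hrLine + "\n\nhi";
const tokens = [
  { type: "hr", raw: hrLine + "\n\n" },
  { type: "paragraph", raw: "hi", text: "hi" },
];

// Joining every top-level raw value gives back the original source,
// so the two newlines are kept; they are attached to the hr token.
const rebuilt = tokens.map((token) => token.raw).join("");
console.log(rebuilt === input); // prints true
```

So no input is dropped; the open question is only which token the newlines belong to.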

Expected behavior
For hr input:

[
    {
        "type": "hr",
        "raw": "---------------------------------"
    },
    {
        "type": "space",
        "raw": "\n\n"
    },
    {
        "type": "paragraph",
        "raw": "hi",
        "text": "hi",
        "tokens": [
            {
                "type": "text",
                "raw": "hi",
                "text": "hi"
            }
        ]
    }
]

For blockquote input:

[
    {
        "type": "blockquote",
        "raw": "> blockquote",
        "tokens": [
            {
                "type": "paragraph",
                "raw": "blockquote",
                "text": "blockquote",
                "tokens": [
                    {
                        "type": "text",
                        "raw": "blockquote",
                        "text": "blockquote"
                    }
                ]
            }
        ],
        "text": "blockquote"
    },
    {
        "type": "br",
        "raw": "\n"
    },
    {
        "type": "paragraph",
        "raw": "hi",
        "text": "hi",
        "tokens": [
            {
                "type": "text",
                "raw": "hi",
                "text": "hi"
            }
        ]
    }
]


UziTech · 2024-01-19T06:20:23Z

The space token is only emitted where it is needed. For example, if two paragraphs are next to each other, they become one paragraph token unless there is a blank line (a space token) between them.

If you want to create a PR to add space tokens after each block token that would be fine, but I think it will be a breaking change.
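
As a caller-side alternative to changing the lexer, the expected shape can be approximated by post-processing the token list. This is a minimal sketch, not part of marked: splitTrailingNewlines is a hypothetical helper, it only looks at top-level tokens, and it always emits a space token (the issue's blockquote example expects a br token instead).

```javascript
// Hypothetical helper, not part of marked: moves trailing newlines out of a
// top-level token's raw value into a separate "space" token that follows it.
function splitTrailingNewlines(tokens) {
  const result = [];
  for (const token of tokens) {
    const match = /\n+$/.exec(token.raw);
    // Leave space tokens and tokens without trailing newlines untouched.
    if (token.type === "space" || match === null) {
      result.push(token);
      continue;
    }
    // Copy the token with the newlines trimmed, then add them as a space token.
    result.push({ ...token, raw: token.raw.slice(0, match.index) });
    result.push({ type: "space", raw: match[0] });
  }
  return result;
}

// Top-level tokens shaped like the hr output from the report.
const rule = "---------------------------------";
const lexed = [
  { type: "hr", raw: rule + "\n\n" },
  { type: "paragraph", raw: "hi", text: "hi" },
];
const separated = splitTrailingNewlines(lexed);
console.log(separated.map((token) => token.type)); // prints [ 'hr', 'space', 'paragraph' ]
```

The helper copies tokens instead of mutating them, so the original lexer output stays available for rendering.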

UziTech added the proposal label Jan 19, 2024

UziTech linked a pull request Apr 21, 2024 that will close this issue

BREAKING CHANGE: fix blockquote code continuation and add space token after #3264

