Incorrect Parsing of Markdown with Special Characters Adjacent to Strong Tags in Non-English Contexts #1288

Closed
4 tasks done
YvesRijckaert opened this issue Feb 13, 2024 · 2 comments
Labels
👀 no/external This makes more sense somewhere else 👎 phase/no Post cannot or will not be acted on

Comments

@YvesRijckaert

YvesRijckaert commented Feb 13, 2024

Initial checklist

Affected packages and versions

"remark-gfm": "^4.0.0", "remark-parse": "^11.0.0", "unified": "^11.0.4"

Link to runnable example

No response

Steps to reproduce

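  • Use the remark-parse, remark-gfm, and unified packages to parse markdown content:
import { unified } from 'unified';
import remarkParse from 'remark-parse';
import remarkGfm from 'remark-gfm';
// `md` is one of the two test strings from the next step.
const processor = unified().use(remarkParse).use(remarkGfm);
const tree = processor.parse(md);
console.log(tree.children);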
  • Parse these two strings to see the different behaviour:
const markdownWithSpecialCharacter = '**test:**如李施德林';
const markdownWithoutSpecialCharacter = '**test**如李施德林';
  • Observe that when a special character (here, a colon) sits directly before the closing **, the raw markdown is kept as a single text node in the output instead of being parsed as bold/strong text.

markdownWithSpecialCharacter:

[
  {
    "type": "paragraph",
    "children": [
      {
        "type": "text",
        "value": "**test:**如李施德林",
        "position": {
          "start": { "line": 1, "column": 1, "offset": 0 },
          "end": { "line": 1, "column": 15, "offset": 14 }
        }
      }
    ],
    "position": {
      "start": { "line": 1, "column": 1, "offset": 0 },
      "end": { "line": 1, "column": 15, "offset": 14 }
    }
  }
]

markdownWithoutSpecialCharacter:

[
  {
    "type": "paragraph",
    "children": [
      {
        "type": "strong",
        "children": [
          {
            "type": "text",
            "value": "test",
            "position": {
              "start": { "line": 1, "column": 3, "offset": 2 },
              "end": { "line": 1, "column": 7, "offset": 6 }
            }
          }
        ],
        "position": {
          "start": { "line": 1, "column": 1, "offset": 0 },
          "end": { "line": 1, "column": 9, "offset": 8 }
        }
      },
      {
        "type": "text",
        "value": "如李施德林",
        "position": {
          "start": { "line": 1, "column": 9, "offset": 8 },
          "end": { "line": 1, "column": 14, "offset": 13 }
        }
      }
    ],
    "position": {
      "start": { "line": 1, "column": 1, "offset": 0 },
      "end": { "line": 1, "column": 14, "offset": 13 }
    }
  }
]

Expected behavior

The markdown content should be parsed correctly, with the strong tag (**) properly converting text to bold, regardless of the presence of special characters or non-English text adjacent to the tags.

Actual behavior

  • When a colon character is present next to a strong tag with non-English text following it, the markdown syntax is not correctly parsed, and the raw markdown is displayed.
  • When non-English text is directly next to a strong tag without any special characters, the markdown is parsed correctly, and the text is bolded as expected.

Runtime

Node v17

Package manager

npm 8

OS

macOS

Build and bundle tools

Rollup

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Feb 13, 2024
@wooorm
Member

wooorm commented Feb 13, 2024

https://spec.commonmark.org/dingus/?text=**test%3A**如李施德林

commonmark/commonmark-spec#650 (welcome to help come up with more real world examples here)
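
For context, the dingus result above appears to follow from CommonMark's rules for delimiter runs: a ** run can close strong emphasis only if it is right-flanking, and a run preceded by punctuation is right-flanking only when it is also followed by whitespace or punctuation. In the failing string the closing ** is preceded by a colon and followed by 如, so it cannot close. The sketch below is a simplified illustration of that check, not the parser's actual implementation; the helper names (flanking, canCloseStar) are hypothetical, and approximating "punctuation" as Unicode categories P and S is an assumption.

```javascript
// Sketch of CommonMark's flanking checks for a delimiter run such as `**`.
// `before` and `after` are the characters on either side of the run;
// pass '' for the start or end of the line, which counts as whitespace.
const isWhitespace = (ch) => ch === '' || /\s/u.test(ch);
// Assumption: punctuation approximated as Unicode categories P and S.
const isPunctuation = (ch) => /[\p{P}\p{S}]/u.test(ch);

function flanking(before, after) {
  const left =
    !isWhitespace(after) &&
    (!isPunctuation(after) || isWhitespace(before) || isPunctuation(before));
  const right =
    !isWhitespace(before) &&
    (!isPunctuation(before) || isWhitespace(after) || isPunctuation(after));
  return { left, right };
}

// For `*` and `**`, a run can close emphasis only if it is right-flanking.
const canCloseStar = (before, after) => flanking(before, after).right;

console.log(canCloseStar(':', '如')); // false: closing run in '**test:**如李施德林'
console.log(canCloseStar('t', '如')); // true: closing run in '**test**如李施德林'
```

By the same rule, a closing ** that follows a letter rather than the colon (as in **test**:如李施德林) should be able to close, since the run is then no longer preceded by punctuation.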

@wooorm wooorm closed this as completed Feb 13, 2024
@wooorm wooorm added the 👀 no/external This makes more sense somewhere else label Feb 13, 2024

@github-actions github-actions bot added 👎 phase/no Post cannot or will not be acted on and removed 🤞 phase/open Post is being triaged manually labels Feb 13, 2024