Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for MDX comment syntax #1368

Open
3w36zj6 opened this issue Mar 12, 2024 · 6 comments
Open

Add support for MDX comment syntax #1368

3w36zj6 opened this issue Mar 12, 2024 · 6 comments
Labels
help wanted Extra attention is needed

Comments

@3w36zj6
Copy link
Member

3w36zj6 commented Mar 12, 2024

What version of textlint are you using?

v14.0.3

What file type (Markdown, plain text, etc.) are you using?

MDX via @textlint/textlint-plugin-markdown

What did you do? Please include the actual source code causing the issue.

The comment syntax in Markdown is the same as in HTML, but MDX does not support this comment syntax and instead uses {/* */}.

@textlint/textlint-plugin-markdown does not recognize this as a comment, and to treat it as a comment syntax, must do as follows:

{/* <!-- --> */}

What did you expect to happen?

I expect the MDX comment syntax to be supported and that the range enclosed by the MDX comment syntax would not be subject to linting.

What actually happened? Please include the actual, raw output from textlint.

The range enclosed by the MDX comment syntax is subject to linting.

@azu
Copy link
Member

azu commented Mar 12, 2024

I think the best way to make a textlint-plugin-mdx.
Actually, MDX and Markdown are not completely compatible.

textlint-plugin-markdown treat <p>{/* .... */}</p> as HTML fragments.

image

https://textlint.github.io/astexplorer/#/snippet/woXCqHBhcnNlcklEwrh0ZXh0bGludDptxINrZG93bi10by1hc3TCqMSFdHTEkGdzwoHEisSMxI7EkMSSxJRyxJbEmMSaxJzEnsSgw4DCqHbEhnNpb27EqMSqxI3Ej8SRxJPElcSXxJnEm8SdxJ90wqYxNC4wLjPCqGZpbGVuYW1lwrBzb3VyY2UudW5kZcWUbmVkwqRjb8Wmwr88ZGl2PgogIHsvKiA8IS0tIMW_PiAqL30KPC_FssW0}

textlint-filter-rule-comments extract HTML comment from the fragment and filter it.

https://github.com/textlint/textlint-filter-rule-comments/blob/755c4402cc4995cd7ca37764c138840811d53515/src/textlint-filter-rule-comments.js#L40-L55

If textlint-plugin-mdx parse <p>{/* .... */}</p> as <HTML><Comment /></HTML>, textlint-filter-rule-comments just works without workaround.

However, It's hard to make a plugin for MDX, so there may be a better way.

@azu azu added the help wanted Extra attention is needed label Mar 12, 2024
@azu
Copy link
Member

azu commented Mar 12, 2024

I think it is possible to improve the regexp here.

https://github.com/textlint/textlint-filter-rule-comments/blob/755c4402cc4995cd7ca37764c138840811d53515/src/parse-comment.js#L3

However, it is necessary to verify a little whether there is false-negative.
(I don't think many users will write such comments {/* textlint-disable */} unexpectlly, There is no need to worry.)

Actually, {/* */} is not a comment in HTML, so I don't know if we should change this regexp.

@3w36zj6
Copy link
Member Author

3w36zj6 commented Mar 12, 2024

Indeed, my goal is to use textlint-filter-rule-comments with MDX comment syntax only. Therefore, I considered making the comment format customizable from .textlintrc (3w36zj6/textlint-filter-rule-comments@8ae4e5a) or making the regular expression representing the comment syntax a union of Markdown and MDX (3w36zj6/textlint-filter-rule-comments@45c4e5b), but neither gave me the desired result.

@azu
Copy link
Member

azu commented Mar 13, 2024

Probabely, 3w36zj6/textlint-filter-rule-comments@45c4e5b reuse g ann it is something wrong.

Maybe, we can create a RegeExp dynamically like this.

            const regExp = new RegExp([
                String.raw`<!--\s*${enablingComment}*(?:.|\s)*?-->`,
                String.raw`<!--\s*${disablingComment}*(?:.|\s)*?-->`,
                String.raw`{/\*\s*${enablingComment}*(?:.|\s)*?\*/}`,
                String.raw`{/\*\s*${disablingComment}*(?:.|\s)*?\*/}`
            ].join("|"), "g");
            const matches = nodeValue.matchAll(regExp);
            const comments = Array.from(matches, m => {
                return m[0].replace(/<!--|-->|{\/\*|\*\/}/g, "")
            });
            comments.forEach(commentValue => {
                if (commentValue.indexOf(enablingComment) !== -1) {
                    const configValue = commentValue.replace(enablingComment, "");
                    statusManager.enableReporting(node, parseRuleIds(configValue));
                } else if (commentValue.indexOf(disablingComment) !== -1) {
                    const configValue = commentValue.replace(disablingComment, "");
                    statusManager.disableReporting(node, parseRuleIds(configValue));
                }
            });

@3w36zj6
Copy link
Member Author

3w36zj6 commented Mar 13, 2024

I have reverted the changes to HTML_COMMENT_REGEXP and implemented the suggested changes (3w36zj6/textlint-filter-rule-comments@5ec2076).
All tests in Markdown pass, but in MDX, comments continue to be subject to lint, and rule disabling is not performed.

@3w36zj6
Copy link
Member Author

3w36zj6 commented Mar 25, 2024

Thank you for your continuous support. We considered creating mdx-to-ast and textlint-plugin-mdx and conducted the following simple experiments.

First, I added remark-mdx to the Processor.

textlint/packages/@textlint/markdown-to-ast/src/parse-markdown.ts:

import unified from "unified";
// @ts-ignore
import autolinkLiteral from "mdast-util-gfm-autolink-literal/from-markdown";
// FIXME: Disable auto link literal transforms that break AST node
// https://github.com/remarkjs/remark-gfm/issues/16
// Need to override before import gfm plugin
autolinkLiteral.transforms = [];
// Load plugins
import remarkGfm from "remark-gfm";
+import remarkMdx from "remark-mdx";
import remarkParse from "remark-parse";
import frontmatter from "remark-frontmatter";
import footnotes from "remark-footnotes";
import type { Node } from "unist";

-const remark = unified().use(remarkParse).use(frontmatter, ["yaml"]).use(remarkGfm).use(footnotes, {
+const remark = unified().use(remarkParse).use(frontmatter, ["yaml"]).use(remarkGfm).use(remarkMdx).use(footnotes, {
    inlineNotes: true
});
export const parseMarkdown = (text: string): Node => {
    return remark.parse(text);
};

Next, I added processing for MDX-specific mdast nodes.

textlint/packages/@textlint/markdown-to-ast/src/index.ts:

import { SyntaxMap } from "./mapping/markdown-syntax-map";
import type { TxtDocumentNode } from "@textlint/ast-node-types";
import { ASTNodeTypes } from "@textlint/ast-node-types";
import traverse from "traverse";
import debug0 from "debug";
import { parseMarkdown } from "./parse-markdown";

const debug = debug0("@textlint/markdown-to-ast");

export { ASTNodeTypes as Syntax };

/**
 * parse Markdown text and return ast mapped location info.
 * @param {string} text
 */
export function parse(text: string): TxtDocumentNode {
    // remark-parse's AST does not consider BOM
    // AST's position does not +1 by BOM
    // So, just trim BOM and parse it for `raw` property
    // textlint's SourceCode also take same approach - trim BOM and check the position
    // This means that the loading side need to consider BOM position - for example fs.readFile and text slice script.
    // https://github.com/micromark/micromark/blob/0f19c1ac25964872a160d8b536878b125ddfe393/lib/preprocess.mjs#L29-L31
    const hasBOM = text.charCodeAt(0) === 0xfeff;
    const textWithoutBOM = hasBOM ? text.slice(1) : text;
    const ast = parseMarkdown(textWithoutBOM);
    traverse(ast).forEach(function (node) {
        // eslint-disable-next-line no-invalid-this
+        const isMdx =
+            node &&
+            Object.prototype.hasOwnProperty.call(node, "type") &&
+            /^(mdxJsxFlowElement|mdxJsxTextElement|mdxjsEsm|mdxFlowExpression|mdxTextExpression)$/.test(node.type);
        if (this.notLeaf) {
+            if (isMdx) {
+                if (node.type === "mdxJsxFlowElement") {
+                    node.type = ASTNodeTypes.Html;
+                } else if (node.type === "mdxJsxTextElement") {
+                    node.type = ASTNodeTypes.Paragraph;
+                } else if (
+                    node.type === "mdxjsEsm" ||
+                    node.type === "mdxFlowExpression" ||
+                    node.type === "mdxTextExpression"
+                ) {
+                    if (Object.prototype.hasOwnProperty.call(node, "value") && /^(\/\*(.|\s)*\*\/)$/.test(node.value)) {
+                        node.type = ASTNodeTypes.Comment;
+                        node.value = node.value.replace(/^(\/\*)/, "").replace(/\*\/$/, "");
+                    } else {
+                        node.type = node.type === "mdxTextExpression" ? ASTNodeTypes.Code : ASTNodeTypes.CodeBlock;
+                    }
+                }
+                if (Object.prototype.hasOwnProperty.call(node, "attributes")) {
+                    delete node.attributes;
+                }
+                if (Object.prototype.hasOwnProperty.call(node, "name")) {
+                    delete node.name;
+                }
+                if (Object.prototype.hasOwnProperty.call(node, "data")) {
+                    delete node.data;
+                }
-            if (node.type) {
+            } else if (node.type) {
                const replacedType = SyntaxMap[node.type as keyof typeof SyntaxMap];
                if (!replacedType) {
                    debug(`replacedType : ${replacedType} , node.type: ${node.type}`);
                } else {
                    node.type = replacedType;
                }
            }
            // map `range`, `loc` and `raw` to node
            if (node.position) {
                const position = node.position;
                // line start with 1
                // column start with 0
                const positionCompensated = {
                    start: { line: position.start.line, column: Math.max(position.start.column - 1, 0) },
                    end: { line: position.end.line, column: Math.max(position.end.column - 1, 0) }
                };
                const range = [position.start.offset, position.end.offset] as const;
                node.loc = positionCompensated;
                node.range = range;
                node.raw = textWithoutBOM.slice(range[0], range[1]);
                // Compatible for https://github.com/syntax-tree/unist, but it is hidden
                Object.defineProperty(node, "position", {
                    enumerable: false,
                    configurable: false,
                    writable: false,
                    value: position
                });
            }
        }
    });
    return ast as TxtDocumentNode;
}

This approach will likely succeed. However, the latest remark-mdx only supports ESM, so I believe updates to configuration files and dependencies will be necessary before proceeding.

(I updated the configuration files and dependencies and confirmed operation on local environment, I could not figure out how to make both the TypeScript transpile and the test with mocha succeed)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants