Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML character entities throw "Expected valid tag name" error #296

Open
Tracked by #568
vwkd opened this issue Aug 28, 2021 · 12 comments
Open
Tracked by #568

HTML character entities throw "Expected valid tag name" error #296

vwkd opened this issue Aug 28, 2021 · 12 comments
Labels
assigned Whether or not this bug has been assigned some to some other issues as a subtask or pre-req bug Something isn't working
Milestone

Comments

@vwkd
Copy link

vwkd commented Aug 28, 2021

Description

When using HTML character entities like < in a MDsveX page with SvelteKit, Vite throws an Expected valid tag name error

(Note, before Vite 2.5.4 and SvelteKit 1.0.0-next.165 the error was a cryptic offset is longer than source length!. Thanks to vitejs/vite#4782 this is now fixed.)

Reproduction

https://github.com/vwkd/sveltekit-bugs/tree/mdsvex-character-references

  1. Run npm run dev
  2. Observe error Expected valid tag name

System Info

  Binaries:
    Node: 16.6.0 - ~/.nvm/versions/node/v16.6.0/bin/node
    Yarn: 1.22.11 - ~/.nvm/versions/node/v16.6.0/bin/yarn
    npm: 7.19.1 - ~/.nvm/versions/node/v16.6.0/bin/npm
  npmPackages:
    @sveltejs/adapter-static: ^1.0.0-next.11 => 1.0.0-next.18
    @sveltejs/kit: next => 1.0.0-next.165
    mdsvex: ^0.9.8 => 0.9.8 
    svelte: ^3.34.0 => 3.42.5

Additional context

Maybe related to sveltejs/svelte#6440 ?

@vwkd vwkd changed the title HTML character references throw "offset is longer than source length" error HTML character entities throw "offset is longer than source length" error Aug 28, 2021
@vwkd
Copy link
Author

vwkd commented Aug 28, 2021

Ok, I might have found a possible cause. After testing multiple pages with such symbols, I sometimes got the more specific error Expected valid tag name. Looking at the error message, the character entities were converted. It seems like they were interpreted as HTML, which in the case of < and >would lead to the invalid tag. This seems like it is a bug.

Though there might be more causes to the "offset is longer than source length" error. I also got it when writing some LaTeX on a page, e.g.\{0,1,\ldots,9\}. Sometimes I got the more specific error Expecting Unicode escape sequence \uXXXX. Of course, LaTeX is invalid markup and an error isn't a bug here, but again the cryptic "offset is longer than source length" doesn't help much.

Both examples seems to highlight the same issue that invalid markup isn't reported properly as error, then it progresses down the chain until it eventually ends up being served by Vite which throws the cryptic "offset is longer than source length" error.

Could it be that remark fails to throw an error during parsing, or that MDsveX swallows it?

@milahu
Copy link

milahu commented Aug 29, 2021

one problem is that vite throws these usless stack traces

Error: offset is longer than source length!
    at numberToPos (node_modules/vite/dist/node/chunks/dep-972722fa.js:4247:15)
    at formatError (node_modules/vite/dist/node/chunks/dep-972722fa.js:51169:24)
    at TransformContext.error (node_modules/vite/dist/node/chunks/dep-972722fa.js:51149:19)
    at Object.transform (node_modules/vite/dist/node/chunks/dep-972722fa.js:51357:25)
    at async transformRequest (node_modules/vite/dist/node/chunks/dep-972722fa.js:67098:29)
    at async instantiateModule (node_modules/vite/dist/node/chunks/dep-972722fa.js:73732:10)

with vitejs/vite#4782 i get

formatError: error in numberToPos (Error: offset is longer than source length! offset 126 > length 48) while handling the error:
Expected valid tag name
4: </script>
5: 
6: <p>Lorem ipsum < dolor sit</p>
                   ^
ParseError: Expected valid tag name
    at error (node_modules/svelte/compiler.mjs:17698:19)
    at Parser$1.error (node_modules/svelte/compiler.mjs:17774:9)
    [...]

@milahu
Copy link

milahu commented Aug 29, 2021

digging deeper

node_modules/mdsvex/dist/main.cjs.js
// line 8722

        if (namedEntity) {
          entityCharacters = characters;
          entity = namedEntity;
        }

        // DEBUG
        console.log(`\n\nmdsvex cjs 8727: found entity in ${characters}. named ${namedEntity}. type ${type}`);
        //console.dir(new Error());
// line 8815

      // Found it!
      // First eat the queued characters as normal text, then eat an entity.
      if (reference) {
        flush();

        prev = now();
        index = end - 1;
        column += end - start + 1;
        result.push(reference);

        // DEBUG
        console.log(`mdsvex cjs 8827: push reference ${reference}`);

output

mdsvex cjs 8727: found entity in lt. named <. type named
mdsvex cjs 8827: push reference <

where does this code come from? lets ask github - looks like parse-entities
some traces later, this is called by remark-parse.markdown

so the problem is ...

	const toMDAST = unified()
		.use(markdown) // HERE

simple but wrong:
escaping &lt; with &amp;lt; and &gt; with &amp;gt; would give false positives
for example

&lt;
<div title="&lt;"/>

would become

&lt;
<div title="&amp;lt;"/>

so either set options via .use(markdown, options) or patch the markdown parser

@milahu
Copy link

milahu commented Aug 29, 2021

quickfix

// cjs js line 8716

      if (terminated) {
        end++;

        namedEntity = type === name$1 ? decodeEntity_1(characters) : false;

        // QUICKFIX
        if (namedEntity == '<') namedEntity = '&lt;';
        if (namedEntity == '>') namedEntity = '&gt;';

        if (namedEntity) {
          entityCharacters = characters;
          entity = namedEntity;
        }

much faster than fixing all the dependencies : )
how to apply with patch-package

git clone --depth 1 https://github.com/milahu/sveltekit-bugs.git --branch quickfix-with-patch-package
cd sveltekit-bugs
pnpm install
npm run dev

@pngwn
Copy link
Owner

pngwn commented Sep 10, 2021

Late coming to this but I don't think < or > are really supported in the underlying parsers, entities seem to be decoded when parsed.

I think there are two issues about this now, can you check to see if this issue is reporting the same problem?

@milahu
Copy link

milahu commented Sep 10, 2021

can you check to see if this issue is reporting the same problem?

yes, its a duplicate

source here

Lorem ipsum &lt; dolor sit

source there

A paragraph started with &lt;h2&gt; 

with my quickfix, this is correctly rendered as

A paragraph started with <h2>

@pngwn
Copy link
Owner

pngwn commented Sep 10, 2021

Thank you!

In v1, I have a plan to make the use of <, > character entities unnecessary, supporting < and > directly in text. But the entity situation will be resolve when mdsvex moves to a different parser.

@pngwn
Copy link
Owner

pngwn commented Sep 10, 2021

This issue has more information, I'll close the other.

@pngwn pngwn added bug Something isn't working markdown labels Sep 10, 2021
@vwkd vwkd changed the title HTML character entities throw "offset is longer than source length" error HTML character entities throw "Expected valid tag name" error Sep 12, 2021
@milahu
Copy link

milahu commented Sep 17, 2021

when mdsvex moves to a different parser.

in the mean time, you could use patch-package to patch the dependency remark-parse.markdown
so that your consumers dont need to patch mdsvex

@vwkd
Copy link
Author

vwkd commented Sep 26, 2021

@milahu Thanks you for digging into this and also for fixing the error message in Vite!

I'm thinking it may be a good idea to submit a PR upstream to remark-parse.markdown. I have the feeling it will take some time for MDsveX to move to a new parser, so it's waiting either way. Fixing upstream would be the right thing to do even if we have to wait more. This way MDsveX doesn't need to maintain a custom patch and also other projects would benefit from it.

EDIT: Are you sure it's remark-parse? It seems it just a wrapper around mdast-util-from-markdown. Looking into mdast-util-from-markdown, it has a single reference to parse-entities at https://github.com/syntax-tree/mdast-util-from-markdown/blob/c79a430b8f6b0e3aa4c0981459510283ffe037a0/dev/lib/index.js#L960. Might that be it?

@milahu
Copy link

milahu commented Sep 26, 2021

Might that be it?

yepp

$ cd MDsveX/packages/mdsvex
$ sed -i 's/sourcemap: false/sourcemap: true/' rollup.config.js
$ npm run build
$ npm i -g source-map-cli
$ source-map resolve dist/main.cjs.js.map 8722 0
Maps to node_modules/parse-entities/index.js:254:0 

        if (namedEntity) {
^

thats the problem with fixing upstream:
the code is hidden in some dependency, and we would have to pass our option across all the interfaces

do you have so much time to waste? ; )

@pngwn
Copy link
Owner

pngwn commented Sep 26, 2021

I'm not sure an upstream PR would be accepted anyway, I'm pretty sure this is by design from previous conversations I've had.

I'll Patch this dependency for the short term when I find the time (if anyone wants to submit a PR, I'll happily accept it).

@pngwn pngwn mentioned this issue Feb 23, 2024
27 tasks
@pngwn pngwn added the assigned Whether or not this bug has been assigned some to some other issues as a subtask or pre-req label Feb 23, 2024
@pngwn pngwn added this to the 1.0 milestone Feb 23, 2024
@pngwn pngwn removed v1 labels Feb 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
assigned Whether or not this bug has been assigned some to some other issues as a subtask or pre-req bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants