Quotes in HTML attributes escaped which breaks HTML #62

Closed
optimalisatie opened this issue Aug 25, 2020 · 3 comments

Comments

optimalisatie commented Aug 25, 2020

Hi!

I wanted to report an issue:

JSON values of HTML attributes are rewritten to an escaped value which breaks the HTML:

<div data-json='{
    "json": "value"
}'></div>

Result of .toString():

<div data-json="{\"json\":\"value\"}"></div>
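To illustrate why this output breaks, here is a minimal sketch. It assumes the serializer quotes attribute values with JSON.stringify, as the rest of this report suggests; the variable names are illustrative. HTML does not treat a backslash as an escape character, so the value ends at the first double quote. The regular expression is only a simplified stand-in for how an HTML parser reads a double-quoted attribute.

```javascript
// Assumption: attribute values are quoted with JSON.stringify during serialization.
const jsonValue = '{"json":"value"}';
const broken = `<div data-json=${JSON.stringify(jsonValue)}></div>`;
console.log(broken); // <div data-json="{\"json\":\"value\"}"></div>

// Simplified model of HTML attribute parsing: a double-quoted value
// runs up to the next double quote, and backslashes have no special meaning.
const match = /data-json="([^"]*)"/.exec(broken);
console.log(match[1]); // {\
```

Under this model, a reader of the serialized HTML sees only `{\` as the attribute value, and the rest of the JSON spills into the tag.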

Edit

Since the goal of the HTML parser is speed, it may be best to replace JSON.stringify for HTML attributes with a simple string-based check and leave the original value intact, even if it is a single space or an empty string. This could save 50,000+ JSON.stringify calls for some HTML documents.

For some attributes or JavaScript functionality, it matters whether the attribute contains ="". Stripping it costs parsing resources and seems to provide no advantage other than HTML compression, which does not seem to be a goal of the HTML parser.

The following example may provide a hint for a solution:

// Update rawString
const quoteRegex = /"/g; // compiled once and reused for every attribute

this.rawAttrs = Object.keys(attrs).map((name) => {
    const val = attrs[name];
    if (val === undefined || val === null) { // attribute has no value
        return name;
    }
    // Encode double quotes so the value cannot end the attribute early
    return name + '="' + String(val).replace(quoteRegex, '&#34;') + '"';
}).join(' ');
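
As a usage sketch, the same mapping can be wrapped in a standalone function and applied to the attribute from this report. `serializeAttrs` is a hypothetical name, not part of the library, and the attribute set is made up for illustration.

```javascript
// Hypothetical standalone version of the mapping suggested in this issue.
const QUOTE_RE = /"/g;

function serializeAttrs(attrs) {
  return Object.keys(attrs)
    .map((name) => {
      const val = attrs[name];
      // Attributes without a value (e.g. `hidden`) are emitted as a bare name.
      if (val === undefined || val === null) return name;
      // Empty strings are kept as ="" and double quotes become &#34;.
      return `${name}="${String(val).replace(QUOTE_RE, '&#34;')}"`;
    })
    .join(' ');
}

const rawAttrs = serializeAttrs({
  hidden: undefined,
  'data-json': '{"json":"value"}',
  title: '',
});
console.log(rawAttrs);
// hidden data-json="{&#34;json&#34;:&#34;value&#34;}" title=""
```

Note that this sketch only handles double quotes. Whether other characters, such as `&`, also need encoding depends on whether the stored values are already entity-encoded, which this issue does not specify.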

taoqf (Owner) commented Sep 22, 2020

Sorry, I am afraid this would lead to other errors. You can fork this lib and run npm test to see the errors.
PRs are welcome.
Anyway, thank you for your support.

taoqf closed this as completed Sep 22, 2020

lamplightdev added a commit to lamplightdev/node-html-parser that referenced this issue May 20, 2021:
Currently double quotes (") are not escaped in attribute values, causing those attributes not to be set correctly.

This commit replaces double quotes with `&quot;`.

lamplightdev commented

I've come across this issue too, and it can be solved by replacing double quotes (") with &quot; in attribute values. I've submitted a PR that implements this and adds new tests.
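
A minimal round-trip sketch of this approach, using the multi-line value from the original report. The helper names are hypothetical and this is not the code from the PR; the decoder only handles the two spellings of a double quote mentioned in this thread, where a real HTML parser decodes all character references.

```javascript
// Hypothetical helper: encode double quotes the way the comment above describes.
function escapeAttrValue(value) {
  return value.replace(/"/g, '&quot;');
}

// Minimal decoder for illustration only: handles &quot; and &#34;.
function decodeQuotes(value) {
  return value.replace(/&quot;|&#34;/g, '"');
}

const originalJson = '{\n    "json": "value"\n}';
const attr = `data-json="${escapeAttrValue(originalJson)}"`;

// Strip the leading `data-json="` and the trailing `"` to get the stored value.
const storedValue = attr.slice('data-json="'.length, -1);
const parsed = JSON.parse(decodeQuotes(storedValue));
console.log(parsed.json); // value
```

Because the stored value contains no raw double quotes, the attribute stays intact, and the original whitespace inside the JSON is preserved.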

taoqf (Owner) commented May 22, 2021

@lamplightdev Thank you.

taoqf added a commit that referenced this issue May 22, 2021
Fixes #62 - Escape double quotes in attribute values