Re-running ammonia on its own output gives different results #185

Open
bcaller opened this issue Aug 8, 2023 · 0 comments

There are a few cases I've found where feeding the output of ammonia back into ammonia gives a different output.

I'm not sure whether this means the initial output is non-compliant or potentially unsafe, or whether you'd consider this not a bug.

It's also possible that the bug is entirely within html5ever; I'm not sure.

The first two examples show entity decoding producing characters that the second pass then removes or changes.

The later examples show the sanitizer moving closing tags around. In each example, every line after the first is the result of running ammonia on the line above it.

Anyway, do you think it's worth running ammonia twice, or is it nothing to worry about?
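One way to look for more cases like these is to compare the first and second pass directly. The following is a minimal sketch, not part of ammonia: the helper names `second_pass_diff` and `clean_until_stable` are hypothetical, and the sanitizer is passed in as a plain function so the helpers themselves do not depend on any crate.

```rust
/// Runs `sanitize` twice and returns `(first, second)` when the two passes
/// disagree, or `None` when the first output is already stable.
pub fn second_pass_diff<F>(sanitize: F, input: &str) -> Option<(String, String)>
where
    F: Fn(&str) -> String,
{
    let first = sanitize(input);
    let second = sanitize(&first);
    if first == second {
        None
    } else {
        Some((first, second))
    }
}

/// Sanitizes once, then re-sanitizes up to `max_extra_passes` more times.
/// Returns the output as soon as a pass leaves it unchanged, or `None` if it
/// is still changing when the extra passes run out.
pub fn clean_until_stable<F>(sanitize: F, input: &str, max_extra_passes: usize) -> Option<String>
where
    F: Fn(&str) -> String,
{
    let mut current = sanitize(input);
    for _ in 0..max_extra_passes {
        let next = sanitize(&current);
        if next == current {
            return Some(current);
        }
        current = next;
    }
    None
}
```

With ammonia as a dependency, `second_pass_diff(ammonia::clean, input)` should type-check, since `ammonia::clean` takes a `&str` and returns a `String`. Whether a stable output is also a safe one is a separate question that this check does not answer.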


HTML entity -> \r -> \n


\r
\n

HTML entity for BOM at start -> BOM at start -> nothing
(OK, this one I understand, because we use the default TokenizerOpts with discard_bom enabled.)

&#65279!
\ufeff!
!
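The two entity cases above can be reproduced with a toy model. This is only a sketch under stated assumptions, not html5ever's or ammonia's code: it assumes the parser drops a leading literal BOM and normalizes raw `\r` to `\n` before tokenizing, decodes character references afterwards, and that serialization writes the decoded characters back out literally. The function `toy_pass` is hypothetical, and `&#13;` is an assumed stand-in for the entity in the first example.

```rust
const BOM: char = '\u{feff}';

/// Toy model of one parse-and-serialize round trip (hypothetical; it handles
/// only the two character references discussed here).
pub fn toy_pass(input: &str, discard_bom: bool) -> String {
    // 1. Optionally drop a literal BOM at the very start of the input.
    let input = if discard_bom {
        input.strip_prefix(BOM).unwrap_or(input)
    } else {
        input
    };

    // 2. Normalize raw newlines in the input: CRLF and lone CR become LF.
    let normalized = input.replace("\r\n", "\n").replace('\r', "\n");

    // 3. Decode character references. This happens after steps 1 and 2, so
    //    the characters produced here are not dropped or normalized in this
    //    pass. The semicolon-less form mirrors the `&#65279!` example.
    normalized
        .replace("&#13;", "\r")
        .replace("&#65279;", "\u{feff}")
        .replace("&#65279", "\u{feff}")
}
```

In this model, `toy_pass("&#13;", true)` yields a literal `\r`, which a second call turns into `\n`; likewise `&#65279!` becomes a BOM followed by `!`, and the second call drops the BOM. Passing `discard_bom = false` on the second call keeps the BOM, which matches the explanation given for the BOM case.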

Anchor tag hopping around:

<a><table><a>
<a><a></a><table></table></a>
<a></a><a></a><table></table>
<h1><a><h6></a></h6>
<h1><a></a><h6><a></a></h6></h1>
<h1><a></a></h1><h6><a></a></h6>

Paragraph tags reproducing:

<p><svg><foreignobject><p>
<p><p></p></p>
<p></p><p></p><p></p>