[bug] HTML5 document encoding differs from HTML4 #2801
Comments
There are a couple of things worth digging into here:
Since Gumbo doesn't support anything other than UTF-8, it performs the standard encoding detection pre-scan that browsers are supposed to perform to decide on the encoding, and then uses that to convert the input to UTF-8 before passing it to Gumbo. I wasn't sure what the […] I'm not sure how this is supposed to interact with […] Basically, I don't know what the correct behavior around encodings is supposed to be, but since Gumbo only supports UTF-8, I punted on it.
@stevecheckoway Ah, that's really helpful! Now I'm remembering that we briefly talked about this at #2513
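For anyone following along, here is a rough sketch of the "pre-scan, then convert to UTF-8" flow described in the comment above. It is not the code used by this project or by Gumbo: the names `sniff_encoding` and `to_utf8` are made up for illustration, the regex is a heavily simplified stand-in for the real browser pre-scan algorithm, and the `windows-1252` fallback is an assumption. It also uses Python's codec names rather than the label mapping browsers apply.

```python
import codecs
import re

# Hypothetical, simplified stand-in for the browser pre-scan:
# find "charset=..." inside a <meta> tag near the start of the input.
META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def sniff_encoding(data: bytes, default: str = "windows-1252") -> str:
    """Guess the encoding of raw HTML bytes (illustrative only)."""
    # A byte order mark takes precedence over any <meta> declaration.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"

    # Only look at the first 1024 bytes, as a pre-scan would.
    match = META_CHARSET.search(data[:1024])
    if match:
        label = match.group(1).decode("ascii")
        try:
            return codecs.lookup(label).name
        except LookupError:
            pass  # unknown label: fall through to the default

    return default  # assumed fallback, not taken from the issue


def to_utf8(data: bytes) -> bytes:
    """Transcode raw HTML bytes to UTF-8 for a UTF-8-only parser."""
    text = data.decode(sniff_encoding(data), errors="replace")
    if text.startswith("\ufeff"):
        text = text[1:]  # drop the decoded byte order mark
    return text.encode("utf-8")
```

With something like this, the parser only ever sees UTF-8, so the declared encoding matters for reading the input but says nothing by itself about what encoding the resulting document should report.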
Please describe the bug
The encoding of an HTML5 document differs from the encoding of an HTML4 document:
I haven't had time to dig into why this is (and whether it's intended behavior), so I'm opening this issue to look into it later. cc @stevecheckoway
Help us reproduce what you're seeing
yields
Expected behavior
I think these should both be the same?