Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Newlines in styles result in invalid html #237

Closed
allard opened this issue May 13, 2024 · 3 comments
Closed

Newlines in styles result in invalid html #237

allard opened this issue May 13, 2024 · 3 comments

Comments

@allard
Copy link

allard commented May 13, 2024

I recently had an issue where user input html had a newline in an inline style. This passed through sanitize with the newline still there and later resulted in invalid parsing through BootstrapEmail. I've solved it with a transformer but feel that this should probably be included in the standard sanitzing.

Issue:

Sanitize.fragment(%{<p style="margin-bottom:\r\n 2.0pt;">text</p>}, Sanitize::Config::RELAXED) => 
"<p style=\"margin-bottom:\n 2.0pt;\">text</p>"

Fix:

sanitize_newlines_from_style = lambda do |env|
  return unless (node = env[:node]).element?
  return unless node.attribute_nodes.present?
  node.attribute_nodes.each do |node_attribute|
    next unless node_attribute.name == "style"
    node_attribute.value = node_attribute.value.gsub(/[\r\n]/, '')
  end
end
Sanitize.fragment(%{<p style="margin-bottom:\r\n 2.0pt;">text</p>},
  Sanitize::Config.merge(Sanitize::Config::RELAXED,
    transformers: sanitize_newlines_from_style)) => 
"<p style=\"margin-bottom: 2.0pt;\">text</p>"
@flavorjones
Copy link
Contributor

Hi! I'm not the maintainer, but I'm another user of @rgrove's tools Sanitize and Crass and wanted to ask a couple of clarifying questions ...

  1. What's the error you're getting from your tool around this property? My reading of the CSS spec leads me to believe that this is valid syntax -- and Crass seems to have no problem parsing it.
  2. Do you think this falls under the category of "security" or "safety"? My experience has been that a linter checks for validity (and possibly fixes errors) while a sanitizer simply needs to make sure that untrusted inputs are made safe.

@allard
Copy link
Author

allard commented May 14, 2024

BootstrapEmail raised a 500 error with this code. Now, it could be that this issue technically lies with them.

When I ran the code through the W3c CSS validator div { margin-bottom: \n 2.0pt; }, it removed the newline and said it was valid. When I ran it through the W3C HTML validator, it gave the following error:

Error: CSS: margin-bottom: Too many values or values are not recognized.
[From line 1, column 1; to line 1, column 35](https://validator.w3.org/nu/#l1c35)
<p style="margin-bottom:\n 2.0pt;">

Maybe I misunderstand the purpose of sanitize, I've always thought it was to ensure both valid and safe data, which is why the following example returns valid html. Or do you mean that it shouldn't and there should be a linter needed as well?

Sanitize.fragment("<div>text", Sanitize::Config::RELAXED)
=> "<div>text</div>"

@rgrove
Copy link
Owner

rgrove commented May 15, 2024

Hi @allard! As @flavorjones mentioned, newlines are valid in HTML style attributes, so Sanitize appears to behaving correctly. Relevant standards can be found here:

It sounds like BootstrapEmail may not handle this case correctly, but it's great that you were able to work around that problem by creating a custom transformer (that's exactly what transformers are for!).

Maybe I misunderstand the purpose of sanitize, I've always thought it was to ensure both valid and safe data, which is why the following example returns valid html. Or do you mean that it shouldn't and there should be a linter needed as well?

Sanitize isn't a validator or linter and can't guarantee valid output. But you're correct that Sanitize does aim to guarantee safe output according to the config that's used.

The HTML parsing standard specifies certain rules for parsing HTML that can result in invalid HTML being "fixed" by the parser, and when that parsed HTML is then serialized and returned by Sanitize, this can give the impression that Sanitize is acting as a validator. But Sanitize itself doesn't perform any validation, and the only changes it makes to HTML or CSS are changes that are intended to make it safe.

@rgrove rgrove closed this as not planned Won't fix, can't repro, duplicate, stale May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants