-
-
Notifications
You must be signed in to change notification settings - Fork 708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sanitise HTML in long description of enterprise #12459
base: master
Are you sure you want to change the base?
Conversation
This happens only on assignment. We still need to migrate existing data.
Except, there are none. So this commit is empty. I wanted to show that this was the plan though. I guess that I just closed a security issue.
app/services/html_sanitizer.rb
Outdated
# We offer an editor which supports certain tags but you can't insert just any | ||
# HTML, which would be dangerous. | ||
class HtmlSanitizer | ||
def self.sanitize(*args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be providing default allowed tags and attributes ? I makes to be consistent across the code base, and if you need something different that the default, you can then needed pass arguments. What do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be good. At the moment, we use different editors and have different tags and attributes. I thought that we should first bring the common sanitiser in for all places and then standardise. But now I'm thinking that it's okay to sanitise with the union of all tags and attributes instead of allowing only the supported tags for each attribute. You are right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks for thinking about the big picture and solving it the safest and most efficient way.
I think it's worth considering adding the sanitisation to the view first, and "testing" out in prod for a couple of weeks, before we make destructive changes though. I don't know what HTML people currently have, and what workarounds they might be employing to format their about pages. Or am I being overly cautious?
app/models/enterprise.rb
Outdated
# Remove any unsupported HTML. | ||
def long_description=(html) | ||
super(HtmlSanitizer.sanitize( | ||
html, tags: %w[h1 h2 h3 h4 p b i u a], attributes: %w[href target], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that Trix supports some other tags, and actually uses strong
and em
instead of b
and i
. Ugh, and so many br
s:
https://staging.openfoodnetwork.org.au/elmore-compost-and-organics/shop
<p class="text-small ng-binding" data-controller="add-blank-to-link" ng-bind-html="::product.description_html">
<div>
Biochar is a porous charcoal well recognised for being able to hold water, create great building blocks for biology and a store of plant carbon.<br><br><br>
<strong>
bold
</strong>
<br>
<em>italic</em>
<br>
<del>
strikethrough
</del>
<br>
<a href="https://staging.openfoodnetwork.org.au/" target="_blank">
link
</a>
</br>
</br>
</br>
</br>
</br>
</br>
</div>
<h1>
heading
</h1>
<blockquote>
quote
</blockquote>
<pre>code</pre>
<ul>
<li>
bullet list
<ul>
<li>
indented
</li>
</ul>
</li>
</ul>
<ol>
<li>
number list
<ol>
<li>
indented
</li>
</ol>
</li>
</ol>
<div>
<br>
horizontal rule:
<hr>
<br>
Pasted tables get reformatted by Trix I guess:
<br>
<br>
<strong>
CompanyContactCountry
</strong>
<br>
Alfreds Futterkiste | Maria Anders | Germany
<br>
Centro comercial Moctezuma | Francisco Chang | Mexico
<br>
Ernst Handel | Roland Mendel | Austria
<br>
Island Trading | Helen Bennett | UK
<br>
Laughing Bacchus Winecellars | Yoshi Tannamuri | Canada
<br>
Magazzini Alimentari Riuniti | Giovanni Rovelli | Italy
</br>
</br>
</br>
</br>
</br>
</br>
</br>
</br>
</br>
</hr>
</br>
</div>
</p>
I would err on the side of being more permissive (of safe elements) and allow the other tags in case Trix allows pasting of other html, or we use a different editor one day. Is there a comprehensive list we could copy from somewhere? If not, can we add all the above examples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, I just realised that the Enterprise description uses a different editor!
So your list here should be enough (though it would be worth checking for br
).
Still, if we extend this to products we'd need to expand the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a comprehensive list we could copy from somewhere?
Well, when we omit our list of safe tags then Rails comes with a comprehensive list already. We could use that to allow more pasted HTML. But then we have a discrepancy of pasted vs edited. And when it's about security, I usually err on the side of caution and be more restrictive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the full standard list: https://github.com/flavorjones/loofah/blob/main/lib/loofah/html5/safelist.rb
It would allow certain CSS as well. I don't think it's all safe in our case, it could mess up the layout or maybe more.
app/services/html_sanitizer.rb
Outdated
# We offer an editor which supports certain tags but you can't insert just any | ||
# HTML, which would be dangerous. | ||
class HtmlSanitizer | ||
def self.sanitize(*args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree 👍
def up | ||
Enterprise.where.not(long_description: [nil, ""]).find_each do |enterprise| | ||
enterprise.update!(long_description: sanitize(long_description)) | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is un-reversable of course, so I think it's a bit risky (what if we messed up formatting for lots of enterprises).
Maybe it would be fine, and I don't want to make more effort unnecessarily, but would consider first sanitising the view and wait until it's been released for a couple of weeks before applying the migration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. We should at least test it first. The br
tag is a good example for something I forgot and where I'm not sure it's covered.
which project should this be coded against @mkllnk |
ℹ️ Please use project Discover Regenerative (Macdoch pt 2): 3. Open Source Tech Evolution to track work on this issue.
What? Why?
We weren't sanitising the HTML of long enterprise descriptions at all. So here's a remedy that sanitises the field before it's reaching the database. All consumers, including API users should now get sanitised HTML.
There are more attributes like this listed in the issue. But I wanted to get this through faster and get a review first before I apply the same approach to the other three fields.
What should we test?
Release notes
Changelog Category (reviewers may add a label for the release notes):
The title of the pull request will be included in the release notes.
Dependencies
Documentation updates