Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitise HTML in long description of enterprise #12459

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

mkllnk
Copy link
Member

@mkllnk mkllnk commented May 10, 2024

ℹ️ Please use project Discover Regenerative (Macdoch pt 2): 3. Open Source Tech Evolution to track work on this issue.

What? Why?

We weren't sanitising the HTML of long enterprise descriptions at all. So here's a remedy that sanitises the field before it's reaching the database. All consumers, including API users should now get sanitised HTML.

There are more attributes like this listed in the issue. But I wanted to get this through faster and get a review first before I apply the same approach to the other three fields.

What should we test?

  • Visit Admin, Enterprise settings, About and edit the long description with the editor.
  • Everything you do with the editor should save and show in the frontend.
  • I don't know an easy way to simulate the attacker scenario. But I think that my approach with those unit tests is pretty safe.

Release notes

Changelog Category (reviewers may add a label for the release notes):

  • User facing changes
  • API changes (V0, V1, DFC or Webhook)
  • Technical changes only
  • Feature toggled

The title of the pull request will be included in the release notes.

Dependencies

Documentation updates

This happens only on assignment. We still need to migrate existing data.
Except, there are none. So this commit is empty. I wanted to show that
this was the plan though. I guess that I just closed a security issue.
@mkllnk mkllnk marked this pull request as ready for review May 10, 2024 05:53
# We offer an editor which supports certain tags but you can't insert just any
# HTML, which would be dangerous.
class HtmlSanitizer
def self.sanitize(*args)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be providing default allowed tags and attributes ? I makes to be consistent across the code base, and if you need something different that the default, you can then needed pass arguments. What do you think ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be good. At the moment, we use different editors and have different tags and attributes. I thought that we should first bring the common sanitiser in for all places and then standardise. But now I'm thinking that it's okay to sanitise with the union of all tags and attributes instead of allowing only the supported tags for each attribute. You are right.

Copy link
Member

@dacook dacook left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for thinking about the big picture and solving it the safest and most efficient way.

I think it's worth considering adding the sanitisation to the view first, and "testing" out in prod for a couple of weeks, before we make destructive changes though. I don't know what HTML people currently have, and what workarounds they might be employing to format their about pages. Or am I being overly cautious?

# Remove any unsupported HTML.
def long_description=(html)
super(HtmlSanitizer.sanitize(
html, tags: %w[h1 h2 h3 h4 p b i u a], attributes: %w[href target],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that Trix supports some other tags, and actually uses strong and em instead of b and i. Ugh, and so many brs:

Screenshot 2024-05-13 at 11 18 09 am

https://staging.openfoodnetwork.org.au/elmore-compost-and-organics/shop

<p class="text-small ng-binding" data-controller="add-blank-to-link" ng-bind-html="::product.description_html">
<div>
   Biochar is a porous charcoal well recognised for being able to hold water, create great building blocks for biology and a store of plant carbon.<br><br><br>
   <strong>
   bold
   </strong>
   <br>
   <em>italic</em>
   <br>
   <del>
   strikethrough
   </del>
   <br>
   <a href="https://staging.openfoodnetwork.org.au/" target="_blank">
   link
   </a>
   </br>
   </br>
   </br>
   </br>
   </br>
   </br>
</div>
<h1>
   heading
</h1>
<blockquote>
   quote
</blockquote>
<pre>code</pre>
<ul>
   <li>
      bullet list
      <ul>
         <li>
            indented
         </li>
      </ul>
   </li>
</ul>
<ol>
   <li>
      number list
      <ol>
         <li>
            indented
         </li>
      </ol>
   </li>
</ol>
<div>
   <br>
   horizontal rule:
   <hr>
   <br>
   Pasted tables get reformatted by Trix I guess:
   <br>
   <br>
   <strong>
   CompanyContactCountry
   </strong>
   <br>
   Alfreds Futterkiste | Maria Anders | Germany
   <br>
   Centro comercial Moctezuma | Francisco Chang | Mexico
   <br>
   Ernst Handel | Roland Mendel | Austria
   <br>
   Island Trading | Helen Bennett | UK
   <br>
   Laughing Bacchus Winecellars | Yoshi Tannamuri | Canada
   <br>
   Magazzini Alimentari Riuniti | Giovanni Rovelli | Italy
   </br>
   </br>
   </br>
   </br>
   </br>
   </br>
   </br>
   </br>
   </br>
   </hr>
   </br>
</div>
</p>

I would err on the side of being more permissive (of safe elements) and allow the other tags in case Trix allows pasting of other html, or we use a different editor one day. Is there a comprehensive list we could copy from somewhere? If not, can we add all the above examples?

Copy link
Member

@dacook dacook May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I just realised that the Enterprise description uses a different editor!
So your list here should be enough (though it would be worth checking for br).

Still, if we extend this to products we'd need to expand the list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a comprehensive list we could copy from somewhere?

Well, when we omit our list of safe tags then Rails comes with a comprehensive list already. We could use that to allow more pasted HTML. But then we have a discrepancy of pasted vs edited. And when it's about security, I usually err on the side of caution and be more restrictive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the full standard list: https://github.com/flavorjones/loofah/blob/main/lib/loofah/html5/safelist.rb

It would allow certain CSS as well. I don't think it's all safe in our case, it could mess up the layout or maybe more.

# We offer an editor which supports certain tags but you can't insert just any
# HTML, which would be dangerous.
class HtmlSanitizer
def self.sanitize(*args)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree 👍

Comment on lines +7 to +11
def up
Enterprise.where.not(long_description: [nil, ""]).find_each do |enterprise|
enterprise.update!(long_description: sanitize(long_description))
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is un-reversable of course, so I think it's a bit risky (what if we messed up formatting for lots of enterprises).
Maybe it would be fine, and I don't want to make more effort unnecessarily, but would consider first sanitising the view and wait until it's been released for a couple of weeks before applying the migration.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. We should at least test it first. The br tag is a good example for something I forgot and where I'm not sure it's covered.

@mkllnk mkllnk marked this pull request as draft May 14, 2024 02:42
@kirstenalarsen
Copy link
Contributor

which project should this be coded against @mkllnk

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress ⚙
Development

Successfully merging this pull request may close these issues.

None yet

4 participants