
MoveCssInline encodes non-ASCII characters even when they should be valid HTML #193

Open
CaptainStack opened this issue Feb 21, 2020 · 0 comments


CaptainStack commented Feb 21, 2020

I have seen multiple variants of this issue, and it is often treated as closed or as a non-issue, but it is currently blocking my work completely and I need a fix or a workaround. I am using PreMailer.Net version 2.0.1.0 on Windows 10 in a C#/ASP.NET project.

As in this issue, MoveCssInline is changing characters like '&' in my URLs. For example:

<a href="http://www.website.com/page?param1=a&param2=b"></a>

Is changed to:

<a href="http://www.website.com/page?param1=a&amp;param2=b"></a>

Most of the URLs I work with contain ampersands because we use form codes and several other query parameters. I need to inline CSS into the HTML, but I do not have control over the URLs in the document and I am not allowed to change them.

One of the responses on the issue I linked earlier points out that properly encoding attribute values is part of the HTML specification, and that the output is therefore correct.
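One way to see what that response means is to parse the escaped href and read the attribute value back. The sketch below uses Python's standard-library `html.parser` purely as a stand-in (it is not what PreMailer.Net or AngleSharp use); the only assumption is that the parser decodes character references such as `&amp;` in attribute values, which `HTMLParser` does. `HrefCollector` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the decoded href of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs arrive as (name, value) pairs; character references
        # such as &amp; have already been decoded by the parser.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


escaped = '<a href="http://www.website.com/page?param1=a&amp;param2=b"></a>'
parser = HrefCollector()
parser.feed(escaped)
parser.close()
print(parser.hrefs[0])
# http://www.website.com/page?param1=a&param2=b
```

After parsing, the escaped attribute yields the original URL with a plain `&`; what changes is only the raw markup text.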

But PreMailer.Net is not an HTML validation or sanitization utility; it is a CSS inliner and should, where possible, have no other side effects on the document.

Additionally, further testing shows that this encoding is not limited to attributes like href. It also encodes text (InnerHTML) values, which are perfectly valid HTML without encoding. Example:

<p>&</p>

This is valid HTML and should not be encoded, but PreMailer.Net will change this to:

<p>&amp;</p>
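Text content changes for the same underlying reason: a parse-and-serialize pipeline does not keep the original characters. The parser turns `<p>&</p>` into a text node containing `&`, and the serializer escapes that text when it writes the document out again. The sketch below models that round trip with Python's standard library (`html.parser` and `html.escape`) as a stand-in for AngleSharp; `TextCollector` and `round_trip` are hypothetical names.

```python
import html
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Records the decoded text content the parser reports."""

    def __init__(self):
        # convert_charrefs=True: references like &amp; are decoded in text.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def round_trip(paragraph_html):
    """Parse a <p> element, then re-serialize its text with escaping."""
    collector = TextCollector()
    collector.feed(paragraph_html)
    collector.close()
    text = "".join(collector.chunks)
    # quote=False: escape only &, < and >, as a text-node serializer would.
    return "<p>" + html.escape(text, quote=False) + "</p>"


print(round_trip("<p>&</p>"))      # <p>&amp;</p>
print(round_trip("<p>&amp;</p>"))  # <p>&amp;</p>
```

Both inputs produce the same text node, so both come out of the serializer in the same escaped form.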

I am desperate for a fix or a workaround; please help. I have also looked at the following issues for help:

  • Issue #70 - This suggests the issue is "rooted in lack of explicitly setting correct encoding when calling CsQuery." Though the issue is closed, it does not actually explain how you can set that value.

  • Issue #193 for the Ruby implementation of Premailer - As far as I can tell, the Ruby implementation has had similar issues. This issue appears to contain a solution that worked for a number of people, which is to use the following Doctype: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">. I have tested this with PreMailer.Net, and it does not appear to solve the problem.

Update

After a bit more digging, I found this issue, which suggests the behavior comes from AngleSharp, the PreMailer.Net dependency that parses the HTML document. When AngleSharp writes the document back out as HTML, it runs a function called EscapeText that escapes these characters. According to that issue, this is by design, because it follows the HTML spec.

However, I think this is still an issue for PreMailer.Net, and even for AngleSharp, which should not make these changes to the input HTML unless the caller requests them.
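For reference, the escaping step described above can be sketched from the "escaping a string" rules in the WHATWG HTML fragment serialization algorithm, as I read them: `&` always becomes `&amp;` and U+00A0 becomes `&nbsp;`; in attribute mode `"` becomes `&quot;`, otherwise `<` and `>` become `&lt;` and `&gt;`. This is a hedged Python reading of the spec, not AngleSharp's actual `EscapeText` code; `escape_string` is a hypothetical name, and details may differ between spec revisions.

```python
def escape_string(value, attribute_mode):
    """Escape a string following the HTML fragment serialization rules (as read here)."""
    # Replace & first so the entities added below are not escaped again.
    value = value.replace("&", "&amp;")
    value = value.replace("\u00a0", "&nbsp;")
    if attribute_mode:
        value = value.replace('"', "&quot;")
    else:
        value = value.replace("<", "&lt;").replace(">", "&gt;")
    return value


url = "http://www.website.com/page?param1=a&param2=b"
print(escape_string(url, attribute_mode=True))
# http://www.website.com/page?param1=a&amp;param2=b
print(escape_string("&", attribute_mode=False))
# &amp;
```

Because `&` is escaped unconditionally in both modes, any literal ampersand in an href or in text is rewritten on output, which matches both behaviors reported in this issue.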

Update 2

I have been working with the AngleSharp folks and believe I will soon be able to send a PR that adds an option to suppress this encoding behavior by passing a custom formatter.
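The custom-formatter idea can be pictured as a serializer that delegates escaping to a pluggable object, so a caller can swap in one that writes attribute values and text verbatim. The sketch below is a hypothetical Python illustration of that design only. `Formatter`, `SpecFormatter`, `RawFormatter`, and `serialize_element` are invented names and do not correspond to the actual AngleSharp or PreMailer.Net API.

```python
import html
from typing import Dict, Protocol


class Formatter(Protocol):
    """Hypothetical strategy deciding how text and attribute values are written."""

    def text(self, value: str) -> str: ...

    def attribute(self, value: str) -> str: ...


class SpecFormatter:
    """Escapes like a spec-following serializer (the default behavior)."""

    def text(self, value: str) -> str:
        return html.escape(value, quote=False)

    def attribute(self, value: str) -> str:
        return value.replace("&", "&amp;").replace('"', "&quot;")


class RawFormatter:
    """Writes values back verbatim; the caller accepts the risk of invalid markup."""

    def text(self, value: str) -> str:
        return value

    def attribute(self, value: str) -> str:
        return value


def serialize_element(tag: str, attrs: Dict[str, str], text: str,
                      formatter: Formatter) -> str:
    """Serialize one element with its attributes and a single text child."""
    rendered = "".join(
        f' {name}="{formatter.attribute(value)}"' for name, value in attrs.items()
    )
    return f"<{tag}{rendered}>{formatter.text(text)}</{tag}>"


link = {"href": "http://www.website.com/page?param1=a&param2=b"}
print(serialize_element("a", link, "", SpecFormatter()))
# <a href="http://www.website.com/page?param1=a&amp;param2=b"></a>
print(serialize_element("a", link, "", RawFormatter()))
# <a href="http://www.website.com/page?param1=a&param2=b"></a>
```

In this sketch the spec-following formatter stays the default and the verbatim one is opt-in, which is the kind of caller-controlled switch this issue asks for.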
