How to keep HTML entities (such as &copy;) from being converted to special characters? #347

Open
alekdavisintel opened this issue Apr 20, 2023 · 3 comments

Comments

@alekdavisintel

When I process HTML text containing HTML entities, such as &copy;, PreMailer.Net converts them to the corresponding characters (such as ©). The problem with this particular case is that every HTML email template we use has a footer with the copyright statement. AFAIK, &copy; is legit HTML and should be used as-is. When PreMailer converts &copy; to ©, it freaks out Gmail: if you open a message containing the © character in Gmail, you will see a notice that the message was trimmed, even though it wasn't. If I change © back to &copy; and send exactly the same message, Gmail displays it correctly. So why does PreMailer think that it must convert HTML entities? And is there any way to prevent this behavior? Thanks.

@jasekiw
Collaborator

jasekiw commented Sep 15, 2023

Hi @alekdavisintel

This seems to be a limitation of AngleSharp; see AngleSharp/AngleSharp#396.

I turned on the IsNotConsumingCharacterReferences option as suggested in that issue, but it caused some strange effects. I went ahead and commented on the original AngleSharp thread to see if they can shed any light on the correct direction.

@jasekiw
Collaborator

jasekiw commented Sep 15, 2023

@alekdavisintel After more research, I found that AngleSharp tokenizes the HTML into objects that are then output using a formatter. I was able to output the copyright symbol as an HTML entity, but this doesn't cover other HTML entities. I can try expanding this approach to handle all HTML entities.

However, the pitfall (and possibly a feature) is that if the input uses a literal copyright symbol, PreMailer will automatically convert it to an HTML entity. This might be a good feature, since email clients have issues with Unicode. I'm not sure what side effects this might cause, so it might be best to enable it via a configuration flag.

@alekdavis

alekdavis commented Sep 15, 2023

Great, thanks. I actually implemented a workaround: after pre-mailing the template, I convert the copyright character back to the HTML entity, so it's not a priority for my use case at this time, but I appreciate the update. I would expand it to other HTML entities, at least the common ones, like &reg;, &trade;, etc. I read the AngleSharp response, but I don't quite get the answer. Anyway, thanks a lot for looking into this. I appreciate it.
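
For readers who want to try the same kind of workaround, here is a minimal sketch of the post-processing step: after the inliner has run, literal characters are mapped back to named entities. It is written in Python purely for illustration and is not the code used in this thread. The function name `restore_entities` and the mapping table are hypothetical, and the same idea should carry over to plain string replacement in C#.

```python
# Hypothetical mapping from literal characters to named HTML entities.
# Only the three characters mentioned in this thread are included;
# extend the table for any others your templates use.
CHAR_TO_ENTITY = {
    "\u00a9": "&copy;",   # copyright sign
    "\u00ae": "&reg;",    # registered sign
    "\u2122": "&trade;",  # trade mark sign
}


def restore_entities(html: str, mapping: dict = CHAR_TO_ENTITY) -> str:
    """Replace each mapped literal character with its named entity.

    Intended to run on the inliner's output, just before sending.
    """
    # str.maketrans accepts single-character keys with string values.
    return html.translate(str.maketrans(mapping))


if __name__ == "__main__":
    inlined = "<p>\u00a9 2023 Example Corp\u2122</p>"
    print(restore_entities(inlined))
```

Note that this replaces every occurrence in the document, including text the template author originally wrote as a literal character, which matches the behavior discussed above for a built-in option.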
