Bad UTF-8 "To" header encoding? #369
Thanks for the report and the detailed analysis. I'm able to reproduce this and am investigating. [I hope you don't mind, I edited your report to format the input and output columns as code, because GitHub was hiding important characters essential to understanding the problem.] It looks like you've uncovered a bug in Python's email package, related to incorrectly "folding" address headers that are too long, when using Content-Transfer-Encoding (CTE) 7bit, and if the headers need "encoded words" and also contain "special characters." I haven't been able to locate a Python bug report for this exact problem, but a similar issue with shorter address headers that didn't require folding (python/cpython#81663) was fixed in Python 3.8. I'm guessing they missed the folding case. Anymail's SES backend needs to use CTE 7bit (the line of code you identified), because Amazon SES doesn't officially support 8bit CTE, and using it can result in mojibake depending on what other SES options are enabled. (See the comments above that code and Anymail issue #115.) I'll look into workarounds…
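For readers unfamiliar with the term: an RFC 2047 "encoded word" has the form =?charset?encoding?text?=, which lets non-ASCII text travel in a 7bit header. The sketch below builds a base64 ("B") encoded word by hand and decodes it with the standard library's email.header.decode_header. The helper b_encoded_word is hypothetical and exists only for illustration. Real encoders, including Python's email package, also split long encoded words across folded lines, which is the step this bug involves. This sketch does not attempt that.

```python
import base64
from email.header import decode_header


def b_encoded_word(text: str, charset: str = "utf-8") -> str:
    """Hypothetical helper: wrap text in an RFC 2047 "B" (base64) encoded word.

    No line-length splitting (folding) is done here.
    """
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?b?{payload}?="


word = b_encoded_word("Người nhận")
print(word)  # =?utf-8?b?TmfGsOG7nWkgbmjhuq1u?=

# decode_header() returns a list of (bytes, charset) chunks.
[(raw, charset)] = decode_header(word)
print(raw.decode(charset))  # Người nhận
```

The output matches the encoded word in the first example of the original report.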
Thanks for the quick and complete response, and for improving the table
in my post!
I'll adapt the test case so it can be reproduced without Anymail and
report it to CPython.
About the workarounds, it seemed hard to fix the issue without monkey
patching, something I'd like to avoid since sending email is a
crucial part of my application. So, for now, I was just going to remove
commas and parentheses from the addresses...
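A minimal sketch of that interim workaround, assuming the display name is cleaned before it is handed to Django. The helper strip_trouble_chars is hypothetical: it replaces commas and parentheses with spaces and collapses the leftover whitespace. It discards part of the name, so it is a stopgap rather than a fix.

```python
import re

# Hypothetical helper for the interim workaround: remove the characters
# (commas and parentheses) that triggered the bad encoding.
_TROUBLE_CHARS = re.compile(r"[,()]")


def strip_trouble_chars(display_name: str) -> str:
    cleaned = _TROUBLE_CHARS.sub(" ", display_name)
    # Collapse runs of whitespace left behind by the substitution.
    return " ".join(cleaned.split())


name = strip_trouble_chars("Người nhận a very very long, name")
# Assumes the cleaned name contains no double quotes or backslashes,
# which would otherwise need escaping inside the quoted display name.
print(f'"{name}" <to@example.com>')
```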
Hmm, looking into this some more, I think it's actually a Django bug, and there's a reasonable workaround Anymail could implement. Python's own EmailMessage class handles this header correctly; Django's doesn't:
import email.message
import email.policy
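import django.conf
import django.core.mail
# Standalone script: configure Django with default settings before using its mail API.
django.conf.settings.configure()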
to = '"Người nhận a very very long, name" <to@example.com>'
policy = email.policy.default.clone(cte_type="7bit")
# Python's EmailMessage class doesn't exhibit bug
msg1 = email.message.EmailMessage()
msg1["To"] = to
print(msg1.as_bytes(policy=policy).decode("ascii"))
# To:
# =?utf-8?b?TmfGsOG7nWkgbmjhuq1uIGEgdmVyeSB2ZXJ5IGxvbmcs?= name <to@example.com>
# Django's EmailMessage class has bug
msg2 = django.core.mail.EmailMessage(to=[to]).message()
msg2.policy = policy
print(msg2.as_bytes().decode("ascii"))
# [... other headers ...]
# To: =?utf-8?b?TmfGsOG7nWkgbmjhuq1u?= a very very long, name <to@example.com>
Django's message() returns one of its SafeMIME classes, which build on Python's legacy email.message.Message API. I think Anymail should just stop using Django's SafeMIME classes and build a modern Python EmailMessage instead. Before switching, we'll need to investigate whether there's anything important in Django's SafeMIME classes we'd be losing or need to copy. I suspect a lot of SafeMIME exists solely to work around bugs and security concerns in Python's legacy Message code, issues that don't apply to Python's modern EmailMessage. But I haven't looked through that part of Django's mail package in a while.
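To illustrate that direction, here is a standard-library-only sketch (not Anymail's actual code). It sets the To header from an email.headerregistry.Address, serializes with a cte_type="7bit" policy like the one in the reproduction, and parses the bytes back to check that the header still yields exactly one recipient. Nothing is pre-encoded, so only the modern header machinery does the RFC 2047 encoding and folding.

```python
import email
import email.policy
from email.headerregistry import Address
from email.message import EmailMessage

# 7bit policy, as in the reproduction (Anymail's SES backend needs 7bit CTE).
POLICY_7BIT = email.policy.default.clone(cte_type="7bit")

msg = EmailMessage()
# Passing an Address object lets the modern header code do all quoting
# and encoded-word generation itself.
msg["To"] = Address(display_name="Người nhận a very very long, name",
                    addr_spec="to@example.com")
raw = msg.as_bytes(policy=POLICY_7BIT)
print(raw.decode("ascii"))  # 7bit output must be pure ASCII

# Parse the serialized bytes back and count the recipients.
parsed = email.message_from_bytes(raw, policy=email.policy.default)
addresses = parsed["To"].addresses
print(len(addresses), addresses[0].addr_spec)
```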
Last Saturday I couldn't reproduce the bug using only Python's email package. I investigated a bit more now, and it really does seem to be a conflict between Django and Python behavior, as you said.
Becomes this:
But later,
If we comment the
That seems valid. Even if I make the string much longer, so that the non-ASCII characters and the comma end up on different lines, it still works:
So the problem only happens when both functions try to encode the string. Should I report it to both Python and Django?
I'd report it to Django only. I'm pretty sure the problem only occurs with Python email's legacy Compat32 layer. My understanding is that the Compat32 legacy layer exists specifically to replicate Python 2's email behavior (including any bugs), so there's not much point in reporting bugs against it.
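One quick way to see which layer a message class starts on is its default policy. The sketch below only shows standard-library defaults: the legacy Message class and the email.mime classes start with email.policy.compat32, while EmailMessage starts with email.policy.default. Whether a particular Django version's SafeMIME classes derive from the email.mime classes is an assumption here and should be checked against that version's source.

```python
import email.policy
from email.message import EmailMessage, Message
from email.mime.text import MIMEText

# Legacy classes, including the email.mime ones, default to compat32.
legacy_message = Message().policy is email.policy.compat32
legacy_mimetext = MIMEText("hi").policy is email.policy.compat32

# The modern EmailMessage defaults to email.policy.default instead.
modern = EmailMessage().policy is email.policy.default

print(legacy_message, legacy_mimetext, modern)
```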
Done! Feel free to add more info there.
Hi!
I'm getting errors when trying to send messages whose "To" header contains non-ASCII characters.
The problem seems to happen when all these conditions are true at the same time:
I'll use the related test case to explain:
Input: "Người nhận" <to@example.com>
Output: To: =?utf-8?b?TmfGsOG7nWkgbmjhuq1u?= <to@example.com>

Input: "Người nhận a very very long name" <to@example.com>
Output: To: =?utf-8?b?TmfGsOG7nWkgbmjhuq1u?= a very very long name <to@example.com>

Input: "Người nhận a very very long, name" <to@example.com>
Output: To: =?utf-8?b?TmfGsOG7nWkgbmjhuq1u?= a very very long, name <to@example.com>

Input: "Người nhận a very very long, náme" <to@example.com>
Output: To: =?utf-8?b?TmfGsOG7nWkgbmjhuq1uIGEgdmVyeSB2ZXJ5IGxvbmcsIG7DoW1l?=\n <to@example.com>

Input: "Người nhận, name" <to@example.com>
Output: To: =?utf-8?b?TmfGsOG7nWkgbmjhuq1uLCBuYW1l?= <to@example.com>
So, when a name has both a UTF-8 encoded part and a special character, the special character must also be encoded, but currently that only happens when the special character is close to or between non-ASCII characters.
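Why a bare comma matters: in an address header, a comma outside a quoted string or encoded word separates one address from the next (RFC 5322 address-list syntax). A comma hidden inside a base64 encoded word is invisible to that parsing. The sketch below uses email.header.decode_header to pull out the text that sits outside encoded words in two of the outputs above. The helper unencoded_text is hypothetical.

```python
from email.header import decode_header


def unencoded_text(header_value: str) -> str:
    """Hypothetical helper: return the text outside any encoded word."""
    parts = []
    for chunk, charset in decode_header(header_value):
        if charset is None:
            # decode_header yields bytes when encoded words are present,
            # and a plain str when the value has none.
            parts.append(chunk if isinstance(chunk, str) else chunk.decode("ascii"))
    return "".join(parts)


# Output for "Người nhận a very very long, name": the comma is left bare.
buggy = "=?utf-8?b?TmfGsOG7nWkgbmjhuq1u?= a very very long, name <to@example.com>"
# Output for "Người nhận, name": the comma is inside the encoded word.
good = "=?utf-8?b?TmfGsOG7nWkgbmjhuq1uLCBuYW1l?= <to@example.com>"

for value in (buggy, good):
    outside = unencoded_text(value)
    print("," in outside, repr(outside))
```

In the first case the leftover text contains the comma, so a receiving parser can read the header as two entries. In the second, only the angle-bracketed address remains outside the encoded word.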
Commenting out this line seems to encode the entire name in these cases, which solves the problem, but I don't know whether it has other unwanted consequences.
Example traceback: