Fix imap does not decode text body correctly #104217
    """
    try:
        return str(part.get_payload(decode=True).decode(self._charset))
    except Exception:  # pylint: disable=broad-except
What kind of exceptions can we expect?
Unicode issues, or specific decode errors. Quite a lot of things can go wrong if an email message is not correctly formatted. I added a test case that shows the default case works.
Usually ValueError will catch those kinds of exceptions.
There's no reason to catch other errors unless we know more.
In my test UnicodeDecodeError was raised; this will raise TypeError.
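As a small illustration of the exception types under discussion: in Python, UnicodeDecodeError is a subclass of ValueError, so an `except ValueError` clause catches a failed `bytes.decode()`. The helper name below is hypothetical and exists only for this sketch.

```python
from typing import Optional


def error_caught_as_value_error(raw: bytes, charset: str) -> Optional[type]:
    """Hypothetical helper: return the exception type if ValueError catches it."""
    try:
        raw.decode(charset)
    except ValueError as err:
        # UnicodeDecodeError lands here because it derives from ValueError.
        return type(err)
    return None


# Latin-1 encoded bytes are not valid UTF-8, so decoding them as UTF-8 fails.
latin1_bytes = "Grüße".encode("latin-1")
caught = error_caught_as_value_error(latin1_bytes, "utf-8")
```

Here `caught` is the UnicodeDecodeError class, while well-formed input returns None.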
Documentation says
get_payload(i=None, decode=False)
Return the current payload, which will be a list of [Message](https://docs.python.org/3/library/email.compat32-message.html#email.message.Message) objects when [is_multipart()](https://docs.python.org/3/library/email.compat32-message.html#email.message.Message.is_multipart) is True, or a string when [is_multipart()](https://docs.python.org/3/library/email.compat32-message.html#email.message.Message.is_multipart) is False. If the payload is a list and you mutate the list object, you modify the message’s payload in place.
With optional argument i, [get_payload()](https://docs.python.org/3/library/email.compat32-message.html#email.message.Message.get_payload) will return the i-th element of the payload, counting from zero, if [is_multipart()](https://docs.python.org/3/library/email.compat32-message.html#email.message.Message.is_multipart) is True. An [IndexError](https://docs.python.org/3/library/exceptions.html#IndexError) will be raised if i is less than 0 or greater than or equal to the number of items in the payload. If the payload is a string (i.e. [is_multipart()](https://docs.python.org/3/library/email.compat32-message.html#email.message.Message.is_multipart) is False) and i is given, a [TypeError](https://docs.python.org/3/library/exceptions.html#TypeError) is raised.
Optional decode is a flag indicating whether the payload should be decoded or not, according to the Content-Transfer-Encoding header. When True and the message is not a multipart, the payload will be decoded if this header’s value is quoted-printable or base64. If some other encoding is used, or Content-Transfer-Encoding header is missing, the payload is returned as-is (undecoded). In all cases the returned value is binary data. If the message is a multipart and the decode flag is True, then None is returned. If the payload is base64 and it was not perfectly formed (missing padding, characters outside the base64 alphabet), then an appropriate defect will be added to the message’s defect property (InvalidBase64PaddingDefect or InvalidBase64CharactersDefect, respectively).
When decode is False (the default) the body is returned as a string without decoding the Content-Transfer-Encoding. However, for a Content-Transfer-Encoding of 8bit, an attempt is made to decode the original bytes using the charset specified by the Content-Type header, using the replace error handler. If no charset is specified, or if the charset given is not recognized by the email package, the body is decoded using the default ASCII charset.
This is a legacy method. On the EmailMessage class its functionality is replaced by [get_content()](https://docs.python.org/3/library/email.message.html#email.message.EmailMessage.get_content) and [iter_parts()](https://docs.python.org/3/library/email.message.html#email.message.EmailMessage.iter_parts).
In this case it does not matter which error is raised; in all of those cases we return the undecoded payload.
IMO it does not make sense to enumerate all possible error types here.
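The two `decode=True` behaviours described in the quoted documentation can be sketched as follows. This is a minimal example using only the standard library; the helper names and the sample messages are assumptions made for illustration and are not part of the PR.

```python
import base64
from email import message_from_string
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_base64_part(text: str) -> Message:
    """Hypothetical helper: a non-multipart text part with a base64 body."""
    body = base64.b64encode(text.encode("utf-8")).decode("ascii")
    raw = (
        "Content-Type: text/plain; charset=utf-8\n"
        "Content-Transfer-Encoding: base64\n"
        "\n"
        f"{body}\n"
    )
    return message_from_string(raw)


def build_multipart() -> Message:
    """Hypothetical helper: a multipart container with one text child."""
    container = MIMEMultipart()
    container.attach(MIMEText("hello"))
    return container


single = build_base64_part("Grüße")
multi = build_multipart()

# Non-multipart with decode=True: the transfer encoding is undone and the
# result is bytes, which still has to be decoded with a charset.
decoded = single.get_payload(decode=True)

# Multipart with decode=True: None is returned, so calling .decode() on the
# result would fail before any charset handling happens.
nothing = multi.get_payload(decode=True)
```

The bytes in `decoded` only become readable text after a further `.decode("utf-8")`, which is the step where the charset-related errors discussed in this thread can occur.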
@MartinHjelmare If we want to handle this better, I'd suggest using (ValueError, TypeError) in this case.
I'll open a PR to deal with that.
See: #104227
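A rough sketch of what the narrower (ValueError, TypeError) suggestion could look like, combined with the fallback to the undecoded payload described earlier in the thread. This is not taken from #104227; the function names, the `charset` parameter, and the sample messages are hypothetical.

```python
import base64
from email import message_from_string
from email.message import Message


def decode_text_part(part: Message, charset: str = "utf-8") -> str:
    """Hypothetical helper: decode a text part, else return the raw payload."""
    try:
        return part.get_payload(decode=True).decode(charset)
    except (ValueError, TypeError):
        # UnicodeDecodeError is a ValueError, so bad bytes end up here.
        # Not covered by this tuple: AttributeError (multipart part, where
        # get_payload(decode=True) is None) and LookupError (unknown charset).
        return str(part.get_payload())


def make_base64_part(raw: bytes) -> Message:
    """Hypothetical helper: wrap raw bytes in a base64 text/plain message."""
    body = base64.b64encode(raw).decode("ascii")
    return message_from_string(
        "Content-Type: text/plain\n"
        "Content-Transfer-Encoding: base64\n"
        "\n"
        f"{body}\n"
    )


good = make_base64_part("Grüße".encode("utf-8"))
bad = make_base64_part("Grüße".encode("latin-1"))  # not valid UTF-8

readable = decode_text_part(good)
fallback = decode_text_part(bad)
```

With these inputs `readable` is the decoded text, while `fallback` is the still base64-encoded payload string, because the Latin-1 bytes cannot be decoded as UTF-8.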
Proposed change
Make sure encoded body text parts are decoded to user readable text.