Fix imap does not decode text body correctly #104217

jbouwh · 2023-11-19T16:54:27Z

Proposed change

Make sure encoded body text parts are decoded to user readable text.

Type of change

Dependency upgrade
Bugfix (non-breaking change which fixes an issue)
New integration (thank you!)
New feature (which adds functionality to an existing integration)
Deprecation (breaking change to happen in the future)
Breaking change (fix/feature causing existing functionality to break)
Code quality improvements to existing code or addition of tests

Additional information

This PR fixes or closes issue: fixes #102266, "Issue with IMAP integration, cannot read text (follow up of #86388)"
This PR is related to issue:
Link to documentation pull request:

Checklist

The code change is tested and works locally.
Local tests pass. Your PR cannot be merged unless tests pass
There is no commented out code in this PR.
I have followed the development checklist
I have followed the perfect PR recommendations
The code has been formatted using Black (black --fast homeassistant tests)
Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

Documentation added/updated for www.home-assistant.io

If the code communicates with devices, web services, or third-party tools:

The manifest file has all fields filled out correctly.
Updated and included derived files by running: python3 -m script.hassfest.
New or updated dependencies have been added to requirements_all.txt.
Updated by running python3 -m script.gen_requirements_all.
For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
Untested files have been added to .coveragerc.

To help with the load of incoming pull requests:

I have reviewed two other open pull requests in this repository.

joostlek · 2023-11-19T18:33:15Z

homeassistant/components/imap/coordinator.py

+            """
+            try:
+                return str(part.get_payload(decode=True).decode(self._charset))
+            except Exception:  # pylint: disable=broad-except


What kind of exceptions can we expect?

Unicode issues, or specific decode errors. Quite a lot of things can go wrong if an email message is not correctly formatted. I added one case in a test that shows the default case works.

Usually ValueError will catch those kinds of exceptions.

There's no reason to catch other errors unless we know more.
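As a small illustration of the point about `ValueError`: `UnicodeDecodeError` is a subclass of `ValueError`, whereas an unknown charset name raises `LookupError`, which is not. The sketch below only probes `bytes.decode()`; the byte string, the charset names, and the helper `decode_error_type` are made up for illustration and are not part of the PR.

```python
# Illustrative only: which exception does bytes.decode() raise?
raw = b"caf\xe9"  # Latin-1 encoded bytes, not valid UTF-8


def decode_error_type(data: bytes, charset: str) -> str:
    """Return the name of the exception raised by data.decode(charset), or 'ok'."""
    try:
        data.decode(charset)
    except ValueError as err:  # includes UnicodeDecodeError
        return type(err).__name__
    except LookupError as err:  # unknown codec name
        return type(err).__name__
    return "ok"


print(issubclass(UnicodeDecodeError, ValueError))
print(decode_error_type(raw, "utf-8"))
print(decode_error_type(raw, "latin-1"))
print(decode_error_type(raw, "no-such-charset"))
```

An `except ValueError` clause therefore handles badly encoded bytes, but it would not handle a charset name that Python does not recognize.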

In my test a UnicodeDecodeError was raised; this will raise TypeError.

Documentation says

get_payload(i=None, decode=False)

Return the current payload, which will be a list of [Message](https://docs.python.org/3/library/email.compat32-message.html#email.message.Message) objects when [is_multipart()](https://docs.python.org/3/library/email.compat32-message.html#email.message.Message.is_multipart) is True, or a string when [is_multipart()](https://docs.python.org/3/library/email.compat32-message.html#email.message.Message.is_multipart) is False. If the payload is a list and you mutate the list object, you modify the message’s payload in place.

With optional argument i, [get_payload()](https://docs.python.org/3/library/email.compat32-message.html#email.message.Message.get_payload) will return the i-th element of the payload, counting from zero, if [is_multipart()](https://docs.python.org/3/library/email.compat32-message.html#email.message.Message.is_multipart) is True. An [IndexError](https://docs.python.org/3/library/exceptions.html#IndexError) will be raised if i is less than 0 or greater than or equal to the number of items in the payload. If the payload is a string (i.e. [is_multipart()](https://docs.python.org/3/library/email.compat32-message.html#email.message.Message.is_multipart) is False) and i is given, a [TypeError](https://docs.python.org/3/library/exceptions.html#TypeError) is raised.

Optional decode is a flag indicating whether the payload should be decoded or not, according to the Content-Transfer-Encoding header. When True and the message is not a multipart, the payload will be decoded if this header’s value is quoted-printable or base64. If some other encoding is used, or Content-Transfer-Encoding header is missing, the payload is returned as-is (undecoded). In all cases the returned value is binary data. If the message is a multipart and the decode flag is True, then None is returned. If the payload is base64 and it was not perfectly formed (missing padding, characters outside the base64 alphabet), then an appropriate defect will be added to the message’s defect property (InvalidBase64PaddingDefect or InvalidBase64CharactersDefect, respectively).

When decode is False (the default) the body is returned as a string without decoding the Content-Transfer-Encoding. However, for a Content-Transfer-Encoding of 8bit, an attempt is made to decode the original bytes using the charset specified by the Content-Type header, using the replace error handler. If no charset is specified, or if the charset given is not recognized by the email package, the body is decoded using the default ASCII charset.

This is a legacy method. On the EmailMessage class its functionality is replaced by [get_content()](https://docs.python.org/3/library/email.message.html#email.message.EmailMessage.get_content) and [iter_parts()](https://docs.python.org/3/library/email.message.html#email.message.EmailMessage.iter_parts).
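The behaviour quoted above can be checked with a short sketch using only the standard library. The two messages are hypothetical and constructed purely for illustration; they are not taken from the PR's tests.

```python
from email import message_from_string

# Hypothetical single-part message with a base64-encoded UTF-8 body.
single = message_from_string(
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "SGVsbG8gd29ybGQ=\n"
)

# Hypothetical multipart message containing one text part.
multi = message_from_string(
    'Content-Type: multipart/mixed; boundary="sep"\n'
    "\n"
    "--sep\n"
    "Content-Type: text/plain\n"
    "\n"
    "part one\n"
    "--sep--\n"
)

decoded = single.get_payload(decode=True)   # bytes, transfer encoding undone
undecoded = single.get_payload()            # str, still the base64 text
container = multi.get_payload(decode=True)  # None for a multipart container

print(decoded)
print(undecoded.strip())
print(container)
```

Note that `decode=True` only undoes the Content-Transfer-Encoding and yields bytes; turning those bytes into text still requires a separate `.decode(charset)` call, which is the step discussed in this thread.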

In this case it does not matter which error is raised; in all of those cases we return the undecoded payload.
IMO it does not make sense to list all possible error types here.

@MartinHjelmare If we want to handle this more precisely, I'd suggest catching (ValueError, TypeError) in this case.
I'll open a PR to deal with that.
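A minimal sketch of what the narrower handling could look like, assuming a standalone helper rather than the coordinator method. The name `decode_text_part`, its `charset` parameter, and the sample messages are hypothetical; this is not the code from this PR or from the follow-up.

```python
from email import message_from_string
from email.message import Message


def decode_text_part(part: Message, charset: str = "utf-8") -> str:
    """Hypothetical helper: decode a non-multipart text part to str.

    Falls back to the undecoded payload when decoding fails.
    """
    try:
        return str(part.get_payload(decode=True).decode(charset))
    except (ValueError, TypeError):
        # UnicodeDecodeError is a ValueError; return the raw payload instead.
        return str(part.get_payload())


# Base64 of b"Hello world" (valid UTF-8).
good = message_from_string(
    "Content-Type: text/plain\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "SGVsbG8gd29ybGQ=\n"
)

# Base64 of b"caf\xe9" (Latin-1 bytes, invalid as UTF-8).
latin = message_from_string(
    "Content-Type: text/plain\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "Y2Fm6Q==\n"
)

print(decode_text_part(good))
print(decode_text_part(latin, "latin-1"))
print(decode_text_part(latin, "utf-8").strip())
```

With the wrong charset the helper returns the still-encoded base64 text, which matches the fallback described above: the undecoded payload is returned rather than an error being raised.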

See: #104227

Commit 677839f: Fix imap does not decode text body correctly (verified signature; committer bdraco, J. Nick Koston)

home-assistant bot added the bugfix, cla-signed, has-tests, integration: imap, small-pr, by-code-owner, and Quality Scale: No score labels Nov 19, 2023

jbouwh added this to the 2023.11.3 milestone Nov 19, 2023

joostlek reviewed Nov 19, 2023

joostlek approved these changes Nov 19, 2023

jbouwh merged commit 9a38e23 into dev Nov 19, 2023

jbouwh deleted the jbouwh-imap-fix-body-text-decoding branch November 19, 2023 19:15

jbouwh mentioned this pull request Nov 19, 2023

Use more specific exception type for imap decoding #104227 (Merged)

github-actions bot locked and limited conversation to collaborators Nov 21, 2023

frenck added the cherry-picked label Nov 22, 2023
