Fix imap does not decode text body correctly #104217

Merged
jbouwh merged 1 commit into dev from jbouwh-imap-fix-body-text-decoding
Nov 19, 2023

Contributor

@jbouwh jbouwh commented Nov 19, 2023

Proposed change

Make sure encoded body text parts are decoded to user readable text.

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Black (black --fast homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
  • Untested files have been added to .coveragerc.

To help with the load of incoming pull requests:

"""
try:
    return str(part.get_payload(decode=True).decode(self._charset))
except Exception:  # pylint: disable=broad-except

Member

What kind of exceptions can we expect?

Contributor Author

Unicode issues, or specific decode errors. Quite a lot of things can go wrong if an email message is not correctly formatted. I added one case in a test that shows the default case works.

Member

@MartinHjelmare MartinHjelmare Nov 19, 2023

Usually ValueError will catch those kinds of exceptions.

There's no reason to catch other errors unless we know more.
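To make the point about `ValueError` concrete, here is a minimal sketch (not code from this PR) that classifies what `bytes.decode()` raises for a few inputs. The helper name `classify_decode_failure` and the sample inputs are hypothetical and only serve the illustration.

```python
import binascii

# Both of these are ValueError subclasses in Python's exception hierarchy.
assert issubclass(UnicodeDecodeError, ValueError)
assert issubclass(binascii.Error, ValueError)


def classify_decode_failure(raw: bytes, charset) -> str:
    """Hypothetical helper: report which exception raw.decode(charset) raises."""
    try:
        raw.decode(charset)
    except ValueError as err:
        # Covers UnicodeDecodeError, which derives from ValueError.
        return f"ValueError ({type(err).__name__})"
    except (TypeError, LookupError) as err:
        # TypeError: charset is not a str. LookupError: unknown codec name.
        return type(err).__name__
    return "ok"


results = {
    "valid": classify_decode_failure(b"caf\xc3\xa9", "utf-8"),
    "bad bytes": classify_decode_failure(b"\xff\xfe", "utf-8"),
    "charset None": classify_decode_failure(b"abc", None),
    "unknown charset": classify_decode_failure(b"abc", "no-such-charset"),
}
print(results)
```

In this sketch, invalid bytes land in the `ValueError` branch, while a charset of `None` and an unknown charset name do not, so `ValueError` alone would not cover every failure of the `.decode()` call.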

Contributor Author

In my test, UnicodeDecodeError was raised; this will raise TypeError.

Contributor Author

The documentation says:

get_payload(i=None, decode=False)

    Return the current payload, which will be a list of [Message](https://docs.python.org/3/library/email.compat32-message.html#email.message.Message) objects when [is_multipart()](https://docs.python.org/3/library/email.compat32-message.html#email.message.Message.is_multipart) is True, or a string when [is_multipart()](https://docs.python.org/3/library/email.compat32-message.html#email.message.Message.is_multipart) is False. If the payload is a list and you mutate the list object, you modify the message’s payload in place.

    With optional argument i, [get_payload()](https://docs.python.org/3/library/email.compat32-message.html#email.message.Message.get_payload) will return the i-th element of the payload, counting from zero, if [is_multipart()](https://docs.python.org/3/library/email.compat32-message.html#email.message.Message.is_multipart) is True. An [IndexError](https://docs.python.org/3/library/exceptions.html#IndexError) will be raised if i is less than 0 or greater than or equal to the number of items in the payload. If the payload is a string (i.e. [is_multipart()](https://docs.python.org/3/library/email.compat32-message.html#email.message.Message.is_multipart) is False) and i is given, a [TypeError](https://docs.python.org/3/library/exceptions.html#TypeError) is raised.

    Optional decode is a flag indicating whether the payload should be decoded or not, according to the Content-Transfer-Encoding header. When True and the message is not a multipart, the payload will be decoded if this header’s value is quoted-printable or base64. If some other encoding is used, or Content-Transfer-Encoding header is missing, the payload is returned as-is (undecoded). In all cases the returned value is binary data. If the message is a multipart and the decode flag is True, then None is returned. If the payload is base64 and it was not perfectly formed (missing padding, characters outside the base64 alphabet), then an appropriate defect will be added to the message’s defect property (InvalidBase64PaddingDefect or InvalidBase64CharactersDefect, respectively).

    When decode is False (the default) the body is returned as a string without decoding the Content-Transfer-Encoding. However, for a Content-Transfer-Encoding of 8bit, an attempt is made to decode the original bytes using the charset specified by the Content-Type header, using the replace error handler. If no charset is specified, or if the charset given is not recognized by the email package, the body is decoded using the default ASCII charset.

    This is a legacy method. On the EmailMessage class its functionality is replaced by [get_content()](https://docs.python.org/3/library/email.message.html#email.message.EmailMessage.get_content) and [iter_parts()](https://docs.python.org/3/library/email.message.html#email.message.EmailMessage.iter_parts).

In this case it does not matter which error is raised; in all of those cases we return the undecoded payload.
IMO it does not make sense to enumerate all possible error types here.
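The behaviour quoted from the documentation can be checked with a small standalone sketch. The two raw messages below are made-up examples, not messages from the integration's tests.

```python
from email import message_from_bytes

# A single-part message whose body is base64-encoded ("Hello world").
RAW_SINGLE = (
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"SGVsbG8gd29ybGQ=\r\n"
)

# A multipart message with one plain-text part and no transfer encoding.
RAW_MULTI = (
    b'Content-Type: multipart/alternative; boundary="b"\r\n'
    b"\r\n"
    b"--b\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"plain body\r\n"
    b"--b--\r\n"
)

single = message_from_bytes(RAW_SINGLE)
undecoded = single.get_payload()           # str, still base64 text
decoded = single.get_payload(decode=True)  # bytes, transfer encoding removed

multi = message_from_bytes(RAW_MULTI)
multi_decoded = multi.get_payload(decode=True)  # None for a multipart message
first_part = multi.get_payload(0)               # the i-th sub-message

print(type(undecoded).__name__, decoded, multi_decoded)
```

Note that `decode=True` only undoes the Content-Transfer-Encoding; turning the resulting bytes into text still needs a separate `.decode(charset)` call, which is where the exceptions discussed in this thread come from.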

Contributor Author

@MartinHjelmare If we want to handle this better, I'd suggest catching (ValueError, TypeError) in this case.
I'll open a PR to deal with that.
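A rough sketch of what narrowing the handler to `(ValueError, TypeError)` could look like, under the assumption that the part is not multipart. The function `decode_text_part` and the helper `make_part` are hypothetical names for illustration; this is not the code from the follow-up PR.

```python
from email import message_from_bytes
from email.message import Message


def decode_text_part(part: Message, charset) -> str:
    """Hypothetical helper: decode a non-multipart part, else return it undecoded."""
    try:
        return str(part.get_payload(decode=True).decode(charset))
    except (ValueError, TypeError):
        # UnicodeDecodeError is a ValueError; a charset of None raises TypeError.
        return str(part.get_payload())


def make_part(b64_body: bytes) -> Message:
    """Build a made-up base64 text part for the demonstration."""
    return message_from_bytes(
        b"Content-Type: text/plain; charset=utf-8\r\n"
        b"Content-Transfer-Encoding: base64\r\n"
        b"\r\n" + b64_body + b"\r\n"
    )


good = make_part(b"SGVsbG8gd29ybGQ=")  # base64 of "Hello world"
bad = make_part(b"//4=")               # base64 of b"\xff\xfe", invalid UTF-8

print(decode_text_part(good, "utf-8"))
print(decode_text_part(bad, "utf-8").strip())
print(decode_text_part(good, None).strip())
```

The narrower tuple does not cover everything: an unknown charset name raises `LookupError`, and a multipart part makes `get_payload(decode=True)` return `None`, so those cases would need their own handling if they can occur.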

Contributor Author

See: #104227

@jbouwh jbouwh merged commit 9a38e23 into dev Nov 19, 2023
@jbouwh jbouwh deleted the jbouwh-imap-fix-body-text-decoding branch November 19, 2023 19:15
@github-actions github-actions bot locked and limited conversation to collaborators Nov 21, 2023

Successfully merging this pull request may close these issues.

Issue with IMAP integration, cannot read text (follow up of #86388)
4 participants