RFC2047 encoded-words as content-transfer-encoding value #122

Open
jameshoulahan opened this issue Nov 30, 2020 · 6 comments

Comments

@jameshoulahan
Contributor

Some messages in the wild (notably ones sent by mailbox.org's automated mailer, e.g. payment notifications) carry a Content-Transfer-Encoding header whose value is an RFC 2047 encoded-word. Here is a redacted example:

To: user@mailbox.org
MIME-Version: 1.0
Content-Type: text/plain;
 charset="utf-8"
Content-Transfer-Encoding: =?utf-8?Q?8bit?=
From: =?utf-8?Q?mailbox.org?= <noreply@mailbox.org>

Lieber Kunde,

... stuff in german ...

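go-message cannot handle such messages: the encoded-word is not recognized as a transfer encoding, so parsing reports an UnknownEncodingError.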

One "hacky" solution is to simply pass the encoded word through the header decoder, as in this patch:

diff --git a/entity.go b/entity.go
index 99e69c8..fe13d91 100644
--- a/entity.go
+++ b/entity.go
@@ -35,8 +35,9 @@ func New(header Header, body io.Reader) (*Entity, error) {
  // e.g. "quoted-printable". So we just ignore it for multipart.
  // See https://github.com/emersion/go-message/issues/48
  if !strings.HasPrefix(mediaType, "multipart/") {
-   enc := header.Get("Content-Transfer-Encoding")
-   if decoded, encErr := encodingReader(enc, body); encErr != nil {
+   if enc, decErr := decodeHeader(header.Get("Content-Transfer-Encoding")); decErr != nil {
+     err = UnknownEncodingError{decErr}
+   } else if decoded, encErr := encodingReader(enc, body); encErr != nil {
      err = UnknownEncodingError{encErr}
    } else {
      body = decoded
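
For illustration, here is a minimal standalone sketch of the decoding step the patch relies on. It uses the standard library's `mime.WordDecoder` as a stand-in for the library's internal `decodeHeader`, which is an assumption about equivalent behavior, and `normalizeTransferEncoding` is a hypothetical name, not part of go-message:

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// normalizeTransferEncoding is a hypothetical helper (not part of go-message).
// It decodes any RFC 2047 encoded-words in the raw header value, then trims
// and lowercases the result so it can be compared against known encodings.
func normalizeTransferEncoding(raw string) (string, error) {
	dec := new(mime.WordDecoder)
	decoded, err := dec.DecodeHeader(raw)
	if err != nil {
		return "", err
	}
	return strings.ToLower(strings.TrimSpace(decoded)), nil
}

func main() {
	// The malformed value from the example message above.
	enc, err := normalizeTransferEncoding("=?utf-8?Q?8bit?=")
	fmt.Println(enc, err) // prints "8bit <nil>"

	// A well-formed value passes through unchanged apart from case.
	enc, err = normalizeTransferEncoding("Quoted-Printable")
	fmt.Println(enc, err) // prints "quoted-printable <nil>"
}
```

Values that contain no encoded-word are returned as they are, so the extra step should not change how valid messages are handled.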
@brunnre8
Contributor

If I understand RFC 2047 and RFC 2045 correctly, that's invalid for the Content-Transfer-Encoding header: it has a limited set of possible values and must not be encoded.
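
As a rough sketch of that "limited set": RFC 2045 names five standard mechanisms (`7bit`, `8bit`, `binary`, `quoted-printable`, `base64`), matched case-insensitively, plus extension tokens that this sketch deliberately ignores. `isStandardTransferEncoding` is a hypothetical helper name used only for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// standardEncodings lists the five mechanisms RFC 2045 defines by name.
// Extension tokens (ietf-token / x-token) are intentionally left out here.
var standardEncodings = map[string]bool{
	"7bit":             true,
	"8bit":             true,
	"binary":           true,
	"quoted-printable": true,
	"base64":           true,
}

// isStandardTransferEncoding is a hypothetical helper that reports whether
// value is one of the standard mechanisms, ignoring case and outer whitespace.
func isStandardTransferEncoding(value string) bool {
	return standardEncodings[strings.ToLower(strings.TrimSpace(value))]
}

func main() {
	fmt.Println(isStandardTransferEncoding("8bit"))             // true
	fmt.Println(isStandardTransferEncoding("Base64"))           // true
	fmt.Println(isStandardTransferEncoding("=?utf-8?Q?8bit?=")) // false
}
```

The encoded-word from the example fails this check even though it decodes to a valid mechanism.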

@jameshoulahan
Contributor Author

jameshoulahan commented Nov 30, 2020

Yes, it's invalid, but unfortunately, messages in the wild still put encoded words there. I don't think adding a decoding step here is a problem.

I think in this case, making this change increases usability without reducing correctness, as I can't think of a way this would lead to incorrectly parsed messages.

@emersion
Owner

Have you reached out to mailbox.org to let them know about the issue?

@jameshoulahan
Contributor Author

No. And I haven't been able to reproduce it with newer messages, only ones from last year. Chances are they fixed it already.

@jameshoulahan
Contributor Author

@emersion are you opposed to making such a change to your library?

@emersion
Owner

emersion commented Dec 7, 2020

In this particular case, users can trivially ignore the UnknownEncodingError, so I think I'd rather not add this quirk.


3 participants