Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline vs Attachment part logic is insufficient for emails in the wild #138

Open
vkryukov opened this issue Aug 16, 2021 · 5 comments
Open

Comments

@vkryukov
Copy link

vkryukov commented Aug 16, 2021

Hello,

with the current implementation, interface PartHeader is implemented as either InlineHeader or AttachmentHeader that both embed message.Header. And although it's not documented by the package, the wiki example implies that PartHeader will always be one of the two.

The logic to decide whether its AttachmentHeader or PartHeader,

if disp == "inline" || (disp != "attachment" && strings.HasPrefix(t, "text/")) {
    mp.Header = &InlineHeader{p.Header}
} else {
    mp.Header = &AttachmentHeader{p.Header}
}

unfortunately, doesn't always work for emails in the wild. E.g., I have a message with the following part header:

Content-Type: text/plain; name="emailreceipt_20121015R2315576090.pdf"
Content-Disposition: inline; filename=emailreceipt_20121015R2315576090.pdf

%PDF-1.4..%...

It's clearly a PDF attachment, but is currently classified as InlineHeader, so we cannot use AttachmentHeader.Filename to extract the filename.

To solve this problem, the user code can of course cast PartHeader as Header (which will succeed with the current implementation), and then effectively re-implement AttachmentHeader.Filename logic by perusing Header.ContentDisposition and Header.ContentType methods.

The alternative would be, at a minimum, to extend PartHeader interface to directly expose ContentDisposition and ContentType, which will help the user code that works with malformed parts. (If this indeed a desired direction, I'm happy to create a pull request).

Another alternative would be to improve the Inline vs Attachment logic, but it will probably be brittle as you cannot enumerate badness in the wild.

Thoughts?

@emersion
Copy link
Owner

Hm, this seems to be allowed by the RFC and indicate that the PDF attachment should be displayed with the text of the message.

The RFCs allow very surprising things, such as multiple inline parts mixed together with a different MIME type each.

@vkryukov
Copy link
Author

vkryukov commented Aug 19, 2021

Thanks for your quick response!

I think this particular set of MIME headers is a violation of the standard, as e.g. RFC 2045 section 5 (Content-Type Header field) prescribes that

For this reason, registered subtypes of text, image, audio, and video should not contain embedded
information that is really of a different type.

and RFC 2046 Section 3 says

text -- textual information. The subtype "plain" in
particular indicates plain text containing no
formatting commands or directives of any sort. Plain
text is intended to be displayed "as-is". No special
software is required to get the full meaning of the
text, aside from support for the indicated character
set.

Of course, we will always see a badly formatted email in the wild, and no library can expect to handle all of these cases - "you can't enumerate badness". I'm wondering what is the best way for a library like go-message to support users who have to deal with all of this weirdness.

Here is one example: we can extend PartHeader to include ContentType() and ContentDisposition() methods, as they can give additional useful information about the nature of the part if Content-Type in the header is lying like in the example above. This is not necessary, of course - in the current implementation, the following works perfectly fine:

type header interface {
	ContentType() (t string, params map[string]string, err error)
	ContentDisposition() (t string, params map[string]string, err error)
}
h, ok := p.Header.(header)
if !ok {
	// Impossible with the current implementation
}

but it would make writing the user code easier. Another good candidate would be Header.Fields() to enumerate all the fields in the part header.

If you feel that this is unnecessary, please feel free to close this issue.

@emersion
Copy link
Owner

emersion commented Sep 8, 2021

Oh. text/plain with a PDF file, and all in a inline part. Completely missed that, I thought the MIME type was correct.

FWIW, it's also possible for have non-textual parts with an inline disposition. For instance Apple Mail will do this to embed images in a plain-text message: the message will contain interleaved text/plain and image/* parts, all of which are inline and wrapped in a multipart/mixed part.

Regarding your suggestion, maybe we can just add these to the existing PartHeader interface? I wonder how far we should go, there are a few more methods implemented by both types of headers.

@emersion
Copy link
Owner

emersion commented Sep 8, 2021

Ah, this is exactly what you suggest. Yes, this sounds reasonable.

@vkryukov
Copy link
Author

vkryukov commented Sep 9, 2021

Cool - why don't I work on a pull request then? I will add ContentType and ContentDisposition for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants