Missing Ticket Content from Mail #1063

Open
brucegibbins opened this issue Jan 13, 2023 · 8 comments
Labels
bug Something isn't working or is broken

Comments

@brucegibbins
Collaborator

When a ticket is raised via an email that has multiple parts and an HTML body, the content is not used for the ticket Description or Comment.

I suspect this may somehow be related to the mailbox being hosted on O365.

I have troubleshot the code and can see where the multipart sections are parsed in the extract_part_data() function. In my case the content type is 'text/html'.

This portion of the code populates the part_full_body variable after parsing with BeautifulSoup, but it does not populate the part_body variable as it does when the message is plain text.

Because of this, the calling function object_from_message() does not receive any value for the body variable. As a result, all of the subsequent conditions that check for body content fail.
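A minimal sketch of one way the HTML branch could populate both variables, assuming a simplified stand-in for the text-handling branch of extract_part_data(). The names extract_text_part() and html_to_text() are hypothetical, and the standard library's html.parser is used in place of BeautifulSoup so the snippet runs without extra dependencies:

```python
from email.message import Message
from email.mime.text import MIMEText
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class _TextCollector(HTMLParser):
    """Collects text nodes; a stdlib stand-in for BeautifulSoup's get_text()."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def html_to_text(html: str) -> str:
    # Add line breaks after paragraphs and before <br>, then keep only the text.
    html = html.replace("</p>", "</p>\n").replace("<br", "\n<br")
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.chunks).strip()


def extract_text_part(part: Message) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical, simplified text branch: returns (part_body, part_full_body)."""
    if part.get_content_maintype() != "text" or part.get_filename():
        return None, None  # not an inline text part
    charset = part.get_content_charset() or "utf-8"
    raw = part.get_payload(decode=True).decode(charset, errors="replace")
    if part.get_content_subtype() == "plain":
        text = raw.strip()
        return text, text
    if part.get_content_subtype() == "html":
        text = html_to_text(raw)
        return text, text  # set part_body too, not only part_full_body
    return None, None


body, full_body = extract_text_part(MIMEText("<p>Printer jammed<br>Floor 2</p>", "html"))
print(repr(body))  # 'Printer jammed\nFloor 2'
```

With both values set in the HTML branch, the caller no longer ends up with an empty body when the only text part is text/html.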

I am not sure whether this is an RFC formatting issue, but I am working on a fix unless someone has any ideas.

brucegibbins added the bug label on Jan 13, 2023
@uhurusurfa
Collaborator

The code will need to be updated to handle the described situation.
If you are able to fix it yourself and raise a PR, please feel free; otherwise it will get fixed in due course as developers are able to devote time to it.

@brucegibbins
Collaborator Author

brucegibbins commented Jan 30, 2023 via email

@uhurusurfa
Collaborator

Do you have an example email where this is happening?
It sounds like an email that does not have a text/plain part. It would be quicker to create a unit test for this if you can provide the sample that is breaking it.
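If a real sample cannot be shared, a synthetic one can be built with the standard library. A minimal sketch of a hypothetical test fixture (addresses, subject and attachment are made up) that produces an email with an HTML body, a PDF attachment, and no text/plain part at all:

```python
from email.message import EmailMessage


def make_html_only_message() -> EmailMessage:
    """Hypothetical test fixture: HTML body + PDF attachment, no text/plain part."""
    msg = EmailMessage()
    msg["From"] = "user@example.com"
    msg["To"] = "helpdesk@example.com"
    msg["Subject"] = "Printer on floor 2"
    msg.set_content(
        "<html><body><p>The printer is jammed.</p></body></html>", subtype="html"
    )
    # Adding an attachment turns the message into multipart/mixed.
    msg.add_attachment(
        b"%PDF-1.4 placeholder", maintype="application", subtype="pdf",
        filename="report.pdf",
    )
    return msg


msg = make_html_only_message()
print([p.get_content_type() for p in msg.walk()])
# ['multipart/mixed', 'text/html', 'application/pdf']
```

Writing msg.as_bytes() to an .eml file would give a fixture that could be fed to the mail-processing code in a unit test.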

@brucegibbins
Collaborator Author

brucegibbins commented Mar 24, 2023

Hi. I now think that the small change I made here has led to the other issue I posted regarding the extra line feed after the body is extracted. Rather than waste too much of your time, let me get quicktests working with my OAuth mod, and then I can revisit this original post. Kind regards

@brucegibbins
Collaborator Author

I am working on a PR for this.

The problem occurs when the mail body is HTML, which in our case it is, because most staff use Outlook.

I have discovered that after determining that the mail part subtype is 'html', the variable part_full_body is initialised, but the variable part_body never carries a value when this condition is true. Then, in the calling function object_from_message(), the variable body never carries a value, and that is what is used to populate the description field downstream. Therefore, the description remains empty. If there are multiple mail parts and one of them happens to be text, then the part_body variable does get a value, but it may not be the one expected.

Something needs to change near line 778 in email.py, unless I am missing something.

@uhurusurfa
Collaborator

Looking at the code history, it seems there have been several attempts to deal with this issue over time, and they appear to have resulted in a fairly convoluted bit of code.

It is probably worth defining exactly what the extraction process should be, and then refactoring the code to do that.

The first commit, more than 4 years ago, iterated over the body parts (if the message was multipart) looking for a text/plain part, and used that as the message body.
The fallback was to search for a body element by converting the entire MIME message to a string, which does not really make sense from my point of view, as the outcome is likely to be hit and miss.
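To illustrate why searching the serialised message is hit and miss: whether a tag such as <body> survives in the raw text depends on the Content-Transfer-Encoding the sending client chose. A minimal sketch, where naive_find_body() is a hypothetical reconstruction of the idea rather than the project's actual fallback code:

```python
from email.message import EmailMessage


def naive_find_body(raw: str) -> bool:
    # Hypothetical reconstruction of the fallback idea: search the whole
    # serialised message for a <body> element.
    return "<body>" in raw.lower()


html = "<html><body><p>Printer jammed</p></body></html>"

qp_msg = EmailMessage()
qp_msg.set_content(html, subtype="html", cte="quoted-printable")

b64_msg = EmailMessage()
b64_msg.set_content(html, subtype="html", cte="base64")

print(naive_find_body(qp_msg.as_string()))   # True: the tag survives as-is
print(naive_find_body(b64_msg.as_string()))  # False: the markup is base64 text
print("<body>" in b64_msg.get_content())     # True once the payload is decoded
```

The same HTML is found or missed purely depending on the transfer encoding, which is why decoding individual parts is more reliable than string-searching the whole message.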

Modern emails are generally sent as a multipart that encapsulates both a text/plain and a text/html part by default, unless the email client is configured to do otherwise. Some clients disable the text/plain part, so the body has to come from the text/html part.
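For these common cases, Python's standard library already implements a "prefer text/plain, fall back to text/html" selection: EmailMessage.get_body() searches multipart/alternative, multipart/mixed and multipart/related containers and skips parts marked as attachments. A minimal sketch, assuming messages are parsed with email.policy.default (the default compat32 policy returns plain Message objects, which lack get_body()); pick_body_part() is a hypothetical helper name:

```python
import email
import email.policy
from email.message import EmailMessage
from typing import Optional


def pick_body_part(msg: EmailMessage) -> Optional[str]:
    """Return the preferred body text: text/plain first, text/html as fallback."""
    part = msg.get_body(preferencelist=("plain", "html"))
    return None if part is None else part.get_content()


# A client that sends both alternatives: the text/plain part wins.
alt = EmailMessage()
alt.set_content("Printer jammed")
alt.add_alternative("<p>Printer jammed</p>", subtype="html")

# A client that sends HTML only, parsed from raw bytes with policy.default.
raw = b"Content-Type: text/html; charset=utf-8\n\n<p>Printer jammed</p>\n"
html_only = email.message_from_bytes(raw, policy=email.policy.default)

print(pick_body_part(alt))
print(pick_body_part(html_only))
```

Note that the HTML fallback still returns markup, so it would need converting to text before being used as a ticket description.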

The problem gets more complicated when attachments and inline content are added, resulting in a multipart within a multipart. Attached emails can themselves have a similar structure, resulting in a message that is nested several levels deep.
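To make the nesting concrete, here is a sketch (all names and content hypothetical) that builds a forwarded email with the original message attached, using only the standard library, and lists its MIME tree with one level of indentation per depth:

```python
from email.message import EmailMessage, Message
from typing import List


def describe_tree(part: Message, depth: int = 0) -> List[str]:
    """One indented line per MIME part, recursing into sub-parts."""
    lines = ["  " * depth + part.get_content_type()]
    if part.is_multipart():  # also true for message/rfc822 wrappers
        for child in part.get_payload():
            lines.extend(describe_tree(child, depth + 1))
    return lines


original = EmailMessage()
original["Subject"] = "Printer jammed"
original.set_content("Printer on floor 2 is jammed.")
original.add_alternative("<p>Printer on floor 2 is jammed.</p>", subtype="html")

forward = EmailMessage()
forward["Subject"] = "FW: Printer jammed"
forward.set_content("See the original below.")
forward.add_alternative("<p>See the original below.</p>", subtype="html")
forward.add_attachment(original)  # attached as message/rfc822

print("\n".join(describe_tree(forward)))
```

Even this simple forward has text parts at two different depths, inside two separate multipart/alternative containers.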

So it is small wonder you are seeing unreliable processing of emails. Really, the algorithm needs to be rewritten as a recursive descent parser if emails are to be processed reliably, catering for all the variations that are possible depending on the email client that assembled the email.
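Python's email package already parses the MIME structure into a tree, so in practice the "parser" could be a recursive walk over that tree. A minimal sketch under stated assumptions: parts with an attachment disposition (including attached emails) are never the body; inside multipart/alternative the text/plain part is preferred, as in the original first commit, with text/html converted to text as the fallback; for other containers the first child that yields text wins. extract_body() and its helpers are hypothetical names, and html.parser stands in for BeautifulSoup:

```python
from email.message import EmailMessage, Message
from html.parser import HTMLParser
from typing import List, Optional


class _TextCollector(HTMLParser):
    """Keeps only text nodes (stand-in for BeautifulSoup's get_text())."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def _html_to_text(html: str) -> str:
    collector = _TextCollector()
    collector.feed(html.replace("</p>", "</p>\n").replace("<br", "\n<br"))
    collector.close()
    return "".join(collector.chunks).strip()


def _decoded_text(part: Message) -> str:
    charset = part.get_content_charset() or "utf-8"
    return part.get_payload(decode=True).decode(charset, errors="replace")


def extract_body(part: Message) -> Optional[str]:
    """Recursively find the human-readable body of a MIME tree."""
    if part.get_content_disposition() == "attachment":
        return None  # files and attached emails are never the body
    ctype = part.get_content_type()
    if ctype == "text/plain":
        return _decoded_text(part).strip() or None
    if ctype == "text/html":
        return _html_to_text(_decoded_text(part)) or None
    if not part.is_multipart():
        return None  # images, calendar invites, etc.
    children = part.get_payload()
    if ctype == "multipart/alternative":
        # Stable sort: text/plain alternatives first, everything else after.
        children = sorted(children, key=lambda c: c.get_content_type() != "text/plain")
    for child in children:  # mixed, related, alternative, inline message/rfc822
        body = extract_body(child)
        if body:
            return body
    return None
```

Empty parts are treated as "no body", so the walk moves on to the next candidate. Preferring text/plain is a deliberate choice here: RFC 2046 orders alternatives from least to most faithful, so a strictly RFC-ordered walk would try the last alternative first.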

@brucegibbins
Collaborator Author

brucegibbins commented Mar 31, 2023 via email

@uhurusurfa
Collaborator

uhurusurfa commented Jul 23, 2023

PR #1104 should fix this issue. Please confirm and close it if so.
