partition_msg is unable to process attachments #3006
Comments
@MthwRobinson This appears to be either a corrupted message or a defect or limitation in
Diagnostics
I'm able to reproduce this error. I was able to detach the attachment using
On attempting to open it with PowerPoint, it signals a "repair error":
When clicking "Repair" it states:
On inspection, the attachment binary appears to be a zip archive (the first two bytes of the file are "PK"). However, it cannot be unzipped and fails with this message:
$ unzip Engineering\ Onboarding.pptx
Archive: Engineering Onboarding.pptx
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of Engineering Onboarding.pptx or
Engineering Onboarding.pptx.zip, and cannot find Engineering Onboarding.pptx.ZIP, period.
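
For anyone who wants to repeat this check without `unzip`, here is a minimal sketch using only the Python standard library. It assumes the symptom is the one reported above: the bytes start with "PK" but the end-of-central-directory record cannot be found. The names `diagnose_zip` and `make_sample_zip` are hypothetical helpers for illustration and are not part of `unstructured` or any MSG library.

```python
import io
import zipfile


def diagnose_zip(data: bytes) -> str:
    """Classify bytes as 'not-zip', 'truncated-zip', or 'valid-zip'."""
    # ZIP local file headers begin with the two bytes "PK".
    if not data.startswith(b"PK"):
        return "not-zip"
    # zipfile.is_zipfile() needs the end-of-central-directory record,
    # the same structure `unzip` reported as missing.
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return "truncated-zip"
    return "valid-zip"


def make_sample_zip() -> bytes:
    """Build a small in-memory archive to exercise diagnose_zip()."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("slide.xml", "<p>hello</p>")
    return buf.getvalue()


if __name__ == "__main__":
    good = make_sample_zip()
    print(diagnose_zip(good))
    # Cutting the archive in half drops the end-of-central-directory record.
    print(diagnose_zip(good[: len(good) // 2]))
```

To check a detached attachment, read the file in binary mode and pass the bytes to `diagnose_zip()`. A "truncated-zip" result is consistent with the `unzip` output above, though it does not by itself say whether the file was damaged before or during extraction.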
Sounds good, I'll update this issue to reflect removing that document from
fake-email-multiple-attachments.msg from example-docs
This turned out to be a defect in
**Summary**
`partition_msg()` previously used the `msg_parser` library for parsing Outlook MSG email files (.msg files). The `msg_parser` library is unmaintained and has several major shortcomings such as not being able to parse MSG files with 8-bit encoded strings and not reliably extracting attachments. Use the new and permissively licenced `python-oxmsg` library instead.

**Additional Context**
For reviewability purposes, this PR temporarily places the new `partition_msg()` implementation in `new_msg.py` and references that implementation from `msg.py`. `new_msg.py` will be renamed to `msg.py` in a closely following PR. This avoids a very messy interleaving of hunks in a diff between the old and re-written `partition_msg()` implementation.

Fixes #2481
Fixes #3006
To reproduce
The error that arises is:
ValueError: Invalid file /tmp/tmpo99fe8l4/Engineering Onboarding.pptx. The FileType.ZIP file type is not supported in partition.