Docx detected as Zip due to trash files #171

Open
lucasgadams opened this issue Feb 13, 2024 · 1 comment
Comments

@lucasgadams

A few files that are valid docx are being detected as zip. I looked into it: the current code checks for a matching MIME type identifier at the beginning of the buffer, so it only inspects the first document in the zipped file. However, as recently pointed out in the magic library (here), it is possible and valid to have trash documents/bytes anywhere in the zipped file, including the first document. The fix noted in that link is to skip over these trash bytes. Could we get that fix ported to this library?
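To make the request concrete, here is a rough sketch of what "skipping over trash" could look like. It is not this library's code and not the fix from the linked discussion. It assumes the detector has a bytes buffer covering the start of the archive, walks the ZIP local file headers, and treats any entry whose name starts with "word/" as the docx marker. The names iter_entry_names, looks_like_docx and build_zip are made up for illustration.

```python
import io
import struct
import zipfile

LOCAL_SIG = b"PK\x03\x04"            # ZIP local file header signature
LOCAL_FMT = "<4sHHHHHIIIHH"          # fixed 30-byte local file header
LOCAL_SIZE = struct.calcsize(LOCAL_FMT)


def iter_entry_names(buf, max_entries=32):
    """Yield entry names from consecutive local file headers in buf."""
    offset = 0
    for _ in range(max_entries):
        header = buf[offset:offset + LOCAL_SIZE]
        if len(header) < LOCAL_SIZE or not header.startswith(LOCAL_SIG):
            return  # truncated buffer, or we reached the central directory
        fields = struct.unpack(LOCAL_FMT, header)
        flags, comp_size = fields[2], fields[7]
        name_len, extra_len = fields[9], fields[10]
        name_start = offset + LOCAL_SIZE
        name = buf[name_start:name_start + name_len]
        yield name.decode("utf-8", errors="replace")
        data_start = name_start + name_len + extra_len
        if flags & 0x08:
            # Sizes live in a trailing data descriptor, so the header size
            # is unreliable; fall back to scanning for the next signature.
            offset = buf.find(LOCAL_SIG, data_start)
            if offset < 0:
                return
        else:
            offset = data_start + comp_size


def looks_like_docx(buf):
    """Assumption for this sketch: any 'word/' entry marks a docx."""
    return any(name.startswith("word/") for name in iter_entry_names(buf))


def build_zip(names):
    """Demo helper: build an in-memory zip with the given entry names."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_STORED) as zf:
        for name in names:
            zf.writestr(name, b"placeholder")
    return out.getvalue()


# A trash entry comes first, yet the docx marker is still found.
sample = build_zip(["junk.tmp", "[Content_Types].xml", "word/document.xml"])
print(looks_like_docx(sample))
```

The point is only that the check should not stop at the first entry. A real port would also need to cap how far it reads and handle buffers that cut off mid-entry, and the signature scan in the fallback branch can match bytes inside compressed data.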

@lucasgadams
Author

This was fixed in the Linux file command last year, in this commit.
