This repository has been archived by the owner on Apr 9, 2023. It is now read-only.

tar containing pdf is detected as pdf #17

Open
phiresky opened this issue Jun 16, 2019 · 2 comments

Comments

@phiresky

I know this library is unmaintained, but opening this for a future maintainer :)

Test file: test.tar.zip (zipped so GitHub accepts the upload)

file --mime-type:

application/x-tar

tmagic:

application/pdf
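One possible explanation, offered as an assumption rather than something confirmed in this thread: in a tar archive the first member's data starts right after a 512-byte header, so a PDF stored first puts `%PDF-` at offset 512, while the tar signature `ustar` sits at offset 257. A detector whose PDF rule accepts `%PDF-` anywhere in roughly the first kilobyte would then match both types. The sketch below only demonstrates the byte layout using Python's standard `tarfile` module; the member name `doc.pdf` and the payload are made up for illustration.

```python
import io
import tarfile


def build_tar_with_pdf() -> bytes:
    """Return an in-memory ustar archive whose first member starts with a PDF header."""
    # Placeholder payload: it only carries the PDF magic, it is not a valid PDF.
    payload = b"%PDF-1.4\n% placeholder body\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name="doc.pdf")  # hypothetical member name
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


data = build_tar_with_pdf()

# tar signature lives inside the 512-byte header block
tar_magic_present = data[257:262] == b"ustar"
# member data begins immediately after the header block
pdf_magic_at_512 = data[512:517] == b"%PDF-"
# a rule that scans the first KiB for the PDF magic would also fire
pdf_magic_in_first_kib = b"%PDF-" in data[:1024]

print(tar_magic_present, pdf_magic_at_512, pdf_magic_in_first_kib)
```

If that is what happens here, the file is genuinely ambiguous at the magic-byte level, and the result depends on which rule the detector tries first.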

@hongquan

@phiresky This is intentional. Quote from README:

Unlike the typical approach that libmagic and file(1) uses, this loads all the file types in a tree based on subclasses. (EX: application/vnd.openxmlformats-officedocument.wordprocessingml.document (MS Office 2007) subclasses application/zip which subclasses application/octet-stream) Then, instead of checking the file against every file type, it can traverse down the tree and only check the file types that make sense to check. (After all, the fastest check is the check that never gets run.)
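To make the quoted idea concrete, here is a toy sketch of subclass-tree detection. It is not tree_magic's actual code: the `Node` class, the `rule` helper, and the byte checks are all hypothetical and heavily simplified. It shows that checks below a non-matching parent never run, and also that the result depends on sibling order and on how strict each check is.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    mime: str
    matches: Callable[[bytes], bool]
    children: List["Node"] = field(default_factory=list)


def detect(node: Node, data: bytes) -> str:
    """Descend into the first child whose check passes; stop when none do."""
    for child in node.children:
        if child.matches(data):
            return detect(child, data)
    return node.mime


calls: List[str] = []  # records which checks actually ran


def rule(mime: str, predicate: Callable[[bytes], bool], children=()) -> Node:
    def matches(data: bytes) -> bool:
        calls.append(mime)
        return predicate(data)
    return Node(mime, matches, list(children))


DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
docx = rule(DOCX, lambda d: b"word/document.xml" in d)  # toy check
zip_ = rule("application/zip", lambda d: d.startswith(b"PK\x03\x04"), [docx])
pdf = rule("application/pdf", lambda d: d.startswith(b"%PDF-"))
root = Node("application/octet-stream", lambda d: True, [pdf, zip_])

# A PDF-like input: the zip and docx checks are never executed.
calls.clear()
pdf_result = detect(root, b"%PDF-1.7\n")
pdf_calls = list(calls)

# A docx-like input: pdf is rejected, then zip and its subclass are checked.
calls.clear()
docx_result = detect(root, b"PK\x03\x04" + b"\x00" * 26 + b"word/document.xml")
docx_calls = list(calls)

print(pdf_result, pdf_calls)
print(docx_result, docx_calls)
```

In this toy version the first matching sibling wins, so a loose check that sits early in the list can shadow a stricter one that would also have matched.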

@phiresky
Author

phiresky commented Aug 23, 2021

tbh I don't see how that explains the misdetection. Why would traversing a tree lead to a wrong result? Even if the answer is ambiguous, I don't see why it can't output either the more likely type or all possibilities.

The README also says tree_magic is designed to be more efficient and to have fewer false positives than the old approach used by libmagic.
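As a rough illustration of the "more likely one or all possibilities" suggestion, the sketch below collects every matching rule and ranks them, instead of stopping at the first hit. Everything in it is an assumption for illustration: the `Rule` tuple, the two rules and their offset windows, and the ranking heuristic (a tighter offset window counts as more specific). None of it is taken from tree_magic or libmagic.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    mime: str
    magic: bytes
    first: int  # lowest offset at which the magic may start
    last: int   # highest offset at which the magic may start


def candidates(data: bytes, rules: List[Rule]) -> List[str]:
    """Return all matching MIME types, most specific (tightest window) first."""
    hits = []
    for r in rules:
        window = data[r.first : r.last + len(r.magic)]
        if r.magic in window:
            hits.append(r)
    hits.sort(key=lambda r: r.last - r.first)
    return [r.mime for r in hits]


# Hypothetical rules: PDF magic allowed anywhere in the first KiB,
# tar magic required at exactly offset 257.
RULES = [
    Rule("application/pdf", b"%PDF-", 0, 1024),
    Rule("application/x-tar", b"ustar", 257, 257),
]

# Minimal stand-in for a tar whose first member is a PDF.
buf = bytearray(1024)
buf[257:262] = b"ustar"
buf[512:517] = b"%PDF-"

ranked = candidates(bytes(buf), RULES)
print(ranked)
```

With these made-up rules the exact-offset tar signature ranks ahead of the loosely anchored PDF signature, and a caller could still see both candidates.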
