Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRC-32 checks #49

Open
xeduardox opened this issue Dec 19, 2016 · 5 comments
Open

CRC-32 checks #49

xeduardox opened this issue Dec 19, 2016 · 5 comments
Labels

Comments

@xeduardox
Copy link

Readme says CRC32 check not performed. Is that an option or do you not plan to support it?

@thejoshwolfe
Copy link
Owner

I do not currently plan to do any CRC32 checks in yauzl. I know that is a feature of other zip file readers, and I'm not sure how valuable it would be for yauzl to support it.

In my opinion, hash-based error checking should really never be done inside a file format. A file format does not correspond to any situation where data corruption could occur. Rather error checking should happen when errors can happen, such as during network transmission. If you're trying to check for errors in your storage hardware degrading over time, then you can keep checksums of files beside the files and do your error checking whenever you like regardless of the file format. Additionally, if error checking is outside the file format, then you're not limited to a fixed set of hash algorithms, like CRC32.

That being said, it's very popular to include redundancy information in file formats. Even the brand new FLIF image format includes optional CRC32 checksums, despite everything else about the format seeming very progressive. Perhaps checksums in files are not as useless as I think. I'm not sure.

If anyone makes an argument to convince me that it's valuable to do the CRC32 checks in yauzl, I will gladly add optional support for it. Keep in mind that CRC32 computation is not free, so it will slow down unzipping slightly. Currently, yauzl can unzip faster than Info-ZIP's unzip command line program, which is written in C, for some zip files probably because yauzl is skipping the CRC32 checking.

If yauzl added support for CRC32 checking, then it would be an error emitted from the read stream obtained from openReadStream() after the file contents have been piped through before ending the pipeline.

@overlookmotel
Copy link
Contributor

I would like to (belatedly) speak up in favour of CRC32 checking.

I agree with you in principle that hash-based error checking is better done outside of a file format. And there are more appropriate hashing algorithms than CRC32 for many uses.

However, in my use case, I do not control the source of ZIP files I work with, nor the transmission medium by which I receive them. The best indication I have of whether files are corrupted is the CRC32 values in the ZIP file.

I imagine this is not an uncommon situation (I notice some other issues on this repo where @thejoshwolfe has asked how a problematic ZIP file was created, and the answer was "no idea, someone sent it to me").

Would you be willing to reopen this issue?

@thejoshwolfe
Copy link
Owner

I believe I can simply add documentation to the README explaining how to do the CRC32 checks outside of yauzl. I think it's as simple as piping the readStream from openReadStream through a CRC32 checker, and comparing it to entry.crc32.

I'm reopening the issue to look into it.

@thejoshwolfe thejoshwolfe reopened this Apr 3, 2018
@overlookmotel
Copy link
Contributor

I have just published a module on npm yauzl-crc that adds CRC32 checking.

@thejoshwolfe If you have time, would you mind taking a look to see if I've missed anything? Streams are not my strong suit.

@sirisian
Copy link

sirisian commented Jan 7, 2022

For what it's worth, I am using this library to unzip archives and saw on some hardware it corrupts a file with null bytes. This is somewhat rare and is probably hardware specific. (Might be overheating, but that's just a guess). Decided to dive into how this is possible and saw this issue. On large systems the probability is near zero, but because of scale it happens. Creates some very subtle bugs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants