
Fix encoding/decoding of base-256 numbers #215

Merged — 2 commits merged on Jun 1, 2019

Conversation

justfalter
Contributor

@justfalter justfalter commented May 30, 2019

This PR fixes issues I've identified with the handling of base-256 encoded numbers within node-tar (see #188). The issues would generally present themselves when attempting to extract a gnu-formatted tar entry for a file greater than 8 GB in size.

  • The last byte of the buffer was incorrectly being ignored when encoding/decoding.
  • JavaScript can only represent integers precisely between -9007199254740991 and 9007199254740991 (Number.MIN_SAFE_INTEGER / Number.MAX_SAFE_INTEGER). Numbers outside these bounds have their lowest-order bits rounded off; for example, 9007199254749999 is rounded to 9007199254750000.
  • I've modified the large-numbers.js parse and encode functions to throw a TypeError if they encounter a number that cannot be precisely represented as a JavaScript integer, or if the buffer being decoded does not appear to be a base-256 encoded number (it must start with 0x80 or 0xff).
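The rounding described in the second bullet is easy to reproduce in any JavaScript runtime (a quick illustration, not part of the patch itself):

```javascript
// IEEE-754 doubles represent integers exactly only up to 2^53 - 1.
console.log(Number.MAX_SAFE_INTEGER);                // 9007199254740991
console.log(Number.isSafeInteger(9007199254749999)); // false

// Above that bound, odd integers cannot be represented and are
// rounded to a neighboring even value.
console.log(9007199254749999 === 9007199254750000);  // true
```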

- Encoding/decoding of base-256 numbers failed to handle the last byte
in the buffer.
- Take JavaScript's MAX_SAFE_INTEGER / MIN_SAFE_INTEGER into account
when encoding/decoding: if a number cannot be accurately represented
with integer precision, a TypeError will be thrown.
- Throw a TypeError if the parser is passed a buffer that does not
appear to be base-256 encoded (it must start with 0x80 or 0xff).
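A minimal sketch of a parser following those rules (my own illustration under the constraints described above; `parse256` is a hypothetical name, not node-tar's actual implementation or exported API):

```javascript
// Base-256 parser sketch for tar numeric fields: the leading byte must
// be 0x80 (positive) or 0xff (negative), every byte including the last
// participates in the value, and unsafe integers are rejected.
function parse256(buf) {
  const first = buf[0];
  if (first !== 0x80 && first !== 0xff) {
    throw new TypeError('invalid base-256 encoding');
  }
  let value = 0;
  if (first === 0x80) {
    // Positive: remaining bytes are a big-endian unsigned integer.
    for (let i = 1; i < buf.length; i++) {
      value = value * 256 + buf[i];
    }
  } else {
    // Negative: two's complement. Accumulate the bitwise-inverted
    // bytes big-endian, then apply -(x + 1) to recover the value.
    for (let i = 1; i < buf.length; i++) {
      value = value * 256 + (0xff - buf[i]);
    }
    value = -(value + 1);
  }
  if (!Number.isSafeInteger(value)) {
    throw new TypeError('parsed number outside safe integer range');
  }
  return value;
}

// A 12-byte size field for a 10 GB (10737418240-byte) file:
const field = new Uint8Array([0x80, 0, 0, 0, 0, 0, 0, 0x02, 0x80, 0, 0, 0]);
console.log(parse256(field)); // 10737418240
```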
Review comment on lib/large-numbers.js (outdated, resolved)
@isaacs isaacs merged commit 9a44de7 into isaacs:master Jun 1, 2019
@isaacs
Owner

isaacs commented Jun 1, 2019

I see what happened here.

To encode files over 8 GB, bsdtar drops the trailing 0x20 and uses that byte as part of the octal-in-ASCII number. This is an ambiguous part of the spec (such as it is), which gnutar interprets differently. Thankfully, bsdtar also prepends a PAX extended file attributes entry, which gnutar interprets properly. So, for example, a 10 GB file would fill those 12 bytes with '120000000000', no terminator. Gnutar misinterprets this as 0o12000000000 (a mere 1.25 GB), but that's overridden by the PAX header.
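The disagreement can be shown in a few lines (my own illustration of the two readings, not either tool's actual code):

```javascript
// A 12-byte size field as bsdtar writes it for a 10 GB file: twelve
// octal digits, no NUL or space terminator.
const field = '120000000000';

// bsdtar-style reading: all 12 bytes are octal digits.
const bsd = parseInt(field, 8);              // 10737418240 (10 GB)

// gnutar-style reading: the last byte is reserved for a terminator,
// so only the first 11 digits are used.
const gnu = parseInt(field.slice(0, 11), 8); // 1342177280 (1.25 GB)

console.log(bsd, gnu);
```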

I'm not sure what bsdtar would write in that block if the file size couldn't fit in 12 octal digits, but I'm also not eager to create a 64 GB file on my laptop to find out.

@isaacs
Owner

isaacs commented Jun 1, 2019

Actually, I did get curious and checked. Bsdtar does the same thing as gnutar, but only when the file is 64 GB or greater.

In short, this patch is good, it's landed, and published to latest. Thank you for digging in and fixing this. The reason it escaped my notice for so long is frankly that pax headers are so much more straightforward and make this type of bug irrelevant in so many cases.

The checks for Number.MAX_SAFE_INTEGER made me chuckle. I thought for a second that throwing a new error should be semver major bump, but if people are using this library to pack and unpack tarballs with 8 petabyte files in them, then the world is definitely in trouble.

@justfalter
Contributor Author

Thanks for taking the time to go over this, @isaacs. I’ve got something upstream from my project that is explicitly producing gnu-formatted tarballs, for some reason.

Honestly, the only reason I even thought to do the min/max int checks was because you had tests that would have violated those checks. 8-petabyte files are pretty unlikely, but I wondered whether other numbers that get encoded (uid, gid, etc.) might someday exceed the max safe integer. I dug a bit into libarchive, and they have similar checks there as well.
