tar archive containing sparse file shows wrong file size, silently extracts corrupted file #2125
Comments
Your third example here is using the Old GNU Tar Sparse Format. No one has yet implemented support for that in libarchive. Currently, libarchive supports the GNU Tar Pax Sparse formats, including versions 0.0, 0.1, and 1.0. The Old GNU Tar Sparse Format uses an unusual extended header that libarchive currently interprets as part of the file contents, so the extracted file you're seeing contains a map of the file holes followed by the file data. If you look at a hex dump of the resulting file, you'll see this. Implementing support for the old GNU sparse format in libarchive shouldn't be too difficult, since the infrastructure for handling GNU conventions and general sparse files already exists. (If you'd like to work on this, I would recommend waiting a couple of weeks until I land my current overhaul of the tar header parsing code. Though I might just take a quick crack at it myself while I'm in this part of the code. ;-)
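To make the "map of the file holes followed by the file data" concrete, here is a minimal Python sketch of reading an old-GNU sparse header. It assumes the `oldgnu_header` layout from GNU tar's `tar.h`: four 24-byte sparse slots starting at byte 386, the `isextended` flag at byte 482, the 12-byte `realsize` field at byte 483, and extension blocks that each hold 21 slots with their own flag at byte 504. The function names are hypothetical and this is not libarchive's code. A reader that does not know this layout would pass the extension blocks through as if they were file data.

```python
BLOCK = 512


def parse_octal(field: bytes) -> int:
    """Decode a NUL- or space-terminated octal numeric field.

    Assumption: values fit in octal; GNU's base-256 encoding is not handled.
    """
    text = field.split(b"\0", 1)[0].strip(b" ")
    return int(text, 8) if text else 0


def _read_sparse_entries(block: bytes, start: int, count: int) -> list:
    """Read up to `count` (offset, numbytes) pairs of 12-byte octal fields."""
    entries = []
    for i in range(count):
        base = start + i * 24
        if block[base + 12] == 0:  # empty numbytes field: no more entries
            break
        offset = parse_octal(block[base:base + 12])
        numbytes = parse_octal(block[base + 12:base + 24])
        entries.append((offset, numbytes))
    return entries


def old_gnu_sparse_map(blocks: bytes):
    """Parse an old-GNU 'S' header plus any extension blocks that follow it.

    Returns (entries, real_size, data_start); data_start is the byte offset
    within `blocks` where the stored file data begins.
    """
    header = blocks[:BLOCK]
    if header[156:157] != b"S":
        raise ValueError("not a GNU sparse ('S') header")
    entries = _read_sparse_entries(header, 386, 4)  # four slots in the header
    extended = header[482] != 0                     # isextended flag
    real_size = parse_octal(header[483:495])        # logical file size
    pos = BLOCK
    while extended:
        ext = blocks[pos:pos + BLOCK]
        if len(ext) < BLOCK:
            raise ValueError("truncated sparse extension block")
        entries += _read_sparse_entries(ext, 0, 21)  # 21 slots per extension
        extended = ext[504] != 0
        pos += BLOCK
    return entries, real_size, pos
```

Note that this describes GNU's variant of the "S" header only; as the thread goes on to find, star's "S" header differs enough that this layout may not apply to it.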
Note: the tar header reading overhaul is currently #2127.
I've started looking into this, and it seems we do have code to support the old GNU sparse format, but apparently it's not working correctly for this specific example.
Here's the real oddity: your third file is using a GNU sparse file extension but isn't marked as using GNU format. Rather, it's marked as a standard "ustar" format file. That seems to be the core problem: libarchive only expects this particular GNU sparse file format when reading GNU tar files. Do you happen to know what program created this file?
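Whether a header is "marked as GNU" or "marked as ustar" comes down to its `magic` and `version` fields. A minimal sketch, assuming the standard header layout (6-byte `magic` at offset 257, 2-byte `version` at offset 263): POSIX ustar writes `"ustar\0"` followed by `"00"`, while GNU tar writes `"ustar "` followed by `" \0"`. The function name and return labels are hypothetical; this is not libarchive's actual format-detection code.

```python
def tar_header_flavor(header: bytes) -> str:
    """Classify a 512-byte tar header by its magic and version fields."""
    if len(header) != 512:
        raise ValueError("tar headers are exactly 512 bytes")
    magic, version = header[257:263], header[263:265]
    if magic == b"ustar\0" and version == b"00":
        return "posix-ustar"
    if magic == b"ustar " and version == b" \0":
        return "gnu"
    return "v7-or-unknown"  # pre-POSIX header, or something else entirely
```

A header with typeflag `S` (byte 156) that classifies as `"posix-ustar"` is the combination described above: a GNU-style sparse entry without the GNU marker.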
Ah. This seems to be a "star" format archive, which uses an "S" header that is just different enough from GNU's "S" header that libarchive's logic for GNU "S" headers won't work with it. Your other archives are probably actual GNU tar archives. I was rather confused at first because the documentation I found for the star format shows a "tar" signature at the end of the header, which your example does not have. It looks like star dropped that signature at some point in favor of a slightly more complex check for its special header format.
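The older star signature mentioned here can be tested for directly. This sketch assumes, based on the header layout in star's `tar.h`, that the signature is the four bytes `"tar\0"` occupying the last four bytes (offset 508) of the 512-byte header. The function name is hypothetical, and the more complex check that newer star versions evidently use is not reproduced, so a header lacking the signature, like the one in the third example, simply yields `False`.

```python
def has_star_signature(header: bytes) -> bool:
    """Return True if a tar header carries the older star 'tar\\0' signature.

    Assumption: the signature sits at bytes 508..511 of the header block.
    """
    if len(header) != 512:
        raise ValueError("tar headers are exactly 512 bytes")
    return header[508:512] == b"tar\0"
```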
Well done for figuring that out! I do remember experimenting/playing with star, which has various options to specify the archive type/variant. It looks like I posted this in 2011: https://lists.gnu.org/archive/html/bug-tar/2011-02/msg00010.html
I have come across an issue when extracting a tar archive that contains a sparse file. On extraction, no error is reported, but the extracted file is corrupted. Perhaps all the holes are "collapsed" in the extracted file?
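The "collapsed holes" idea can be illustrated by contrasting correct sparse extraction with simply concatenating the stored chunks. A minimal sketch, assuming the sparse map has already been decoded into `(offset, numbytes)` pairs and the archived data is a readable binary stream; the function names are hypothetical. Correct extraction seeks to each chunk's logical offset before writing and then extends the output to the logical size, whereas concatenation produces only `sum(numbytes)` bytes.

```python
import io
from typing import BinaryIO, List, Tuple


def extract_sparse(entries: List[Tuple[int, int]], data: BinaryIO,
                   out: BinaryIO, real_size: int) -> None:
    """Write each stored chunk at its logical offset; gaps read back as zeros."""
    for offset, numbytes in entries:
        chunk = data.read(numbytes)
        if len(chunk) != numbytes:
            raise ValueError("archive data is shorter than the sparse map claims")
        out.seek(offset)  # seeking past the end leaves a gap (a hole, or zeros)
        out.write(chunk)
    # A trailing hole has no data to write, so extend the output explicitly.
    out.seek(0, io.SEEK_END)
    if out.tell() < real_size:
        out.seek(real_size - 1)
        out.write(b"\0")


def collapse(entries: List[Tuple[int, int]], data: BinaryIO) -> bytes:
    """What results if the holes are ignored: the chunks concatenated."""
    return b"".join(data.read(n) for _, n in entries)
```

With a real output file the same calls apply; whether the gaps become true holes on disk depends on the filesystem (on NTFS, for example, a file must be explicitly marked sparse), but either way they read back as zeros.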
Testing on Windows x64.
bsdtar shows the wrong file size for the archive it cannot extract correctly:
But it shows the correct size for an archive it can extract correctly:
With GNU tar the file lists (and extracts) correctly:
Circa 2013 I created some tar archives which contain a sparse file. If I recall correctly, I used either star or GNU tar on Linux.
There are three files. Two extract correctly but the third does not. The file in the third archive contains a large number of holes. Each archive contains a single 4,000,000,000-byte sparse file.
You can download the files for testing from:
https://www.mediafire.com/file/5frl8btr182au3q/NetBSD_4GB_HD_template.tar/file
(10KB)
https://www.mediafire.com/file/981eoc791c8l91a/NetBSD_4GB_HD_base.tar/file
(30KB)
https://www.mediafire.com/file/zvskd27ylq7xjaa/NetBSD_4GB_HD_after_install_A3000_zeroed_ADOS_partition.tar.xz/file
(83.89MB)