Truncated 7-Zip file body (error code: -30) on archive_read_data with 7z archives containing larger (>= 32 MiB) files when skipping entries #2106

Open
mxmlnkn opened this issue Apr 1, 2024 · 2 comments

mxmlnkn commented Apr 1, 2024

Hello,

Thanks for this widely useful project! I'm trying to incorporate it into ratarmount via python-libarchive-c.

After making the new backend work successfully with smaller archives, I stumbled upon a weird problem with a larger test file.

Create test files:

# Large archive with two files to test seekability and independence of opened files, which reproduces the bug.
> spaces-32-MiB.txt; for i in $( seq $(( 32 * 1024 )) ); do printf '%1024s' $'\n' >> spaces-32-MiB.txt; done
> zeros-32-MiB.txt; for i in $( seq $(( 32 * 1024 )) ); do printf '%01023d\n' 0 >> zeros-32-MiB.txt; done
7z a two-large-files.7z spaces-32-MiB.txt zeros-32-MiB.txt

# Slightly smaller file that I accidentally created before because of a bug, which for some reason works fine!
> spaces-32-MiB.txt; for i in $( seq $(( 32 * 1024 )) ); do printf '%1023s' $'\n' >> spaces-32-MiB.txt; done
> zeros-32-MiB.txt; for i in $( seq $(( 32 * 1024 )) ); do printf '%01023d\n' 0 >> zeros-32-MiB.txt; done
7z a two-slightly-less-large-files.7z spaces-32-MiB.txt zeros-32-MiB.txt

(The slightly smaller version calls printf '%1023s' $'\n' instead of printf '%1024s' $'\n', so each line of spaces-32-MiB.txt is one byte shorter.)
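
For readers without a POSIX shell, the file sizes produced by the loops above can be reproduced with a minimal Python sketch. The helper names write_repeated_line and create_test_files are hypothetical, and the sketch only creates the uncompressed input files; packing them with 7z a is still a separate step.

```python
import os
import tempfile

LINE_COUNT = 32 * 1024  # same loop count as the `seq` calls in the shell snippet


def write_repeated_line(path, line, count=LINE_COUNT):
    """Write `line` to `path` `count` times and return the file size in bytes."""
    with open(path, "wb") as file:
        file.write(line * count)
    return os.path.getsize(path)


def create_test_files(folder, spaces_width=1024, count=LINE_COUNT):
    """Create both input files in `folder` and return a name -> size mapping."""
    # The shell printf with a '%<width>s' format right-justifies the newline,
    # i.e. (width - 1) spaces followed by the newline itself.
    spaces_line = b"\n".rjust(spaces_width)
    # The '%01023d' format pads the number 0 to 1023 digits; a newline follows.
    zeros_line = b"0" * 1023 + b"\n"
    return {
        "spaces-32-MiB.txt": write_repeated_line(
            os.path.join(folder, "spaces-32-MiB.txt"), spaces_line, count),
        "zeros-32-MiB.txt": write_repeated_line(
            os.path.join(folder, "zeros-32-MiB.txt"), zeros_line, count),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        print(create_test_files(folder, spaces_width=1024))  # failing variant
        print(create_test_files(folder, spaces_width=1023))  # working variant
```

With a width of 1024, both files are 32 * 1024 * 1024 = 33554432 B. With a width of 1023, spaces-32-MiB.txt shrinks to 32 * 1024 * 1023 = 33521664 B, which matches the sizes in the program output further down.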

Python code triggering the issue:

import libarchive

def listFiles(path):
    print("\nList all entries of:", path)
    with libarchive.file_reader(path) as archive:
        for entry in archive:
            print(entry)
            for block in entry.get_blocks():
                assert len(block) > 0

def readNthEntry(path, entryIndex):
    print(f"\nGet contents of file {entryIndex} of archive: {path}")
    with libarchive.file_reader(path) as archive:
        entryCount = 0
        for entry in archive:
            if entryCount == entryIndex:
                print(entry)
                readSize = 0
                for block in entry.get_blocks():
                    readSize += len(block)
                print(f"  Read file contents: {readSize} B")
            entryCount += 1

filePath = "two-large-files.7z"
filePath2 = "two-slightly-less-large-files.7z"

listFiles(filePath)  # No error
readNthEntry(filePath2, 1)  # No error
# libarchive.exception.ArchiveError: Truncated 7-Zip file body (errno=84, retcode=-30, archive_p=...)
readNthEntry(filePath, 1)

C++ code triggering the issue:

#include <array>
#include <iostream>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

#include <archive.h>
#include <archive_entry.h>


class Libarchive
{
public:
    Libarchive( const std::string& path )
    {
        archive_read_support_filter_all( m_archive );
        archive_read_support_format_all( m_archive );

        auto returnCode = archive_read_open_filename( m_archive, path.c_str(), 10240 );
        if ( returnCode != ARCHIVE_OK ) {
            std::stringstream message;
            message << "[Libarchive] Open " << path << " failed with: " << archive_error_string( m_archive )
                    << " (error code: " << std::to_string( returnCode ) << ")";
            throw std::runtime_error( std::move( message ).str() );
        }
    }

    ~Libarchive()
    {
        const auto returnCode = archive_read_free( m_archive );
        if ( returnCode != ARCHIVE_OK ) {
            std::cerr << "Freeing archive failed with: " << returnCode << "\n";
        }
    }

    [[nodiscard]] archive*
    pointer() const noexcept
    {
        return m_archive;
    }

private:
    archive* const m_archive{ archive_read_new() };
};


class LibarchiveEntry
{
public:
    ~LibarchiveEntry()
    {
        archive_entry_free( m_entry );
    }

    [[nodiscard]] archive_entry*
    pointer() const noexcept
    {
        return m_entry;
    }

private:
    archive_entry* const m_entry{ archive_entry_new() };
};


void
listFiles( const std::string& path )
{
    Libarchive archive{ path };

    archive_entry* entry{ nullptr };
    while ( archive_read_next_header( archive.pointer(), &entry ) == ARCHIVE_OK ) {
        std::cout << archive_entry_pathname( entry ) << "\n";
        //archive_read_data_skip( archive.pointer() );  // Not necessary according to the wiki.
    }
}


void
readNthEntries( const std::string&      path,
                const std::set<size_t>& entryIndexes )
{
    std::cout << "\nGet contents of files";
    for ( const auto i : entryIndexes ) {
        std::cout << " " << i;
    }
    std::cout << " in archive: " << path << "\n";

    Libarchive archive{ path };

    size_t entryCount{ 0 };
    LibarchiveEntry entry;
    while ( true ) {
        /* I also tried with archive_read_next_header, but the bug persists. */
        if ( archive_read_next_header2( archive.pointer(), entry.pointer() ) != ARCHIVE_OK ) {
            break;
        }

        if ( entryIndexes.contains( entryCount ) ) {
            std::cout << archive_entry_pathname( entry.pointer() ) << "\n";

            std::array<char, 32 * 1024> buffer{};
            size_t readSize{ 0 };
            while ( true ) {
                const auto readSizePerCall = archive_read_data( archive.pointer(), buffer.data(), buffer.size() );
                if ( readSizePerCall < 0 ) {
                    std::stringstream message;
                    message << "[Libarchive] Read data failed with: " << archive_error_string( archive.pointer() )
                            << " (error code: " << std::to_string( readSizePerCall ) << ")";
                    //continue;  // Works fine (amount of returned data is correct) to simply ignore the error!?
                    throw std::runtime_error( std::move( message ).str() );
                }
                if ( readSizePerCall == 0 ) {
                    break;
                }
                readSize += readSizePerCall;
            }
            std::cout << "  Read file contents: " << readSize << " B\n";
        } else {
            //archive_read_data_skip( archive.pointer() );  // Uncommenting this does not help.
        }
        ++entryCount;
    }
}


int main()
{
    static const std::string filePath = "two-large-files.7z";
    static const std::string filePath2 = "two-slightly-less-large-files.7z";

    std::cout << "\nList all entries of: " << filePath << "\n";
    listFiles( filePath );

    /* Works fine with the slightly smaller file. */
    readNthEntries( filePath2, { 0 } );
    readNthEntries( filePath2, { 1 } );

    /* Works fine when not skipping any entry. */
    readNthEntries( filePath2, { 0, 1 } );
    readNthEntries( filePath, { 0, 1 } );

    readNthEntries( filePath, { 0 } );
    /* Read data failed with: Truncated 7-Zip file body (error code: -30) */
    readNthEntries( filePath, { 1 } );

    return 0;
}

Compiled with:

g++ -Wall -Wextra -Wshadow -std=c++20 -o libarchive-entry-skipping-issue{,.cpp} -larchive && ./libarchive-entry-skipping-issue

Output:

List all entries of: two-large-files.7z
spaces-32-MiB.txt
zeros-32-MiB.txt

Get contents of files 0 in archive: two-slightly-less-large-files.7z
spaces-32-MiB.txt
  Read file contents: 33521664 B

Get contents of files 1 in archive: two-slightly-less-large-files.7z
zeros-32-MiB.txt
  Read file contents: 33554432 B

Get contents of files 0 1 in archive: two-slightly-less-large-files.7z
spaces-32-MiB.txt
  Read file contents: 33521664 B
zeros-32-MiB.txt
  Read file contents: 33554432 B

Get contents of files 0 1 in archive: two-large-files.7z
spaces-32-MiB.txt
  Read file contents: 33554432 B
zeros-32-MiB.txt
  Read file contents: 33554432 B

Get contents of files 0 in archive: two-large-files.7z
spaces-32-MiB.txt
  Read file contents: 33554432 B

Get contents of files 1 in archive: two-large-files.7z
zeros-32-MiB.txt
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Libarchive] Read data failed with: Truncated 7-Zip file body (error code: -30)
Aborted

Observations:

  • I almost reported this at python-libarchive-c instead of here because I could not reproduce the bug with the C++ code at first. It turned out that I had forgotten to check the return code of archive_read_data. It also turned out that ignoring the error (see the commented-out continue) seems to result in the correct amount of data being returned by subsequent archive_read_data calls!
  • At first, I had a slightly smaller file because of printf peculiarities. Everything works fine with that file, two-slightly-less-large-files.7z. The error only occurs with two-large-files.7z.
  • The error also does not occur when no entries are skipped, i.e., when archive_read_data is called for all entries.
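
A possible stopgap that follows from the last observation, sketched in Python: read and discard the data of every entry before the requested one instead of skipping it. This is only an assumption based on the runs shown above, not a verified fix. The function name read_nth_entry_by_draining is hypothetical, and it accepts any iterable of objects that offer get_blocks(), such as the entries yielded by libarchive.file_reader.

```python
def read_nth_entry_by_draining(entries, entry_index):
    """Return the number of bytes in entry `entry_index`, or None if it does not exist.

    Entries before the requested one are read completely and their data is
    discarded, so no entry body is skipped on the way to the target.
    """
    for index, entry in enumerate(entries):
        # Consume every block, even for entries we are not interested in.
        size = sum(len(block) for block in entry.get_blocks())
        if index == entry_index:
            return size
    return None


# Hypothetical usage with python-libarchive-c (not executed here):
#
#   import libarchive
#   with libarchive.file_reader("two-large-files.7z") as archive:
#       print(read_nth_entry_by_draining(archive, 1))
```

The trade-off is that all preceding entries are fully decoded, which is the cost that skipping is meant to avoid in the first place.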

kientzle (Contributor) commented Apr 1, 2024

Do you see the same issue with this?

bsdtar -tvf two-large-files.7z

Note: The -t option to bsdtar skips the entry bodies to produce its listing.

mxmlnkn (Author) commented Apr 1, 2024

@kientzle That works the same as my listFiles implementations: archive_read_data is not even called, so this bug should not be triggered. I tried it, and it runs without error, just like my implementations. The error only occurs when skipping the first entry and then trying to read the second one.

bsdtar -tvf two-large-files.7z
# -rwx------  0 0      0    33554432 Apr  1 13:16 spaces-32-MiB.txt
# -rwx------  0 0      0    33554432 Mar 31 23:27 zeros-32-MiB.txt

I can reproduce the bug with bsdtar like this:

bsdtar -x --exclude spaces-32-MiB.txt -f two-large-files.7z
# zeros-32-MiB.txt: Truncated 7-Zip file body: File exists
# bsdtar: Error exit delayed from previous errors.

While it works fine when excluding the other file:

bsdtar -x --exclude zeros-32-MiB.txt -f two-large-files.7z
