Writing an empty file to an archive produces unexpected or wrong results #398

Open
Spavid04 opened this issue Dec 24, 2021 · 5 comments
Labels
bug Something isn't working for archiving Issue on archiving, compression or encryption help wanted Extra attention is needed

Comments

Spavid04 commented Dec 24, 2021

Describe the bug
If one tries to write an empty file (more specifically, a 0 byte file) to a SevenZipFile, different write methods produce different (and sometimes wrong) results. To be specific, when writing an empty file:

  • With write(path_to_file, ...), the file is added correctly and the archive is a valid 7z file. This is the expected behaviour.
  • With writestr(b"", ...), the file is added as an empty directory, and the file info and data of all subsequent files are shifted (as if the entry for the 0-byte file were missing but its data were not; see below for a visual explanation). The archive is a valid 7z file, but it contains bad data.
  • With writef(stream, ...), an exception is raised (see below for the stack trace).

Related issue
#343 also tries to add an empty file that produces unexpected results, and is possibly related.

To Reproduce
Steps to reproduce the behavior:

  1. Have a directory with some files, including at least one empty file.
  2. Uncomment a marked area and run the following code with python3.
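import os
import py7zr

# Output archive, opened as a binary file object and passed to py7zr.
f = open(".\\out.7z", "wb")
f7z = py7zr.SevenZipFile(f, "w")

root_dir = ".\\test data"
# Sort so the archive order matches the listings below.
files = sorted(os.listdir(root_dir))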
for file in files:
    path = os.path.join(root_dir, file)
    if not os.path.isfile(path):
        continue

    # uncomment one of these batches of code:

    # f7z.write(path, file)

    # with open(path, "rb") as data:
    #     f7z.writef(data, file)

    # with open(path, "rb") as data:
    #     f7z.writestr(data.read(), file)

f7z.close()
f.close()
  3. Observe differences or errors.

Expected behavior
Consistent behaviour, like the write method, and no exceptions.

Environment:

  • OS: Windows 10, 64-bit
  • Python 3.9.7
  • py7zr version: 0.17.2

Test data (please attach in the report):
Here are some example files:
test data.zip
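
For readers without the attachment, a stand-in data set can be generated. This is a minimal sketch that assumes only the file names and sizes shown in the 7z.exe listings below; the file contents are placeholders, and the helper names `make_test_data` and `empty_files` are hypothetical, not part of py7zr.

```python
import os
import tempfile

# Names and sizes in bytes, copied from the 7z.exe listing in this report.
# The contents are placeholder bytes, NOT the original attachment.
SIZES = {"a.txt": 16, "b.txt": 28, "c.txt": 0, "d.txt": 802, "e.txt": 22}


def make_test_data(root_dir):
    """Create files of the listed sizes under root_dir and return their paths."""
    os.makedirs(root_dir, exist_ok=True)
    paths = []
    for name, size in SIZES.items():
        path = os.path.join(root_dir, name)
        with open(path, "wb") as fh:
            fh.write(b"x" * size)
        paths.append(path)
    return paths


def empty_files(root_dir):
    """Return the sorted names of zero-byte regular files in root_dir."""
    names = []
    for name in os.listdir(root_dir):
        path = os.path.join(root_dir, name)
        if os.path.isfile(path) and os.path.getsize(path) == 0:
            names.append(name)
    return sorted(names)


if __name__ == "__main__":
    root = os.path.join(tempfile.mkdtemp(), "test data")
    make_test_data(root)
    # Lists the files expected to trigger the behaviour described here.
    print(empty_files(root))
```

Pointing `root_dir` in the reproduction script at the generated directory should exercise the same code paths, assuming the behaviour depends only on file sizes and not on file contents.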

Additional context
7z.exe listing of the expected behaviour (write):

 Attr         Size   Compressed  Name
----- ------------ ------------  ------------------------
....A           16          509  a.txt
....A           28               b.txt
....A            0               c.txt
....A          802               d.txt
....A           22               e.txt
----- ------------ ------------  ------------------------
               868          509  5 files

7z.exe listing of the writestr method. Observe the "shifted down" file sizes, c.txt listed as a directory, and the missing 22 bytes of data that belonged to e.txt. e.txt now contains the data that should be stored in d.txt, and d.txt contains the empty data of c.txt.

 Attr         Size   Compressed  Name
----- ------------ ------------  ------------------------
....A           16          496  a.txt
....A           28               b.txt
D...A            0               c.txt
....A            0               d.txt
....A          802               e.txt
----- ------------ ------------  ------------------------
               846          496  4 files, 1 folders
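
The shift in the listing above can be reproduced with a small model. This is only an illustrative sketch of one hypothesis (the entry for c.txt is flagged as having no data stream, while a 0-byte size is still recorded for it); it is not py7zr's actual code, and `list_archive` is a hypothetical helper.

```python
def list_archive(names, streamless, stream_sizes):
    """Pair each entry that has a data stream with the next recorded size.

    Entries named in `streamless` consume no size and are listed as 0 bytes.
    """
    sizes = iter(stream_sizes)
    listing = []
    for name in names:
        if name in streamless:
            listing.append((name, 0))
        else:
            listing.append((name, next(sizes)))
    return listing


names = ["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"]

# Consistent case: c.txt has no stream and no size was recorded for it.
expected = list_archive(names, {"c.txt"}, [16, 28, 802, 22])

# Hypothesised inconsistent case: c.txt is flagged as streamless, but a
# 0-byte size was recorded anyway, so every later file takes the size of
# its predecessor and the final 22-byte size is never assigned.
observed = list_archive(names, {"c.txt"}, [16, 28, 0, 802, 22])

for name, size in observed:
    print(f"{size:>5}  {name}")
```

Under this assumption the model yields 0 bytes for d.txt, 802 bytes for e.txt, and a total of 846 bytes, which matches the writestr listing.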

Stack trace for the writef method:

Traceback (most recent call last):
  File "main.py", line 19, in <module>
    f7z.writef(data, file)
  File "Python39\lib\site-packages\py7zr\py7zr.py", line 1035, in writef
    self.worker.archive(self.fp, self.files, folder, deref=False)
  File "Python39\lib\site-packages\py7zr\py7zr.py", line 1437, in archive
    foutsize, crc = self.writestr(fp, f, folder)
  File "Python39\lib\site-packages\py7zr\py7zr.py", line 1415, in writestr
    insize, foutsize, crc = compressor.compress(f.data(), fp)
  File "Python39\lib\site-packages\py7zr\compressor.py", line 835, in compress
    data = fd.read(self._block_size)
ValueError: read of closed file

Process finished with exit code 1
@github-actions
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

@miurahr miurahr added bug Something isn't working and removed no-issue-activity labels Mar 25, 2022
@miurahr miurahr added help wanted Extra attention is needed for archiving Issue on archiving, compression or encryption labels Apr 18, 2022
github-actions bot commented Nov 7, 2022

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

meoow commented Sep 22, 2023

I encountered the exact same bugs in 0.20.6: the shifted names when there is an empty file, and the "ValueError: read of closed file" exception when using the writef or writed methods to compress more than a few files. After a few files have been compressed, the "read of closed file" error is thrown.

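import zipfile

import py7zr

zip_file = zipfile.ZipFile('/path/to/zipfile.zip', 'r')
archive = py7zr.SevenZipFile('/path/to/new.7z', 'w')
for file_info in zip_file.infolist():
    fname = file_info.filename
    # Skip directory entries, whose names end with a slash.
    if fname.endswith('/'):
        continue
    # Stream each member from the zip straight into the 7z archive.
    z1 = zip_file.open(fname)
    archive.writef(z1, fname)
    z1.close()
archive.close()
zip_file.close()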

meoow commented Sep 22, 2023

I found a workaround for compressing an empty file: pass io.BytesIO(b'\x00') instead, so the file is no longer empty but a one-byte file containing a single NUL character. Of course this is not ideal, but I think it is harmless and should be fine for MOST cases, since the file was empty in the first place.
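
The workaround described above could be wrapped in a small helper. This is a minimal sketch: `non_empty_stream` and `add_member` are hypothetical names, and `archive` is assumed to be any object with a `writef(stream, arcname)` method, such as a `py7zr.SevenZipFile` opened for writing.

```python
import io


def non_empty_stream(data: bytes) -> io.BytesIO:
    """Wrap data in a stream, replacing empty data with a single NUL byte."""
    return io.BytesIO(data if data else b"\x00")


def add_member(archive, data: bytes, arcname: str) -> None:
    """Add data under arcname, never handing the archive an empty stream.

    ASSUMPTION: `archive` exposes writef(stream, arcname), as
    py7zr.SevenZipFile does in write mode.
    """
    archive.writef(non_empty_stream(data), arcname)
```

Note that a file added this way is extracted as a 1-byte file rather than a 0-byte file, so the helper is unsuitable where the exact size or content of empty files matters.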
