py7zr creates different archive when writing byte streams to archive #343

Open
reimarstier opened this issue Jul 8, 2021 · 7 comments
Labels
bug (Something isn't working), for archiving (Issue on archiving, compression or encryption)


@reimarstier

Describe the bug
When py7zr writes a directory structure containing zero-byte files to an archive, 7z on Windows can open the resulting archive.
When py7zr writes an empty byte stream to an archive, 7z on Windows cannot open the resulting archive (py7zr itself can still open it, though).

test_create_archive__from_dir() works, while test_create_archive__that_7z_cannot_extract() yields an archive that 7z on Windows fails to open.
I'd expect these two strategies to produce the same archive. The result is the same whether I use archive.writeall() or archive.writef().

To Reproduce
https://gist.github.com/reimarstier/8aa6822045dc6b562beea44799f94061

Expected behavior
The archive created in both cases should open in 7z on Windows.

Environment:

  • OS: Windows 10
  • Python 3.8
  • py7zr version: 0.16.1
@miurahr
Owner

miurahr commented Jul 8, 2021

The test cases test_archive_empty_file and test_archive_empty_file1, which check extraction with libarchive and p7zip on Windows and Linux, pass on v0.16.1.

This suggests that your 7z extractor may have a compatibility issue.

https://github.com/miurahr/py7zr/actions/runs/820905436

Does py7zr produce a file that breaks libarchive, p7zip and 7z? Could you upload the file?

Could you propose a reproducer as a test case?

You can use two helper functions that run the external 7z command and the libarchive Python library to extract the target file:

p7zip_test(target_path)
libarchive_extract(target_path, extract_path)

ref https://github.com/miurahr/py7zr/blob/master/tests/test_archive.py#L1013-L1050
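For a reproducer of this issue, the check that matters is whether an external 7z can read the archive. A minimal sketch of a p7zip_test-style helper, assuming a 7z (or p7zip) executable is on PATH; the name seven_zip_test and its cmd parameter are hypothetical, and this is not the helper defined in py7zr's test suite. It runs `7z t <archive>`, 7-Zip's integrity-test command, and returns the exit code:

```python
import subprocess
import sys
from pathlib import Path
from typing import Sequence, Union


def seven_zip_test(target_path: Union[str, Path],
                   cmd: Sequence[str] = ("7z", "t")) -> int:
    """Run `7z t <archive>` and return its exit code (0 means no errors).

    Hypothetical helper in the spirit of p7zip_test(); not py7zr's own code.
    Raises FileNotFoundError if the executable in cmd[0] is not installed.
    """
    result = subprocess.run(
        [*cmd, str(target_path)],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge so 7z's full report is in one place
        text=True,
        errors="replace",          # tolerate non-UTF-8 console output
    )
    if result.returncode != 0:
        # Echo 7z's report so it can be pasted into the issue.
        print(result.stdout, file=sys.stderr)
    return result.returncode
```

The cmd parameter exists only so the command can be swapped, for example for a full path to 7z.exe on Windows.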

@reimarstier
Author

Hey, thanks for your quick response, and sorry for taking so long to get back to you. Both of your test cases use the write() method, which adds files from the file system. Am I wrong to use writef() to write byte streams? It seems to work fine in most cases, but it fails for empty streams:

import io
from pathlib import Path

import py7zr


def test_create_archive__empty_file_from_stream(tmp_path):
    archive_file = Path(tmp_path).joinpath("archive.7z")
    output_dir = Path(tmp_path).joinpath("output")
    output_dir.mkdir()
    empty_byte_stream = io.BytesIO()

    with py7zr.SevenZipFile(archive_file, 'w') as archive:
        empty_byte_stream.seek(0)
        archive.writef(empty_byte_stream, arcname="empty.txt")

    # extract() is the reporter's helper from the linked gist (not shown here);
    # it presumably invokes 7z, which fails on this archive.
    extract(archive_path=archive_file, output_directory=output_dir)

@miurahr
Owner

miurahr commented Jul 15, 2021

py7zr cannot know that a stream is zero bytes long until it has read the stream.
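For a seekable stream the remaining size can be measured without consuming it, but for a non-seekable stream the only way to learn the size is to read it. A minimal stdlib-only sketch of a caller-side check, assuming the caller wants to know whether a stream is empty before passing it to writef(); remaining_size and check_empty are hypothetical helpers, not part of py7zr:

```python
import io
from typing import BinaryIO, Optional, Tuple


def remaining_size(stream: BinaryIO) -> Optional[int]:
    """Bytes left in a seekable stream, measured without consuming it; None if not seekable."""
    if not stream.seekable():
        return None
    start = stream.tell()
    end = stream.seek(0, io.SEEK_END)
    stream.seek(start)  # restore the original position
    return end - start


def check_empty(stream: BinaryIO) -> Tuple[bool, BinaryIO]:
    """Return (is_empty, stream_to_use).

    Seekable streams are measured in place. A non-seekable stream can only be
    checked by reading it, so its contents are buffered into a new BytesIO.
    """
    size = remaining_size(stream)
    if size is not None:
        return size == 0, stream
    data = stream.read()  # the only way to learn the size is to read everything
    return len(data) == 0, io.BytesIO(data)
```

Buffering a non-seekable stream into memory is the trade-off in this sketch.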

py7zr creates a directory entry for the file in the 7z archive, reads the bytes, compresses them, and writes them to the archive; only at the end does it record the file size.

The 7-Zip command, by contrast, may create a zero-size file that exists only as a directory entry, with no data entry.

This might explain the difference.

py7zr creates an empty file in the same manner when a path is passed to write(): it checks the file size and creates an empty-file entry when the size is zero.
writef(), however, accepts a stream whose size may not be known before it is read, so it treats an empty stream as a file with zero bytes of data.
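The two layouts described above can be modelled in a few lines. This is a toy, stdlib-only sketch under stated assumptions: Entry is a hypothetical stand-in for a file record in the 7z header, not py7zr's data structure, and the compressor here uses lzma's default xz container rather than the raw coder streams a real 7z archive stores. A path-style writer knows the size up front and can record an empty file with no data stream (the 7z format has an empty-stream flag, kEmptyStream, for this), while a stream-style writer creates the entry first and learns the size only after reading:

```python
import io
import lzma
from dataclasses import dataclass
from typing import BinaryIO, List, Optional

CHUNK_SIZE = 64 * 1024  # arbitrary read size for the streaming path


@dataclass
class Entry:
    """Toy stand-in for one file record in a 7z header (hypothetical, simplified)."""
    name: str
    has_stream: bool = True     # False: empty file recorded with no data stream
    size: Optional[int] = None  # uncompressed size
    packed: bytes = b""         # compressed payload for this entry


def add_known_size(entries: List[Entry], name: str, data: bytes) -> Entry:
    """Path-style: size is known before writing (e.g. from os.stat())."""
    if len(data) == 0:
        # Zero-size file: header entry only, no data stream at all.
        entry = Entry(name, has_stream=False, size=0)
    else:
        entry = Entry(name, size=len(data), packed=lzma.compress(data))
    entries.append(entry)
    return entry


def add_streaming(entries: List[Entry], name: str, stream: BinaryIO) -> Entry:
    """Stream-style: entry first, then read and compress, size recorded last."""
    entry = Entry(name)                  # 1. entry exists before any data is seen
    entries.append(entry)
    compressor = lzma.LZMACompressor()   # default format: xz container
    parts = []
    size = 0
    while True:
        chunk = stream.read(CHUNK_SIZE)  # 2. read the bytes
        if not chunk:
            break
        size += len(chunk)
        parts.append(compressor.compress(chunk))  # 3. compress them
    parts.append(compressor.flush())
    entry.packed = b"".join(parts)
    entry.size = size                    # 4. size known only now
    return entry


if __name__ == "__main__":
    entries: List[Entry] = []
    from_path = add_known_size(entries, "empty.txt", b"")
    from_stream = add_streaming(entries, "empty.txt", io.BytesIO())
    print(from_path.has_stream, len(from_path.packed))      # no stream, nothing packed
    print(from_stream.has_stream, len(from_stream.packed))  # stream with packed bytes
```

In this toy, even zero input bytes produce a non-empty packed payload on the stream-style path, which illustrates how the two resulting archives can differ.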

@github-actions

github-actions bot commented Nov 6, 2021

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

@github-actions

github-actions bot commented Feb 5, 2022

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

@github-actions

github-actions bot commented May 7, 2022

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

@miurahr added the for archiving label (Issue on archiving, compression or encryption) on Jul 2, 2022
@github-actions

github-actions bot commented Oct 1, 2022

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days
