Code cannot handle Error which are presented as "There are some data after the end of the payload data" in 7zip #536

Open
Mrw33554432 opened this issue Nov 1, 2023 · 4 comments

Comments

@Mrw33554432

Describe the bug

For some old files that may contain the error described in the title, this package fails to work and runs forever (an infinite loop).

Related issue

To Reproduce
Contact me and I will send you the files so you can try it yourself.
Code fragment

    if archive_type == '7z':
        for member in archive.getnames():
            original_filename = member
            corrected_filename = guess_encoding(original_filename)
            try:
                extracted_path = os.path.join(extract_to, corrected_filename)
                if not os.path.exists(extracted_path):
                    archive.extract(targets=[member], path=extract_to)
                    logging.info(f"Extracted {corrected_filename} to {extract_to}")
                    # Check nested archive
                    recursive_extraction_check(extracted_path, extract_to)
            except Exception as e:  # Consider refining exception handling
                logging.error(f"Failed to extract {corrected_filename}: {e}")

Expected behavior
Report the error and exit

Environment (please complete the following information):

  • OS: Windows 10
  • Python 3.11
  • py7zr version: 0.20.5

Test data(please attach in the report):
Too large to upload

Additional context
I will try to fix it

@Mrw33554432
Author

    def decompress(
        self,
        fp: BinaryIO,
        folder,
        fq: IO[Any],
        size: int,
        compressed_size: Optional[int],
        src_end: int,
    ) -> int:
        """
        decompressor wrapper called from extract method.

        :parameter fp: archive source file pointer
        :parameter folder: Folder object that have decompressor object.
        :parameter fq: output file pathlib.Path
        :parameter size: uncompressed size of target file.
        :parameter compressed_size: compressed size of target file.
        :parameter src_end: end position of the folder

        :returns None

        """
        assert folder is not None
        out_remaining = size
        max_block_size = get_memory_limit()
        crc32 = 0
        decompressor = folder.get_decompressor(compressed_size)
        while out_remaining > 0:
            tmp = decompressor.decompress(fp, min(out_remaining, max_block_size))
            if len(tmp) > 0:
                out_remaining -= len(tmp)
                fq.write(tmp)
                crc32 = calculate_crc32(tmp, crc32)
            if out_remaining <= 0:
                break
        if fp.tell() >= src_end:
            if decompressor.crc is not None and not decompressor.check_crc():
                raise CrcError(decompressor.crc, decompressor.digest, None)
        return crc32

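The `while out_remaining > 0` loop is what keeps causing this error: when `decompressor.decompress` returns no data, `out_remaining` never decreases, so the loop never exits.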
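To illustrate the failure mode, here is a minimal standalone sketch of the same loop with a guard against reads that make no progress. It is not the patch from the PR and not py7zr's API: `copy_decompressed`, `StalledDecompressionError`, `read_chunk` and `max_empty_reads` are hypothetical names, `zlib.crc32` stands in for `calculate_crc32`, and the threshold assumes that several consecutive empty reads mean the stream is exhausted, which may not hold for every decompressor.

```python
import io
import zlib


class StalledDecompressionError(Exception):
    """Hypothetical error: the source stopped producing output too early."""


def copy_decompressed(read_chunk, out, size, max_block_size=1 << 20, max_empty_reads=2):
    """Copy `size` bytes from `read_chunk` to `out` and return their CRC32.

    `read_chunk(n)` is a stand-in for `decompressor.decompress(fp, n)`.
    Raises StalledDecompressionError instead of looping forever when the
    source keeps returning empty chunks before `size` bytes were produced.
    """
    out_remaining = size
    crc32 = 0
    empty_reads = 0
    while out_remaining > 0:
        tmp = read_chunk(min(out_remaining, max_block_size))
        if len(tmp) == 0:
            # Assumption: a few consecutive empty reads mean no more data.
            empty_reads += 1
            if empty_reads >= max_empty_reads:
                raise StalledDecompressionError(
                    f"no output after {empty_reads} reads, "
                    f"{out_remaining} bytes still expected"
                )
            continue
        empty_reads = 0
        out_remaining -= len(tmp)
        out.write(tmp)
        crc32 = zlib.crc32(tmp, crc32)
    return crc32


if __name__ == "__main__":
    # The header claims 10 bytes but the payload only holds 5.
    source = io.BytesIO(b"hello")
    try:
        copy_decompressed(source.read, io.BytesIO(), size=10)
    except StalledDecompressionError as exc:
        print(f"Extraction aborted: {exc}")
```

With a guard like this, the caller's `except Exception` block in the fragment above would log the failure and move on, which matches the expected behavior of reporting the error and exiting.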
bug

@miurahr
Owner

miurahr commented Nov 1, 2023

You are welcome to give us a patch!

@Mrw33554432
Author

You are welcome to give us a patch!

I have fixed it (at least for my case) and sent a PR. It seems the automated tests are having some trouble with the build...

@miurahr
Owner

miurahr commented Nov 5, 2023

@Mrw33554432
Could you tell me what data causes the issue? It seems to be invalid/bad data that has extra data after the payload which is not indexed in the header?
If the issue is about handling bad data, we should make py7zr robust and harden it against such input.
