Memory leaks when uncompressing multi-volume archives #575
Labels: for extraction (issue on extraction, decompression or decryption) · help wanted (extra attention is needed) · Speed/Performance
Describe the bug
It seems that decompressing a multi-volume archive containing a relatively large number of files (321 "outer" volumes, 8000 compressed files of 400 KB each, about 3 GB in total) produces a memory leak.
The basic code which is failing is:
See complete function: uncompress.py.txt
The corresponding archive is a multi-volume archive of 8000 files, 400 KB per file, filled with random data, and split every 10 MB. No filters, specific headers, encryption or password were set; the compression options were left at their defaults.
A copy of the archive is available here: multi.zip. Note that the first level needs to be decompressed manually before the test; the actual archive to test is the folder containing the 321 "7z" volumes.
Alternatively, the archive can be reproduced (modulo the random data) with the following code: compress.py.txt. Several tests indicate that the behavior is unrelated to the random content and depends only on the size of the files.
If enough memory is available, the archive can be decompressed without any issue, but the process still uses a lot of memory (about 3.3 GB), which is unexpected since each compressed file is quite small and the decompression script immediately discards all data on the fly.
If not enough memory is available, the decompression script crashes with a CRC error (see attached log: 7z-crc-error.log) or a Bad7zFile: invalid header data. The CRC error appears to be only a consequence of the lack of memory, as the archive itself looks perfectly fine.
We can see the archive is error-free:
Note that for the purpose of tests, it is possible to deliberately fill the memory using commands such as:
head -c 5G /dev/zero | tail
Related issue
These issues might be related to this one, but none of the existing tickets mention multivolume and OOM at the same time:
To Reproduce
See the script above, and run ps up <pid> in another terminal to see how memory is increasing.
Expected behavior
Even if the archive has a total size of 3 GB, uncompressing it file by file, where each file is 400 KB, should not fill the memory. Uncompressing a multi-volume archive should have a very low memory footprint, as it should be possible to write the bytes directly to disk, whatever the size of the archive, the size of individual files, the number of volumes, or the number of compressed files in the archive.
Environment (please complete the following information):
Test data (please attach in the report):
See provided archive or script to generate it above.
Additional context