flate2 ZLibDecoder reads too much data from files. #338

Open
kallisti5 opened this issue Feb 28, 2023 · 3 comments

kallisti5 commented Feb 28, 2023

The flate2 ZlibDecoder seems to read too much data and advance the file pointer too far.

Parsing a raw file filled with individual hunks of zlib compressed data:

  • 80-22216 Zlib Chunk 1
    • dd if=sample/ctags_source-5.8-5-source.hpkg bs=1 skip=80 count=22137 of=test1
    • file test1 ; test1: zlib compressed data
  • 22217-44314 Zlib Chunk 2
    • dd if=sample/ctags_source-5.8-5-source.hpkg bs=1 skip=22217 count=22097 of=test2
    • file test2 ; test2: zlib compressed data

I can validate these chunks:

  • cat test1 | zlib-flate -uncompress > test1.uncompressed
  • cat test2 | zlib-flate -uncompress > test2.uncompressed
[kallisti5@eris hpkg-rs]$ ls -lah test1.uncompressed 
-rw-r--r-- 1 kallisti5 users 64K Feb 28 09:19 test1.uncompressed
[kallisti5@eris hpkg-rs]$ ls -lah test2.uncompressed 
-rw-r--r-- 1 kallisti5 users 64K Feb 28 09:20 test2.uncompressed
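
As a cross-check, a single extracted chunk can also be decoded on its own with flate2, where there is no following stream to over-read into (a minimal sketch, assuming test1 from the dd commands above is in the working directory):

use std::fs::File;
use std::io::Read;

use flate2::read::ZlibDecoder;

fn main() -> std::io::Result<()> {
    // Decode one extracted chunk in isolation; the expected output is 65536 bytes.
    let f = File::open("test1")?;
    let mut decoder = ZlibDecoder::new(f);
    let mut out = Vec::new();
    decoder.read_to_end(&mut out)?;
    println!("test1 inflates to {} bytes", out.len());
    Ok(())
}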

However, when I try to decode these chunks with flate2's ZlibDecoder:

/// Inflate the heap section of a hpkg for later processing
/// XXX: This will likely need reworked... just trying to figure out what's going on
fn inflate_heap(&mut self) -> Result<usize, Box<dyn error::Error>> {
    let header = self.header.as_ref().unwrap();
    let filename = self.filename.as_ref().unwrap();

    println!("header: Heap chunk size: {}", header.heap_chunk_size);
    println!("header: Heap compressed size: {}", header.heap_size_compressed);
    println!("header: Heap uncompressed size: {}", header.heap_size_uncompressed);
    println!("header: Heap compression: {}", header.heap_compression);

    let mut f = File::open(filename)?;

    let mut pos = header.header_size as u64;
    f.seek(SeekFrom::Start(pos))?;

    while pos < header.heap_size_compressed {
        println!("Seek from file {}, heap {}", pos, pos - header.header_size as u64);
        let mut reader: Box<dyn Read> = match header.heap_compression {
            0 => Box::new(&f),
            1 => Box::new(ZlibDecoder::new(&f)),
            2 => Box::new(zstd::stream::read::Decoder::new(&f)?),
            _ => return Err(From::from(format!("Unknown hpkg heap compression: {}", header.heap_compression))),
        };
        let mut buffer = vec![0; header.heap_chunk_size as usize];
        reader.read_exact(&mut buffer)?;
        self.heap_data.push(buffer);
        pos = (&f).stream_position()?;
    }
    Ok(0)
}
---- package::tests::test_package_load_valid stdout ----
header: Heap chunk size: 65536     (uncompressed)
header: Heap compressed size: 501432
header: Heap uncompressed size: 1988947
header: Heap compression: 1
Seek from file 80, heap 0
Seek from file 32848, heap 32768
ERROR: corrupt deflate stream
thread 'package::tests::test_package_load_valid' panicked at 'assertion failed: false', src/package.rs:297:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

read_exact produces 64k uncompressed as expected, but the file pointer is moved 32k into the file instead of the expected 22057 bytes. (22057 marks the end of the first compressed data stream and the start of the next 0x78, 0xDA.)
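
The gap can be quantified by comparing what the decompressor actually consumed (total_in()) with how far the decoder's internal buffering dragged the underlying File forward (a minimal sketch against the sample file above; the offset 80 and chunk size 65536 are the values from this report):

use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

use flate2::read::ZlibDecoder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut f = File::open("sample/ctags_source-5.8-5-source.hpkg")?;
    f.seek(SeekFrom::Start(80))?; // first zlib chunk starts right after the header

    let mut decoder = ZlibDecoder::new(&f);
    let mut buffer = vec![0u8; 65536];
    decoder.read_exact(&mut buffer)?;

    // total_in(): compressed bytes consumed by the zlib stream itself.
    // stream_position(): how far the decoder's read-ahead moved the file pointer.
    println!("compressed bytes consumed: {}", decoder.total_in());
    println!("file position afterwards:  {}", (&f).stream_position()?);
    Ok(())
}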

kallisti5 (Author) commented:

I feel like the solution is that once ZlibDecoder reads the final 32k chunk, it should seek the source reader back to the end of the zlib stream.

This would let users know where the last compressed stream ended within a file.
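
That idea can be approximated from the caller's side today: flate2's read::ZlibDecoder exposes total_in(), the number of compressed bytes the stream consumed, so the caller can seek back to offset + total_in() after each chunk. A minimal sketch of that workaround (read_chunk is a hypothetical helper, not part of flate2):

use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

use flate2::read::ZlibDecoder;

/// Decode one zlib chunk starting at `offset`, then rewind the file to the
/// end of that compressed stream so the next chunk can be located.
fn read_chunk(f: &mut File, offset: u64) -> std::io::Result<(Vec<u8>, u64)> {
    f.seek(SeekFrom::Start(offset))?;
    let mut decoder = ZlibDecoder::new(&*f);
    let mut chunk = Vec::new();
    // read_to_end stops at the end of this zlib stream, even though the
    // file continues with further streams.
    decoder.read_to_end(&mut chunk)?;
    // The decoder's internal buffering pulled the file pointer ahead;
    // seek back to just past the bytes the stream actually used.
    let next = offset + decoder.total_in();
    drop(decoder);
    f.seek(SeekFrom::Start(next))?;
    Ok((chunk, next))
}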

kallisti5 (Author) commented:

https://github.com/rust-lang/flate2-rs/blob/main/src/deflate/read.rs#L161 feels like the source of the guaranteed 32k read from the file.

jongiddy (Contributor) commented:

I think this is the same problem as #367, except that issue is for gzip. Essentially, this is expected behavior for the read interfaces, which wrap the Read type in a new std::io::BufReader for each decoder.

To fix this, wrap the File in a BufReader once; you can then pass it to multiple bufread::ZlibDecoder instances.

This will, however, also make stream_position incorrect, even if you can access it, so you will need a different way to know when to terminate the loop.
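
A rough sketch of that suggestion, with the loop terminated by total uncompressed output instead of a file position (inflate_heap_chunks, heap_offset, and heap_size_uncompressed are illustrative stand-ins for the hpkg header fields used earlier, not flate2 API):

use std::fs::File;
use std::io::{BufReader, Read, Seek, SeekFrom};

use flate2::bufread::ZlibDecoder;

fn inflate_heap_chunks(
    path: &str,
    heap_offset: u64,
    heap_size_uncompressed: u64,
) -> Result<Vec<Vec<u8>>, Box<dyn std::error::Error>> {
    let mut f = File::open(path)?;
    f.seek(SeekFrom::Start(heap_offset))?;
    // One BufReader for the whole heap: each bufread::ZlibDecoder consumes
    // only its own stream's compressed bytes from this shared buffer, so the
    // next decoder picks up exactly where the previous stream ended.
    let mut reader = BufReader::new(f);

    let mut chunks = Vec::new();
    let mut total_uncompressed = 0u64;
    while total_uncompressed < heap_size_uncompressed {
        let mut decoder = ZlibDecoder::new(&mut reader);
        let mut chunk = Vec::new();
        decoder.read_to_end(&mut chunk)?; // stops at this zlib stream's end
        if chunk.is_empty() {
            break; // no more data; avoid looping forever on a short heap
        }
        total_uncompressed += chunk.len() as u64;
        chunks.push(chunk);
    }
    Ok(chunks)
}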
