
Allow passing in a preallocated buffer to LzBuffer constructors #76

Closed

Conversation


@chyyran chyyran commented Aug 5, 2022

Pull Request Overview

This is a small experiment building off of the raw primitives introduced in #74 to design a solution for #27.

This PR introduces the functions LzmaDecoder::new_with_buffer and Lzma2Decoder::new_with_buffer, which allow constructing an LZMA(2) decoder with a preallocated Vec<u8> buffer. The buffer is then reused on subsequent calls to decompress. Lzma(2)Decoder::reset will clear (memset) the buffer without dropping the prior allocation, although the buffer may still be resized if necessary by the internal decompression routines.

This allows consumers of the raw decoder API to preallocate a buffer upfront and, if necessary, mitigate the performance impact seen after #22.
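
As a rough usage sketch only: `new_with_buffer`, `decompress`, and `reset` come from this PR, but the module path, constructor argument list, and error type below are my assumptions based on the raw decoder API from #74, not something this PR specifies.

```rust
use std::io::BufReader;

use lzma_rs::decompress::raw::{LzmaDecoder, LzmaParams};

// Sketch only: the exact signatures of `new_with_buffer`, `decompress`,
// and `reset` are assumed here, not taken from this PR.
fn decode_streams(streams: &[&[u8]], params: LzmaParams) -> lzma_rs::error::Result<Vec<Vec<u8>>> {
    // Preallocate the buffer once, sized for the expected output.
    let buffer: Vec<u8> = Vec::with_capacity(64 * 1024);
    let mut decoder = LzmaDecoder::new_with_buffer(params, buffer)?;

    let mut outputs = Vec::new();
    for stream in streams {
        let mut input = BufReader::new(*stream);
        let mut output = Vec::new();
        decoder.decompress(&mut input, &mut output)?;
        outputs.push(output);
        // `reset` clears (memsets) the buffer without dropping the allocation,
        // so the next iteration reuses the same backing storage.
        decoder.reset();
    }
    Ok(outputs)
}
```

The intent is that the allocation made up front survives every iteration, instead of a fresh Vec::new() being grown inside each decompress call.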

I'm not actually sure this gives any performance benefit, but I'm putting it out here to get some thoughts.

Help Wanted

This pull request still needs proper benchmarking in a variety of cases. For my own usage in https://github.com/SnowflakePowered/chd-rs, I saw absolutely no performance difference between preallocating a buffer and using a fresh one via Vec::new(). Reusing the buffer on each decompress also made no difference in performance for my use case.

Testing Strategy

This pull request was tested by...

  • Added relevant unit tests.
  • Added relevant end-to-end tests (such as .lzma, .lzma2, .xz files).

The preallocated buffer is then reused throughout subsequent calls to `decompress`.

chyyran commented Aug 5, 2022

Benchmarks on master on reusing a preallocated buffer

$ cargo bench
    Finished bench [optimized] target(s) in 0.16s
     Running unittests (target/release/deps/lzma_rs-0d912beaf58a9aab)

running 14 tests                             
test decode::options::test::test_options ... ignored                                          
test encode::rangecoder::test::test_encode_decode_bittree_all ... ignored                     
test encode::rangecoder::test::test_encode_decode_bittree_ones ... ignored                    
test encode::rangecoder::test::test_encode_decode_bittree_zeros ... ignored                   
test encode::rangecoder::test::test_encode_decode_length_all ... ignored                      
test encode::rangecoder::test::test_encode_decode_length_zeros ... ignored                    
test encode::rangecoder::test::test_encode_decode_ones ... ignored                            
test encode::rangecoder::test::test_encode_decode_reverse_bittree_all ... ignored             
test encode::rangecoder::test::test_encode_decode_reverse_bittree_ones ... ignored            
test encode::rangecoder::test::test_encode_decode_reverse_bittree_zeros ... ignored           
test encode::rangecoder::test::test_encode_decode_zeros ... ignored                           
test error::test::test_display ... ignored                                                    
test xz::test::test_checkmethod_roundtrip ... ignored                                         
test xz::test::test_streamflags_roundtrip ... ignored                                         
                                                                                              
test result: ok. 0 passed; 0 failed; 14 ignored; 0 measured; 0 filtered out; finished in 0.00s
                                                                                              
     Running unittests (target/release/deps/lzma-99b2a32c4e93afcf)                            

running 8 tests
test compress_65536                  ... bench:   1,384,063 ns/iter (+/- 26,842)
test compress_empty                  ... bench:         701 ns/iter (+/- 29)
test compress_hello                  ... bench:       1,011 ns/iter (+/- 40)
test decompress_after_compress_65536 ... bench:   2,558,868 ns/iter (+/- 45,196)
test decompress_after_compress_empty ... bench:       2,130 ns/iter (+/- 49)
test decompress_after_compress_hello ... bench:       2,683 ns/iter (+/- 67)
test decompress_big_file             ... bench:   3,929,622 ns/iter (+/- 98,028)
test decompress_huge_dict            ... bench:       2,741 ns/iter (+/- 79)                 
                                                                                             
test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 8.87s

Benchmarks on this branch with default Vec::new() behaviour

$ cargo bench
   Compiling lzma-rs v0.2.0 (/mnt/d/coding/lzma-rs)
    Finished bench [optimized] target(s) in 2.43s
     Running unittests (target/release/deps/lzma_rs-0d912beaf58a9aab)

running 14 tests                                    
test decode::options::test::test_options ... ignored
test encode::rangecoder::test::test_encode_decode_bittree_all ... ignored
test encode::rangecoder::test::test_encode_decode_bittree_ones ... ignored
test encode::rangecoder::test::test_encode_decode_bittree_zeros ... ignored
test encode::rangecoder::test::test_encode_decode_length_all ... ignored
test encode::rangecoder::test::test_encode_decode_length_zeros ... ignored
test encode::rangecoder::test::test_encode_decode_ones ... ignored
test encode::rangecoder::test::test_encode_decode_reverse_bittree_all ... ignored
test encode::rangecoder::test::test_encode_decode_reverse_bittree_ones ... ignored
test encode::rangecoder::test::test_encode_decode_reverse_bittree_zeros ... ignored
test encode::rangecoder::test::test_encode_decode_zeros ... ignored
test error::test::test_display ... ignored
test xz::test::test_checkmethod_roundtrip ... ignored
test xz::test::test_streamflags_roundtrip ... ignored

test result: ok. 0 passed; 0 failed; 14 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests (target/release/deps/lzma-99b2a32c4e93afcf)

running 8 tests
test compress_65536                  ... bench:   1,357,296 ns/iter (+/- 67,203)
test compress_empty                  ... bench:         996 ns/iter (+/- 25)
test compress_hello                  ... bench:       1,306 ns/iter (+/- 59)
test decompress_after_compress_65536 ... bench:   2,522,253 ns/iter (+/- 58,323)
test decompress_after_compress_empty ... bench:       2,171 ns/iter (+/- 138)
test decompress_after_compress_hello ... bench:       2,655 ns/iter (+/- 268)
test decompress_big_file             ... bench:   3,884,622 ns/iter (+/- 100,805)
test decompress_huge_dict            ... bench:       2,732 ns/iter (+/- 199)

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 12.32s

Looks to be within a 2% difference...

@chyyran chyyran marked this pull request as draft August 9, 2022 04:18

chyyran commented Aug 9, 2022

Converting this to a draft, as I'm not entirely happy with having to pass ownership around where it's unnecessary. Even if Vec::new() is effectively alloc-free, that may not be true for some other storage type in the future, if that ever gets going.

Something like SnowflakePowered@9ea9da4, which holds a mutable reference instead, is probably preferable, but that adds a somewhat noisy lifetime to LzBuffer.
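
For illustration only, a minimal sketch of that alternative shape (hypothetical type and method names, not the code from the linked commit): the buffer borrows caller-owned storage, so no ownership changes hands, but the lifetime then has to appear on the buffer type and anything generic over it.

```rust
// Simplified illustration of the borrowed-buffer design: the buffer type
// borrows its storage instead of owning it, avoiding moves of the Vec but
// threading a lifetime parameter through every user of the trait.
struct BorrowedAccumBuffer<'a> {
    buf: &'a mut Vec<u8>,
}

impl<'a> BorrowedAccumBuffer<'a> {
    fn new(buf: &'a mut Vec<u8>) -> Self {
        buf.clear(); // keep the allocation, drop the contents
        BorrowedAccumBuffer { buf }
    }

    fn append_literal(&mut self, byte: u8) {
        self.buf.push(byte);
    }
}

fn main() {
    let mut storage = Vec::with_capacity(1 << 16);
    {
        // The buffer borrows `storage` for the duration of one decode...
        let mut buffer = BorrowedAccumBuffer::new(&mut storage);
        buffer.append_literal(b'a');
    }
    // ...and the caller keeps ownership, so the allocation can be reused.
    assert!(storage.capacity() >= (1 << 16));
}
```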

@chyyran chyyran closed this Aug 9, 2022

chyyran commented Aug 9, 2022

Closing this for now; it needs a little more work.
