
Allow passing in a preallocated buffer to LzBuffer constructors #76

Closed

Conversation


@chyyran chyyran commented Aug 5, 2022

Pull Request Overview

This is a small experiment building off of the raw primitives introduced in #74 to design a solution for #27.

This PR introduces the functions LzmaDecoder::new_with_buffer and Lzma2Decoder::new_with_buffer, which allow constructing an LZMA(2) decoder with a preallocated Vec<u8> buffer. The buffer is then reused on subsequent calls to decompress. Lzma(2)Decoder::reset will clear (memset) the buffer without dropping the prior allocation, although the buffer may still be resized if necessary by the internal decompression routines.

This allows consumers of the raw decoder API to preallocate a buffer upfront and, if necessary, mitigate the performance impact seen after #22.
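
As a rough usage sketch only: `new_with_buffer`, `decompress`, and `reset` come from this PR, but the module path, constructor argument list, and error type below are my assumptions based on the raw decoder API from #74, not something this PR specifies.

```rust
use std::io::BufReader;

use lzma_rs::decompress::raw::{LzmaDecoder, LzmaParams};

// Sketch only: the exact signatures of `new_with_buffer`, `decompress`,
// and `reset` are assumed here, not taken from this PR.
fn decode_streams(streams: &[&[u8]], params: LzmaParams) -> lzma_rs::error::Result<Vec<Vec<u8>>> {
    // Preallocate the buffer once, sized for the expected output.
    let buffer: Vec<u8> = Vec::with_capacity(64 * 1024);
    let mut decoder = LzmaDecoder::new_with_buffer(params, buffer)?;

    let mut outputs = Vec::new();
    for stream in streams {
        let mut input = BufReader::new(*stream);
        let mut output = Vec::new();
        decoder.decompress(&mut input, &mut output)?;
        outputs.push(output);
        // `reset` clears (memsets) the buffer without dropping the allocation,
        // so the next iteration reuses the same backing storage.
        decoder.reset();
    }
    Ok(outputs)
}
```

The intent is that the allocation made up front survives every iteration, instead of a fresh Vec::new() being grown inside each decompress call.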

I'm not actually sure this gives any performance benefit, but I'm putting it out here to get some thoughts.

Help Wanted

This pull request still needs proper benchmarking in a variety of cases. For my own usage in https://github.com/SnowflakePowered/chd-rs, I saw absolutely no performance difference between preallocating a buffer and using a fresh one via Vec::new(). Reusing the buffer on each decompress also made no difference in performance for my use case.

Testing Strategy

This pull request was tested by...

  • Added relevant unit tests.
  • Added relevant end-to-end tests (such as .lzma, .lzma2, .xz files).

The preallocated buffer is then reused throughout subsequent calls to `decompress`.

chyyran commented Aug 5, 2022

Benchmarks on master on reusing a preallocated buffer

$ cargo bench
    Finished bench [optimized] target(s) in 0.16s
     Running unittests (target/release/deps/lzma_rs-0d912beaf58a9aab)

running 14 tests                             
test decode::options::test::test_options ... ignored                                          
test encode::rangecoder::test::test_encode_decode_bittree_all ... ignored                     
test encode::rangecoder::test::test_encode_decode_bittree_ones ... ignored                    
test encode::rangecoder::test::test_encode_decode_bittree_zeros ... ignored                   
test encode::rangecoder::test::test_encode_decode_length_all ... ignored                      
test encode::rangecoder::test::test_encode_decode_length_zeros ... ignored                    
test encode::rangecoder::test::test_encode_decode_ones ... ignored                            
test encode::rangecoder::test::test_encode_decode_reverse_bittree_all ... ignored             
test encode::rangecoder::test::test_encode_decode_reverse_bittree_ones ... ignored            
test encode::rangecoder::test::test_encode_decode_reverse_bittree_zeros ... ignored           
test encode::rangecoder::test::test_encode_decode_zeros ... ignored                           
test error::test::test_display ... ignored                                                    
test xz::test::test_checkmethod_roundtrip ... ignored                                         
test xz::test::test_streamflags_roundtrip ... ignored                                         
                                                                                              
test result: ok. 0 passed; 0 failed; 14 ignored; 0 measured; 0 filtered out; finished in 0.00s
                                                                                              
     Running unittests (target/release/deps/lzma-99b2a32c4e93afcf)                            

running 8 tests
test compress_65536                  ... bench:   1,384,063 ns/iter (+/- 26,842)
test compress_empty                  ... bench:         701 ns/iter (+/- 29)
test compress_hello                  ... bench:       1,011 ns/iter (+/- 40)
test decompress_after_compress_65536 ... bench:   2,558,868 ns/iter (+/- 45,196)
test decompress_after_compress_empty ... bench:       2,130 ns/iter (+/- 49)
test decompress_after_compress_hello ... bench:       2,683 ns/iter (+/- 67)
test decompress_big_file             ... bench:   3,929,622 ns/iter (+/- 98,028)
test decompress_huge_dict            ... bench:       2,741 ns/iter (+/- 79)                 
                                                                                             
test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 8.87s

Benchmarks on this branch with default Vec::new() behaviour

$ cargo bench
   Compiling lzma-rs v0.2.0 (/mnt/d/coding/lzma-rs)
    Finished bench [optimized] target(s) in 2.43s
     Running unittests (target/release/deps/lzma_rs-0d912beaf58a9aab)

running 14 tests                                    
test decode::options::test::test_options ... ignored
test encode::rangecoder::test::test_encode_decode_bittree_all ... ignored
test encode::rangecoder::test::test_encode_decode_bittree_ones ... ignored
test encode::rangecoder::test::test_encode_decode_bittree_zeros ... ignored
test encode::rangecoder::test::test_encode_decode_length_all ... ignored
test encode::rangecoder::test::test_encode_decode_length_zeros ... ignored
test encode::rangecoder::test::test_encode_decode_ones ... ignored
test encode::rangecoder::test::test_encode_decode_reverse_bittree_all ... ignored
test encode::rangecoder::test::test_encode_decode_reverse_bittree_ones ... ignored
test encode::rangecoder::test::test_encode_decode_reverse_bittree_zeros ... ignored
test encode::rangecoder::test::test_encode_decode_zeros ... ignored
test error::test::test_display ... ignored
test xz::test::test_checkmethod_roundtrip ... ignored
test xz::test::test_streamflags_roundtrip ... ignored

test result: ok. 0 passed; 0 failed; 14 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests (target/release/deps/lzma-99b2a32c4e93afcf)

running 8 tests
test compress_65536                  ... bench:   1,357,296 ns/iter (+/- 67,203)
test compress_empty                  ... bench:         996 ns/iter (+/- 25)
test compress_hello                  ... bench:       1,306 ns/iter (+/- 59)
test decompress_after_compress_65536 ... bench:   2,522,253 ns/iter (+/- 58,323)
test decompress_after_compress_empty ... bench:       2,171 ns/iter (+/- 138)
test decompress_after_compress_hello ... bench:       2,655 ns/iter (+/- 268)
test decompress_big_file             ... bench:   3,884,622 ns/iter (+/- 100,805)
test decompress_huge_dict            ... bench:       2,732 ns/iter (+/- 199)

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 12.32s

Looks to be within a 2% difference...

@chyyran chyyran marked this pull request as draft August 9, 2022 04:18

chyyran commented Aug 9, 2022

Converting this to a draft, as I'm not entirely happy with having to pass ownership around where it's unnecessary. Even if Vec::new() is effectively alloc-free, that may not be true for some other storage type in the future, if that ever gets going.

Something like SnowflakePowered@9ea9da4, which holds a mutable reference instead, is probably preferable, but that adds a somewhat noisy lifetime to LzBuffer.
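
For illustration only, a minimal sketch of that alternative shape (hypothetical type and method names, not the code from the linked commit): the buffer borrows caller-owned storage, so no ownership changes hands, but the lifetime then has to appear on the buffer type and anything generic over it.

```rust
// Simplified illustration of the borrowed-buffer design: the buffer type
// borrows its storage instead of owning it, avoiding moves of the Vec but
// threading a lifetime parameter through every user of the trait.
struct BorrowedAccumBuffer<'a> {
    buf: &'a mut Vec<u8>,
}

impl<'a> BorrowedAccumBuffer<'a> {
    fn new(buf: &'a mut Vec<u8>) -> Self {
        buf.clear(); // keep the allocation, drop the contents
        BorrowedAccumBuffer { buf }
    }

    fn append_literal(&mut self, byte: u8) {
        self.buf.push(byte);
    }
}

fn main() {
    let mut storage = Vec::with_capacity(1 << 16);
    {
        // The buffer borrows `storage` for the duration of one decode...
        let mut buffer = BorrowedAccumBuffer::new(&mut storage);
        buffer.append_literal(b'a');
    }
    // ...and the caller keeps ownership, so the allocation can be reused.
    assert!(storage.capacity() >= (1 << 16));
}
```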

@chyyran chyyran closed this Aug 9, 2022

chyyran commented Aug 9, 2022

Closing this for now; it needs a little more work.
