Use const generics to remove BitTree heap allocations #79

Merged
merged 1 commit into from Mar 16, 2023

Conversation

chyyran
Contributor

@chyyran chyyran commented Aug 9, 2022

Pull Request Overview

#19 was blocked on #![feature(generic_const_exprs)] because of the 1 << NUM_BITS expression, but we do not actually need generic_const_exprs to use const generics; the cost is a slightly surprising API for BitTree. Since BitTree is an internal implementation detail, I do not think that is too much of a trade-off for the performance benefits seen. The strong compile-time guarantee of BitTree's validity that comes from using NUM_BITS in an expression can be replicated with a compile-time assert in the constructor.

There are 2 different API shapes for BitTree proposed here. Both APIs validate their const arguments at compile time, and invalid instances of BitTree are still unconstructable.

1. BitTree<const NUM_BITS: usize, const PROBS_ARRAY_LEN: usize> (06aee69)
2. BitTree<const PROBS_ARRAY_LEN: usize> (260eefe)

The first signature requires the caller to supply both NUM_BITS and PROBS_ARRAY_LEN as part of the type signature, where PROBS_ARRAY_LEN must equal 1 << NUM_BITS. A compile-time assert macro checks this equality, which prevents the construction of a BitTree where PROBS_ARRAY_LEN is not 1 << NUM_BITS.
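A minimal sketch of the first signature, for illustration only. The PR uses an assert macro; this sketch substitutes an associated constant named VALID (hypothetical) to trigger the check, and the probs field and its initial value 0x400 are assumptions rather than the actual implementation.

```rust
// Illustrative sketch of the two-parameter shape; not the code from the PR.
struct BitTree<const NUM_BITS: usize, const PROBS_ARRAY_LEN: usize> {
    // Assumed field: the probability array now lives inline instead of in a Vec.
    probs: [u16; PROBS_ARRAY_LEN],
}

impl<const NUM_BITS: usize, const PROBS_ARRAY_LEN: usize> BitTree<NUM_BITS, PROBS_ARRAY_LEN> {
    // Hypothetical stand-in for the PR's assert macro. The assertion is
    // evaluated during compilation for every instantiation that uses it.
    const VALID: () = assert!(
        PROBS_ARRAY_LEN == 1 << NUM_BITS,
        "PROBS_ARRAY_LEN must equal 1 << NUM_BITS"
    );

    fn new() -> Self {
        // Referencing the constant forces the compile-time check.
        let () = Self::VALID;
        Self {
            probs: [0x400; PROBS_ARRAY_LEN], // assumed initial probability
        }
    }

    fn num_bits(&self) -> usize {
        NUM_BITS
    }
}
```

With this shape, BitTree::<8, 256>::new() compiles, while BitTree::<8, 255>::new() should be rejected at compile time because the assertion fails during constant evaluation.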

The second signature takes only PROBS_ARRAY_LEN to sidestep the requirement for generic_const_exprs. Instead, NUM_BITS is an associated constant calculated with PROBS_ARRAY_LEN.trailing_zeros(), which has been const since Rust 1.32 and equals floor(log_2(PROBS_ARRAY_LEN)) whenever PROBS_ARRAY_LEN is a power of two. The const assert checks that PROBS_ARRAY_LEN == 1 << PROBS_ARRAY_LEN.trailing_zeros(). Thus, rather than BitTree being valid for any NUM_BITS with PROBS_ARRAY_LEN == 1 << NUM_BITS, BitTree in this API is valid for any PROBS_ARRAY_LEN such that PROBS_ARRAY_LEN == 2 ** floor(log_2(PROBS_ARRAY_LEN)), which gives us effectively the same guarantees as the first signature without the redundant NUM_BITS argument.
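A minimal sketch of the second signature, under the same caveats: the VALID constant is a hypothetical stand-in for the PR's assert macro, and the probs field and its initial value are assumptions.

```rust
// Illustrative sketch of the single-parameter shape; not the code from the PR.
struct BitTree<const PROBS_ARRAY_LEN: usize> {
    // Assumed field: inline probability array.
    probs: [u16; PROBS_ARRAY_LEN],
}

impl<const PROBS_ARRAY_LEN: usize> BitTree<PROBS_ARRAY_LEN> {
    // NUM_BITS is derived from the array length rather than passed in.
    const NUM_BITS: usize = PROBS_ARRAY_LEN.trailing_zeros() as usize;

    // Hypothetical stand-in for the PR's assert macro: the length must be a
    // power of two, i.e. shifting 1 back by NUM_BITS must reproduce it.
    const VALID: () = assert!(
        PROBS_ARRAY_LEN == 1 << Self::NUM_BITS,
        "PROBS_ARRAY_LEN must be a power of two"
    );

    fn new() -> Self {
        // Referencing the constant forces the compile-time check.
        let () = Self::VALID;
        Self {
            probs: [0x400; PROBS_ARRAY_LEN], // assumed initial probability
        }
    }

    fn num_bits(&self) -> usize {
        Self::NUM_BITS
    }
}
```

Here BitTree::<{ 1 << 8 }>::new() compiles and reports 8 bits, whereas a length such as 12 has two trailing zeros, so 1 << 2 == 4 != 12 and the instantiation should fail to compile.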

Both signatures compile to the same code and have the same performance. Which one to take is a matter of preference and readability. I have a slight preference for the second signature because it is less noisy and seeing BitTree<{1 << 8}>, for example, seems fairly obvious, but purely looking at the implementation, I prefer the less 'clever' first signature.

This PR bumps the MSRV to 1.57 for const generics and const_panic.

Benchmarks

In general, variances are reduced and there is a speed bump when accessing the dictionary during decompression. While I did apply const generics to encode::rangecoder::BitTree for consistency, it doesn't seem to be used for compression at the moment, so compression benchmarks should not be heavily affected, if at all.

Without const generics

running 8 tests
test compress_65536                  ... bench:   1,312,743 ns/iter (+/- 45,836)
test compress_empty                  ... bench:         712 ns/iter (+/- 55)
test compress_hello                  ... bench:       1,022 ns/iter (+/- 18)
test decompress_after_compress_65536 ... bench:   1,157,757 ns/iter (+/- 64,077)
test decompress_after_compress_empty ... bench:       1,897 ns/iter (+/- 82)
test decompress_after_compress_hello ... bench:       2,256 ns/iter (+/- 165)
test decompress_big_file             ... bench:   3,615,488 ns/iter (+/- 128,563)
test decompress_huge_dict            ... bench:       2,318 ns/iter (+/- 135)

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 16.21s

With const generic BitTree

test compress_65536                  ... bench:   1,317,185 ns/iter (+/- 58,197)
test compress_empty                  ... bench:         791 ns/iter (+/- 65)
test compress_hello                  ... bench:       1,108 ns/iter (+/- 45)
test decompress_after_compress_65536 ... bench:   1,189,749 ns/iter (+/- 60,014)
test decompress_after_compress_empty ... bench:         530 ns/iter (+/- 23)
test decompress_after_compress_hello ... bench:         787 ns/iter (+/- 26)
test decompress_big_file             ... bench:   3,600,336 ns/iter (+/- 375,302)
test decompress_huge_dict            ... bench:         873 ns/iter (+/- 33)

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 11.17s
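To put the listings above in proportion, here is a small sketch that computes before/after ratios for the three small decompression benchmarks, using the ns/iter figures quoted above. The helper names speedup and report are hypothetical and not part of the PR.

```rust
/// Ratio of the baseline time to the new time; values above 1.0 mean faster.
fn speedup(before_ns: f64, after_ns: f64) -> f64 {
    before_ns / after_ns
}

/// (benchmark, ns/iter without const generics, ns/iter with const generic BitTree),
/// copied from the two benchmark listings above.
const RESULTS: [(&str, f64, f64); 3] = [
    ("decompress_after_compress_empty", 1897.0, 530.0),
    ("decompress_after_compress_hello", 2256.0, 787.0),
    ("decompress_huge_dict", 2318.0, 873.0),
];

/// Formats one line per benchmark with the ratio rounded to two decimals.
fn report() -> Vec<String> {
    RESULTS
        .iter()
        .map(|&(name, before, after)| format!("{}: {:.2}x", name, speedup(before, after)))
        .collect()
}
```

These three benchmarks come out roughly 2.7x to 3.6x faster, while the larger benchmarks differ by less than their reported error margins.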

This should fix #19. After this PR, DecoderState has only a single heap allocation, literal_probs, which is substantially more difficult to put on the stack.

Testing Strategy

  • This is a backend change and does not introduce additional functionality; existing tests should pass.

@chyyran chyyran force-pushed the feature-inline-bittree branch 2 times, most recently from 02ecc33 to 37e09d1 Compare August 9, 2022 07:44
@chyyran chyyran force-pushed the feature-inline-bittree branch 4 times, most recently from e211b5d to 260eefe Compare September 3, 2022 15:36
@chyyran
Contributor Author

chyyran commented Sep 3, 2022

Marking this as ready since #77 was merged. I've cleaned up the prior commits and just left the two possible APIs.

@gendx you will want to review both 06aee69 and 260eefe which are identical except for the signature of BitTree, as explained in the overview.

Given that generic_const_exprs doesn't seem to be coming very soon, and that BitTree is an internal API that isn't exposed, I'm of the opinion that the perf improvements are worth the hit to readability, but you may want to consider re-running some benchmarks on your end as well.

Unfortunately DecoderState::literal_probs is too big to fit on the stack so I believe this will close #19 unless there are other places where const generics can be taken advantage of.

@chyyran chyyran marked this pull request as ready for review September 3, 2022 15:46
Owner

@gendx gendx left a comment

Apologies for the delay in the last few months.

Overall this PR looks good; I have some comments, mostly regarding the static assertion macro.

Apart from that, I've just added code coverage reporting (#86), so let's see how this PR works with it.

@chyyran chyyran force-pushed the feature-inline-bittree branch 3 times, most recently from e67c2bc to 025c331 Compare November 1, 2022 02:06
@codecov-commenter

codecov-commenter commented Nov 1, 2022

Codecov Report

Base: 86.83% // Head: 87.08% // Increases project coverage by +0.25% 🎉

Coverage data is based on head (571be60) compared to base (17bb25f).
Patch coverage: 99.32% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #79      +/-   ##
==========================================
+ Coverage   86.83%   87.08%   +0.25%     
==========================================
  Files          19       19              
  Lines        2484     2540      +56     
==========================================
+ Hits         2157     2212      +55     
- Misses        327      328       +1     
Flag          Coverage Δ
integration   81.66% <100.00%> (+0.06%) ⬆️
unit          55.66% <93.24%> (+1.16%) ⬆️

Impacted Files             Coverage Δ
src/encode/rangecoder.rs   96.58% <98.80%> (+0.24%) ⬆️
src/decode/lzma.rs         91.74% <100.00%> (+0.09%) ⬆️
src/decode/rangecoder.rs   95.36% <100.00%> (+0.04%) ⬆️

The single argument that BitTree takes is 1 << NUM_BITS (2 ** NUM_BITS),
where NUM_BITS is the number of bits required in the tree.

This is due to restrictions on const generic expressions.
The validity of this argument is checked at compile time with a macro
that confirms that the argument P passed is indeed 1 << N for
some N, using usize::trailing_zeros to calculate floor(log_2(P)).

Thus, BitTree<const P: usize> is valid only for P such that
P = 2 ** floor(log_2(P)), where P is the length of the probability array
of the BitTree. This maintains the invariant that P = 1 << N.
@chyyran
Contributor Author

chyyran commented Nov 1, 2022

I've rebased on master, fixed the conflicts, and squashed the commits into one. Looks like there's a small improvement in code coverage as well!

@chyyran
Contributor Author

chyyran commented Jan 17, 2023

@gendx Any updates on getting this merged?

@gendx
Owner

gendx commented Mar 16, 2023

Apologies again for the delay. I'll merge this now and fix any pending issues on top of it.

Thanks a lot for your contribution!

@gendx gendx merged commit 81cee7d into gendx:master Mar 16, 2023