
Improved encoding and decoding speed of Vec<u8> #619

Merged: VictorKoenders merged 4 commits into trunk from vko/improve_string_decode on Mar 30, 2023

Conversation

VictorKoenders
Contributor

Running against the benchmarks in #618:

bench v1                time:   [48.663 µs 49.119 µs 49.573 µs]
                        change: [-0.9286% +0.8850% +2.7884%] (p = 0.35 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

bench v2 (standard)     time:   [642.39 µs 646.82 µs 651.41 µs]
                        change: [-71.758% -71.415% -71.093%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

bench v2 (legacy)       time:   [650.62 µs 653.72 µs 656.93 µs]
                        change: [-71.322% -71.089% -70.844%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

bench v1 decode         time:   [301.58 µs 302.76 µs 303.97 µs]
                        change: [-42.809% -40.022% -36.789%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

bench v2 decode (legacy)
                        time:   [326.89 µs 328.24 µs 329.63 µs]
                        change: [-81.836% -81.650% -81.457%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild

Decoding seems to be on par with bincode 1 now. Encoding is down to being a factor of 10 slower.

@codecov

codecov bot commented Feb 20, 2023

Codecov Report

Patch coverage: 71.79% and project coverage change: -0.21% ⚠️

Comparison is base (6791311) 54.22% compared to head (e613898) 54.01%.

Additional details and impacted files
@@            Coverage Diff             @@
##            trunk     #619      +/-   ##
==========================================
- Coverage   54.22%   54.01%   -0.21%     
==========================================
  Files          50       51       +1     
  Lines        4406     4447      +41     
==========================================
+ Hits         2389     2402      +13     
- Misses       2017     2045      +28     
Impacted Files                  Coverage Δ
benches/string.rs                 0.00% <0.00%>   (ø)
src/de/impls.rs                  58.24% <0.00%>   (ø)
src/enc/encoder.rs               57.14% <ø>       (ø)
tests/alloc.rs                   93.75% <ø>       (ø)
tests/std.rs                     97.53% <ø>       (ø)
src/varint/decode_unsigned.rs    69.41% <42.85%>  (ø)
src/enc/write.rs                 68.75% <66.66%>  (-0.49%) ⬇️
src/features/impl_alloc.rs       61.65% <82.35%>  (+1.12%) ⬆️
src/enc/impls.rs                 88.88% <100.00%> (-0.29%) ⬇️
tests/basic_types.rs             98.02% <100.00%> (+0.02%) ⬆️
... and 2 more

... and 8 files with indirect coverage changes


@VictorKoenders
Contributor Author

Pre-computing the size drops the execution time by roughly another 50%:

bench v1                time:   [51.224 µs 51.781 µs 52.294 µs]
                        change: [+5.4681% +7.4038% +9.4670%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 24 outliers among 100 measurements (24.00%)
  3 (3.00%) low severe
  10 (10.00%) low mild
  9 (9.00%) high mild
  2 (2.00%) high severe

bench v2 (standard)     time:   [363.31 µs 364.96 µs 366.79 µs]
                        change: [-83.761% -83.543% -83.335%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

bench v2 (legacy)       time:   [326.16 µs 327.39 µs 328.81 µs]
                        change: [-85.568% -85.459% -85.338%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

bench v1 decode         time:   [339.01 µs 341.76 µs 344.70 µs]
                        change: [-34.888% -31.648% -27.911%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

bench v2 decode (legacy)
                        time:   [351.59 µs 357.93 µs 364.61 µs]
                        change: [-80.044% -79.714% -79.330%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
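
For context, a minimal sketch of the idea (with a made-up varint format, not bincode's actual encoding): compute the encoded length first, reserve the output buffer once, then write, so the buffer never reallocates mid-encode.

// Hypothetical LEB128-style length prefix, for illustration only.
fn varint_len(mut n: u64) -> usize {
    let mut len = 1;
    while n >= 0x80 {
        n >>= 7;
        len += 1;
    }
    len
}

fn encoded_size(v: &[u8]) -> usize {
    varint_len(v.len() as u64) + v.len()
}

fn encode_into(v: &[u8], out: &mut Vec<u8>) {
    // Single up-front reservation instead of growing while writing.
    out.reserve(encoded_size(v));
    let mut n = v.len() as u64;
    while n >= 0x80 {
        out.push((n as u8) | 0x80);
        n >>= 7;
    }
    out.push(n as u8);
    out.extend_from_slice(v);
}

fn main() {
    let data = vec![42u8; 1000];
    let mut out = Vec::new();
    encode_into(&data, &mut out);
    assert_eq!(out.len(), encoded_size(&data));
}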

@VictorKoenders
Contributor Author

I'm willing to call this good if that's okay with you, @JojiiOfficial.

Execution times on the order of 51.781 µs and 364.96 µs are essentially instant. We can probably optimize this further, but at this point we'd be chasing very specific LLVM optimizations. In a real-world scenario we would almost certainly not notice a difference between the two.

I'd suggest we call this fast enough for now; if we find a real-world example where bincode 2 is significantly slower than bincode 1, we can add more benchmarks.

@@ -262,10 +278,20 @@ where
 impl<T> Decode for Vec<T>
 where
-    T: Decode,
+    T: Decode + 'static,


Doesn't this bound rule out zero-copy deserialization of types like Vec<&'de str>?


To answer my own question: no, as this case goes through impl BorrowDecode for Vec<T>.
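
A minimal standalone illustration of the distinction (the two functions are stand-ins for the bounded Decode impl and the unbounded BorrowDecode impl, not the bincode API):

fn requires_static<T: 'static>(_: &T) {}
fn borrow_ok<'a>(_: &[&'a str]) {}

fn main() {
    let owned = String::from("zero-copy");
    let v: Vec<&str> = vec![owned.as_str()];

    // requires_static(&v); // ERROR: argument requires that `owned`
    //                      // is borrowed for `'static`
    borrow_ok(&v); // fine: this path has no `'static` bound
}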

@JojiiOfficial

Encoding is still 6 times slower than in v1. A real-world example could be a lazily loaded data structure (e.g. an index) that keeps its data encoded and decodes an item when it is accessed. If you do this millions of times, say while building the index, v2 will take 6 times longer than v1, as building involves a lot of encoding. An index that builds in 1 minute with v1 will take 6 minutes with v2; if v1 takes 1 hour, v2 will take 6 hours, and so on.

I'm not saying this fix isn't okay; I just want to point out that there are real-world scenarios where this will probably hurt. In my opinion, opening a tracking issue and merging this PR should be fine for now, but having a de/serialization library for binary data that is implemented with as many optimizations as possible would be really nice. Maybe there are even more possible optimizations that would make it even faster than v1 🤔
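
To make the scenario concrete, a hedged sketch of such a lazily decoded index (the encode/decode steps are byte-copy stand-ins, not bincode calls): every insert pays the encoding cost, so a slow encoder makes bulk index construction proportionally slower.

use std::collections::HashMap;

struct LazyIndex {
    // Values are kept in encoded form until accessed.
    entries: HashMap<String, Vec<u8>>,
}

impl LazyIndex {
    fn insert(&mut self, key: &str, value: &str) {
        // Stand-in for encoding; with a real encoder, building a large
        // index multiplies any per-call encoding overhead.
        let encoded = value.as_bytes().to_vec();
        self.entries.insert(key.to_owned(), encoded);
    }

    fn get(&self, key: &str) -> Option<String> {
        // Stand-in for decoding, performed lazily on access.
        self.entries
            .get(key)
            .map(|bytes| String::from_utf8(bytes.clone()).unwrap())
    }
}

fn main() {
    let mut idx = LazyIndex { entries: HashMap::new() };
    idx.insert("a", "payload");
    assert_eq!(idx.get("a").as_deref(), Some("payload"));
}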

@VictorKoenders
Contributor Author

If you have a benchmark with data that takes a minute to encode/decode, that would be very interesting to dive into.

@JojiiOfficial

JojiiOfficial commented Feb 20, 2023

Change the build_data parameter from 100 to 2500; with this PR merged, 100 iterations will take around a minute.

@JojiiOfficial

The following code improves the performance even more for Strings and [u8] slices:

impl<T> Encode for [T]
where
    T: Encode + 'static,
{
    fn encode<E: Encoder>(&self, encoder: &mut E) -> Result<(), EncodeError> {
        super::encode_slice_len(encoder, self.len())?;

        // Specialization via TypeId: if T is exactly u8, write the whole
        // slice in one call instead of encoding element by element.
        if core::any::TypeId::of::<T>() == core::any::TypeId::of::<u8>() {
            // Sound because the TypeId check guarantees T == u8, so
            // `&[T]` and `&[u8]` are the same type here.
            let t: &[u8] = unsafe { core::mem::transmute(self) };
            encoder.writer().write(t)?;
            return Ok(());
        }

        for item in self {
            item.encode(encoder)?;
        }
        Ok(())
    }
}

[screenshot: benchmark results]

@VictorKoenders
Contributor Author

VictorKoenders commented Feb 26, 2023

Added a couple of #[inline] attributes and now the performance is almost identical:

bench v1                time:   [58.788 µs 59.134 µs 59.497 µs]                                                                          
bench v2 (standard)     time:   [61.595 µs 62.437 µs 63.531 µs]                                                  
bench v2 (legacy)       time:   [62.514 µs 63.413 µs 64.394 µs]

bench v1 decode         time:   [314.30 µs 317.03 µs 319.98 µs]
bench v2 decode (legacy)
                        time:   [376.05 µs 378.22 µs 380.79 µs]
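
For illustration only (hypothetical helpers, not the actual change in this PR): #[inline] lets small non-generic functions be inlined across crate boundaries, where they would otherwise only be inline candidates within their defining crate unless LTO is enabled.

#[inline]
fn write_byte(out: &mut Vec<u8>, b: u8) {
    out.push(b);
}

#[inline]
fn write_all(out: &mut Vec<u8>, bytes: &[u8]) {
    // With both functions inlined, this loop can collapse into a plain copy.
    for &b in bytes {
        write_byte(out, b);
    }
}

fn main() {
    let mut out = Vec::new();
    write_all(&mut out, b"hello");
    assert_eq!(out, b"hello");
}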

@JojiiOfficial

Indeed, very nice!

VictorKoenders merged commit c623d81 into trunk on Mar 30, 2023 (67 of 69 checks passed).
VictorKoenders deleted the vko/improve_string_decode branch on March 30, 2023 at 09:47.
@@ -295,10 +295,17 @@ impl Encode for char {
 impl<T> Encode for [T]
 where
-    T: Encode,
+    T: Encode + 'static,


This change now gives me errors when trying to derive Encode for types with Vecs that contain references.

For example:

#[derive(bincode::Encode)]
struct MyStruct<'a> {
    some_field: Vec<&'a str>,
}

fails with "argument requires that `'a` must outlive `'static`".

Contributor Author


Reverted this PR in #663 and tried a better fix in #667.
