
Improved encoding and decoding speed of Vec<u8> #619

Merged: VictorKoenders merged 4 commits into trunk from vko/improve_string_decode on Mar 30, 2023

Conversation

VictorKoenders
Contributor

Running against the benchmarks in #618:

bench v1                time:   [48.663 µs 49.119 µs 49.573 µs]
                        change: [-0.9286% +0.8850% +2.7884%] (p = 0.35 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

bench v2 (standard)     time:   [642.39 µs 646.82 µs 651.41 µs]
                        change: [-71.758% -71.415% -71.093%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

bench v2 (legacy)       time:   [650.62 µs 653.72 µs 656.93 µs]
                        change: [-71.322% -71.089% -70.844%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

bench v1 decode         time:   [301.58 µs 302.76 µs 303.97 µs]
                        change: [-42.809% -40.022% -36.789%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

bench v2 decode (legacy)
                        time:   [326.89 µs 328.24 µs 329.63 µs]
                        change: [-81.836% -81.650% -81.457%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild

Decoding seems to be on par with bincode 1 now. Encoding is down to being a factor of 10 slower.

@codecov

codecov bot commented Feb 20, 2023

Codecov Report

Patch coverage: 71.79% and project coverage change: -0.21% ⚠️

Comparison is base (6791311) 54.22% compared to head (e613898) 54.01%.

Additional details and impacted files
@@            Coverage Diff             @@
##            trunk     #619      +/-   ##
==========================================
- Coverage   54.22%   54.01%   -0.21%     
==========================================
  Files          50       51       +1     
  Lines        4406     4447      +41     
==========================================
+ Hits         2389     2402      +13     
- Misses       2017     2045      +28     
Impacted Files                  Coverage Δ
benches/string.rs                 0.00% <0.00%>   (ø)
src/de/impls.rs                  58.24% <0.00%>   (ø)
src/enc/encoder.rs               57.14% <ø>       (ø)
tests/alloc.rs                   93.75% <ø>       (ø)
tests/std.rs                     97.53% <ø>       (ø)
src/varint/decode_unsigned.rs    69.41% <42.85%>  (ø)
src/enc/write.rs                 68.75% <66.66%>  (-0.49%) ⬇️
src/features/impl_alloc.rs       61.65% <82.35%>  (+1.12%) ⬆️
src/enc/impls.rs                 88.88% <100.00%> (-0.29%) ⬇️
tests/basic_types.rs             98.02% <100.00%> (+0.02%) ⬆️
... and 2 more

... and 8 files with indirect coverage changes


@VictorKoenders
Contributor Author

Pre-computing the size drops the execution time by roughly another 50%:

bench v1                time:   [51.224 µs 51.781 µs 52.294 µs]
                        change: [+5.4681% +7.4038% +9.4670%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 24 outliers among 100 measurements (24.00%)
  3 (3.00%) low severe
  10 (10.00%) low mild
  9 (9.00%) high mild
  2 (2.00%) high severe

bench v2 (standard)     time:   [363.31 µs 364.96 µs 366.79 µs]
                        change: [-83.761% -83.543% -83.335%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

bench v2 (legacy)       time:   [326.16 µs 327.39 µs 328.81 µs]
                        change: [-85.568% -85.459% -85.338%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

bench v1 decode         time:   [339.01 µs 341.76 µs 344.70 µs]
                        change: [-34.888% -31.648% -27.911%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

bench v2 decode (legacy)
                        time:   [351.59 µs 357.93 µs 364.61 µs]
                        change: [-80.044% -79.714% -79.330%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
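
For context, a minimal sketch of the idea (with a made-up varint format, not bincode's actual encoding): compute the encoded length first, reserve the output buffer once, then write, so the buffer never reallocates mid-encode.

// Hypothetical LEB128-style length prefix, for illustration only.
fn varint_len(mut n: u64) -> usize {
    let mut len = 1;
    while n >= 0x80 {
        n >>= 7;
        len += 1;
    }
    len
}

fn encoded_size(v: &[u8]) -> usize {
    varint_len(v.len() as u64) + v.len()
}

fn encode_into(v: &[u8], out: &mut Vec<u8>) {
    // Single up-front reservation instead of growing while writing.
    out.reserve(encoded_size(v));
    let mut n = v.len() as u64;
    while n >= 0x80 {
        out.push((n as u8) | 0x80);
        n >>= 7;
    }
    out.push(n as u8);
    out.extend_from_slice(v);
}

fn main() {
    let data = vec![42u8; 1000];
    let mut out = Vec::new();
    encode_into(&data, &mut out);
    assert_eq!(out.len(), encoded_size(&data));
}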

@VictorKoenders
Contributor Author

I'm willing to call this good if that's okay with you, @JojiiOfficial.

Execution times on the order of 51.781 µs and 364.96 µs are essentially instant. We can probably optimize this further, but at this point we'd be chasing very specific LLVM optimizations. In a real-world scenario we would almost certainly not notice a difference between the two.

I'd suggest we call this fast enough for now; if we find a real-world example where bincode 2 is significantly slower than bincode 1, we can add more benchmarks.

@@ -262,10 +278,20 @@ where
 impl<T> Decode for Vec<T>
 where
-    T: Decode,
+    T: Decode + 'static,


Doesn't this bound rule out zero-copy deserialization of types like Vec<&'de str>?


To answer my own question: no, as this case goes through impl BorrowDecode for Vec<T>.
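
A minimal standalone illustration of the distinction (the two functions are stand-ins for the bounded Decode impl and the unbounded BorrowDecode impl, not the bincode API):

fn requires_static<T: 'static>(_: &T) {}
fn borrow_ok<'a>(_: &[&'a str]) {}

fn main() {
    let owned = String::from("zero-copy");
    let v: Vec<&str> = vec![owned.as_str()];

    // requires_static(&v); // ERROR: argument requires that `owned`
    //                      // is borrowed for `'static`
    borrow_ok(&v); // fine: this path has no `'static` bound
}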

@JojiiOfficial

Encoding is still 6 times slower than in v1. A real-world example could be a lazily loaded data structure (e.g. an index) that keeps its data encoded and decodes an item when it is accessed. If you do this millions of times, say while building the index, v2 will take 6 times longer than v1, as building involves a lot of encoding. An index that builds in 1 minute with v1 will take 6 minutes with v2; if v1 takes 1 hour, v2 will take 6 hours, and so on.

I'm not saying this fix isn't okay; I just want to point out that there are real-world scenarios where this will probably hurt. In my opinion, opening a tracking issue and merging this PR should be fine for now, but having a de/serialization library for binary data that is implemented with as many optimizations as possible would be really nice. Maybe there are even more possible optimizations that would make it even faster than v1 🤔
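
To make the scenario concrete, a hedged sketch of such a lazily decoded index (the encode/decode steps are byte-copy stand-ins, not bincode calls): every insert pays the encoding cost, so a slow encoder makes bulk index construction proportionally slower.

use std::collections::HashMap;

struct LazyIndex {
    // Values are kept in encoded form until accessed.
    entries: HashMap<String, Vec<u8>>,
}

impl LazyIndex {
    fn insert(&mut self, key: &str, value: &str) {
        // Stand-in for encoding; with a real encoder, building a large
        // index multiplies any per-call encoding overhead.
        let encoded = value.as_bytes().to_vec();
        self.entries.insert(key.to_owned(), encoded);
    }

    fn get(&self, key: &str) -> Option<String> {
        // Stand-in for decoding, performed lazily on access.
        self.entries
            .get(key)
            .map(|bytes| String::from_utf8(bytes.clone()).unwrap())
    }
}

fn main() {
    let mut idx = LazyIndex { entries: HashMap::new() };
    idx.insert("a", "payload");
    assert_eq!(idx.get("a").as_deref(), Some("payload"));
}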

@VictorKoenders
Contributor Author

If you have a benchmark with data that takes a minute to encode/decode, that would be very interesting to dive into.

@JojiiOfficial

JojiiOfficial commented Feb 20, 2023

Change the build_data parameter from 100 to 2500; with this PR merged, 100 iterations will take around a minute.

@JojiiOfficial

The following code improves the performance even more for Strings and [u8] slices:

impl<T> Encode for [T]
where
    T: Encode + 'static,
{
    fn encode<E: Encoder>(&self, encoder: &mut E) -> Result<(), EncodeError> {
        super::encode_slice_len(encoder, self.len())?;

        // Specialization via TypeId: if T is exactly u8, write the whole
        // slice in one call instead of encoding element by element.
        if core::any::TypeId::of::<T>() == core::any::TypeId::of::<u8>() {
            // Sound because the TypeId check guarantees T == u8, so
            // `&[T]` and `&[u8]` are the same type here.
            let t: &[u8] = unsafe { core::mem::transmute(self) };
            encoder.writer().write(t)?;
            return Ok(());
        }

        for item in self {
            item.encode(encoder)?;
        }
        Ok(())
    }
}

[screenshot: benchmark results]

@VictorKoenders
Contributor Author

VictorKoenders commented Feb 26, 2023

Added a couple of #[inline] attributes and now the performance is almost identical:

bench v1                time:   [58.788 µs 59.134 µs 59.497 µs]                                                                          
bench v2 (standard)     time:   [61.595 µs 62.437 µs 63.531 µs]                                                  
bench v2 (legacy)       time:   [62.514 µs 63.413 µs 64.394 µs]

bench v1 decode         time:   [314.30 µs 317.03 µs 319.98 µs]
bench v2 decode (legacy)
                        time:   [376.05 µs 378.22 µs 380.79 µs]
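
For illustration only (hypothetical helpers, not the actual change in this PR): #[inline] lets small non-generic functions be inlined across crate boundaries, where they would otherwise only be inline candidates within their defining crate unless LTO is enabled.

#[inline]
fn write_byte(out: &mut Vec<u8>, b: u8) {
    out.push(b);
}

#[inline]
fn write_all(out: &mut Vec<u8>, bytes: &[u8]) {
    // With both functions inlined, this loop can collapse into a plain copy.
    for &b in bytes {
        write_byte(out, b);
    }
}

fn main() {
    let mut out = Vec::new();
    write_all(&mut out, b"hello");
    assert_eq!(out, b"hello");
}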

@JojiiOfficial

Indeed, very nice!

VictorKoenders merged commit c623d81 into trunk on Mar 30, 2023 (67 of 69 checks passed).
VictorKoenders deleted the vko/improve_string_decode branch on March 30, 2023 at 09:47.
@@ -295,10 +295,17 @@ impl Encode for char {
 impl<T> Encode for [T]
 where
-    T: Encode,
+    T: Encode + 'static,


This change now gives me errors when trying to derive Encode for types with Vecs that contain references.

For example:

#[derive(bincode::Encode)]
struct MyStruct<'a> {
    some_field: Vec<&'a str>,
}

fails with "argument requires that `'a` must outlive `'static`".

Contributor Author


Reverted this PR in #663 and tried a better fix in #667.
