Add EncodedSize trait to calculate encoded sizes #609

markbt · 2023-01-24T15:49:01Z

Related issue: #539

Hi bincode developers. This is the first in a series of 3 PRs that would allow us to replace our current use of Abomonation with bincode, without a noticeable performance hit. We'd like these to be adopted upstream, and hope they are useful for other users of bincode also.

For some storage back-ends it's necessary to know the size of the buffer to pre-allocate before encoding. This is true of our cache back-end, which is C++ code that we communicate with over FFI. The bincode 1.0 version had bincode::serialized_size, which would tell you the serialized size of data, but for bincode 2.0 this is not implemented.

One suggestion is to implement a SizeOnlyWriter which adds up the sizes of the slices it's been given, however this effectively requires serializing the data twice, as it doesn't allow types to efficiently compute the size when it is possible to do so without performing the actual serialization steps.

In this PR, we add a new EncodedSize trait which accomplishes this objective, and implement that for all types. Implementations for custom structs and enums can be derived using #[derive(bincode::EncodedSize)].

For some storage back-ends it's necessary to know the size of the buffer to pre-allocate before encoding. The bincode 1.0 version had `bincode::serialized_size`, which would tell you the serialized size of data. We add a new `EncodedSize` trait which accomplishes the same objective, and implement that for all types. Implementations for custom structs and enums can be derived using `#[derive(bincode::EncodedSize)]`.

VictorKoenders · 2023-01-28T17:25:29Z

Hi, and thank you for this PR!

Scanning over this PR, I'm not actually sure this will be faster than SizeOnlyWriter. This PR still has to iterate over all the fields recursively and count all the entries.

Is it possible to set up a benchmark to compare the speed of an SizeOnlyWriter implementation, and this PR?

markbt · 2023-01-31T13:30:32Z

Benchmarking is a good idea. It ended up being a bit of a journey, and it looks like you're right, as long as LTO is enabled.

To start with, I added a variant of encoded_size to bincode which used SizeOnlyWriter. I do think it would be good to provide this as part of the library, if only as a convenience, to save everyone re-implementing it. I created a benchmark that computes the serialized size a Vec of 10,000 random integers of various sizes.

Results were initially pretty promising for the EncodedSize trait (all times in nanoseconds):

	u8	u16	u32	u64
SizeOnlyWriter	14770.00	29581	29416	37906
EncodedSize	1.45	12686	10511	12582

Yes, for Vec<u8> it's 10,000 times faster with EncodedSize.

That seemed pretty obvious to be an optimization problem, with Vec<u8> likely not being optimized to len(x) as it was with the EncodedSize variant of Vec<u8>.

To test this out, I moved the SizeOnlyWriter implementation to the benchmark crate:

	u8	u16	u32	u64
Local SizeOnlyWriter	0.206	3953	5741	7257

That's much faster, and even beats out the dedicated trait. I don't think this is a general-purpose solution: I don't think it's reasonable to copy and paste SizeOnlyWriter to every location in your codebase that you want to compute the size. Also, it's likely that it won't scale well, as it'll mean a new monomorphisation of the same code each time. But it gives a good lower bound.

Then I realised that LTO wasn't enabled, and that it might help. It did:

	u8	u16	u32	u64
SizeOnlyWriter + LTO	0.206	3864	5671	7218
EncodedSize + LTO	0.206	3876	5707	7268
Local SizeOnlyWriter + LTO	0.206	3877	5676	7242

So ultimately I think using SizeOnlyWriter is probably fine. I do think it would be nice if this was built into bincode as the recommended way of doing this (with the caveat that you shouldn't bother unless you really need to, of course).

Will leave this PR up for a bit in case anyone disagrees, but I now plan to close this PR.

The benchmark is in this commit if anyone wants to try it out themselves.

VictorKoenders · 2023-02-01T07:18:13Z

I do think it would be nice if this was built into bincode

I feel like added functionality like this this would be better in a different crate. In #578 (comment) I talked about introducing a crate named bincode-contrib or bincode-utils for common utilities that can be build on top of bincode without big changes.

Unfortunately both Zoey and I have been very busy the last couple of months so we have not had time to set this up yet

markbt closed this Feb 2, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add EncodedSize trait to calculate encoded sizes #609

Add EncodedSize trait to calculate encoded sizes #609

markbt commented Jan 24, 2023

VictorKoenders commented Jan 28, 2023

markbt commented Jan 31, 2023

VictorKoenders commented Feb 1, 2023

Add EncodedSize trait to calculate encoded sizes #609

Add EncodedSize trait to calculate encoded sizes #609

Conversation

markbt commented Jan 24, 2023

VictorKoenders commented Jan 28, 2023

markbt commented Jan 31, 2023

VictorKoenders commented Feb 1, 2023