Zero-copy encoding and decoding #18

vincentdephily · 2019-10-25T09:20:38Z

It might be possible to do some zero-copy encoding or decoding, especially of the publish payload. Bytes does refcounting behind the scenes, so if we switch publish.payload from a Vec<u8> to a Bytes we should only have to create the slice, not copy its content.

We should revisit whether encode() and decode() should take an impl IntoBuf and impl IntoBufMut rather than a straight BytesMut.

Last but not least, this all needs to be benchmarked.

vincentdephily · 2020-01-10T19:02:59Z

@00imvj00 Please take a look at vincentdephily@71627f9

I've spent some time trying to convert to impl Trait but have hit a few walls. We've got a few options:

1. Stop trying to be flexible, and embrace BytesMut everywhere.
2. Wait for (or help) the Bytes crate to fix the relevant bugs.
3. Use the .bytes() workaround and document that only contiguous-memory backends may be used (or if upstream creates a Trait for that, use it).
4. Use a different crate altogether, or reinvent the wheel.

I'm currently leaning towards option 2, but would like some opinions. I'd also be OK with throwing that commit away and going with option 1.

Note that one thing we need to do to get closer to zero-copy is to convert some Packet members (Connect.password, Publish.payload...) from Vec<u8> to Bytes (or use &[u8] and deal with the lifetime shared between Packet and its source/destination buffer). If that's where we're going, maybe it's silly to try to support other Buf backends.
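To make the &[u8] alternative concrete, here is a rough std-only sketch of a decoded packet that borrows from the receive buffer. PublishRef and decode_publish are hypothetical names, not the crate's API, and the wire layout is simplified (2-byte big-endian topic length, topic, then payload; no fixed header or packet id).

```rust
/// Hypothetical borrowed view of a PUBLISH packet: both fields point into
/// the caller's buffer instead of owning copies.
#[derive(Debug, PartialEq)]
pub struct PublishRef<'a> {
    pub topic: &'a str,
    pub payload: &'a [u8],
}

/// Decodes a simplified PUBLISH body (assumed layout, see above).
/// Returns None if the buffer is too short or the topic is not UTF-8.
pub fn decode_publish(buf: &[u8]) -> Option<PublishRef<'_>> {
    let len_bytes = buf.get(..2)?;
    let topic_len = u16::from_be_bytes([len_bytes[0], len_bytes[1]]) as usize;
    // from_utf8 validates in place; it does not copy the bytes.
    let topic = std::str::from_utf8(buf.get(2..2 + topic_len)?).ok()?;
    // Everything after the topic is the payload (may be empty).
    let payload = buf.get(2 + topic_len..)?;
    Some(PublishRef { topic, payload })
}
```

The cost is the one mentioned above: the packet cannot outlive the buffer, and the buffer cannot be refilled while a PublishRef still borrows from it.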

00imvj00 · 2020-01-11T07:19:24Z

For zero-copy, instead of this, how about we keep offsets into the buffer?

For example: if the overall packet length is 64 bytes, we store those bytes internally as a byte array. Then for the password, instead of a string, we just store a start index and an end index, and provide a method get_password() -> &str that returns a reference into the array.
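A minimal sketch of that offset idea, using only std. ConnectRaw and its constructor are made-up names for illustration, and the password is treated as UTF-8 only because the proposal returns &str.

```rust
use std::ops::Range;

/// Hypothetical CONNECT packet that owns the raw bytes once and records
/// where the password lives, rather than holding a separate String.
pub struct ConnectRaw {
    raw: Vec<u8>,
    password: Option<Range<usize>>,
}

impl ConnectRaw {
    /// `password` is a byte range into `raw`. Rejects ranges that fall
    /// outside the buffer or that do not contain valid UTF-8.
    pub fn new(raw: Vec<u8>, password: Option<Range<usize>>) -> Option<Self> {
        if let Some(r) = &password {
            std::str::from_utf8(raw.get(r.clone())?).ok()?;
        }
        Some(ConnectRaw { raw, password })
    }

    /// Returns a reference into the stored bytes; nothing is copied.
    pub fn get_password(&self) -> Option<&str> {
        let r = self.password.clone()?;
        std::str::from_utf8(&self.raw[r]).ok()
    }
}
```

Because the packet owns its bytes, there is no lifetime tying it to the network buffer, but building it still requires moving or copying the packet bytes into `raw` once.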

vincentdephily · 2020-01-13T16:50:35Z

A Bytes struct obtained from Bytes::slice*() or Bytes::split*() is pretty much just a reference into a &[u8], with some refcounting to figure out when the actual data can be dropped. I'm just not sure how it handles references to old data. Is it viable to keep using the same BytesMut to receive data from the network and then decode(), yielding a stream of Packets? Or would holding a Bytes from the connect packet force the buffer to grow forever? Should we change the API to decode(Bytes) -> Result<Packet,Error> (forcing a .freeze()) to avoid that footgun?
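The concern can be illustrated with a std-only model of a refcounted slice. This is a stand-in built on Arc<[u8]>, not how the bytes crate is actually implemented, and SharedSlice and its methods are hypothetical names. It only shows the general hazard: a small handle keeps the whole backing allocation alive.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Toy model of a refcounted byte slice (NOT the bytes crate's internals).
#[derive(Clone)]
pub struct SharedSlice {
    buf: Arc<[u8]>,
    range: Range<usize>,
}

impl SharedSlice {
    /// Moves the data into a shared allocation (this step copies once).
    pub fn new(data: Vec<u8>) -> Self {
        let range = 0..data.len();
        SharedSlice { buf: Arc::from(data), range }
    }

    /// Sub-slice relative to this slice. Copies no bytes; it only clones
    /// the Arc, which bumps the reference count.
    pub fn slice(&self, r: Range<usize>) -> Self {
        assert!(r.start <= r.end && r.end <= self.range.len());
        let start = self.range.start + r.start;
        SharedSlice { buf: Arc::clone(&self.buf), range: start..start + r.len() }
    }

    pub fn as_bytes(&self) -> &[u8] {
        &self.buf[self.range.clone()]
    }

    /// Number of live handles sharing the backing allocation.
    pub fn handles(&self) -> usize {
        Arc::strong_count(&self.buf)
    }

    /// Size of the allocation this handle keeps alive.
    pub fn backing_len(&self) -> usize {
        self.buf.len()
    }
}
```

In this model, slicing 4 bytes out of a 1024-byte buffer and dropping the original handle still leaves all 1024 bytes allocated until the small slice is dropped too. Whether BytesMut behaves the same way when it is reused for network reads is exactly the open question here.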

I'm going to dogfood that in different contexts and see, but it's going to take a while.

00imvj00 · 2020-01-14T04:14:09Z

Yup, at this point we can experiment with a couple of approaches and see which feels most ergonomic.

Again, the goal is to keep it simple and make it super performant.

MathiasKoch mentioned this issue Jun 9, 2020

Make alloc optional, defaulting to slice based API #29

Merged
