Proposal for overhaul of buffer representation #251

ollie-etl opened this issue Feb 27, 2023 · 11 comments

@ollie-etl
Collaborator

ollie-etl commented Feb 27, 2023

This follows on from #161 and #216, at least, but lays out a concrete proposal and rationale. This proposal would represent a major API change.

I am proposing unifying all buffers with a single, non-parameterized buffer type, similar to owned-buf. The proposed buffer has the following properties. For the sake of this proposal, I'll refer to it as Buffer.

Properties of Buffer

  1. Concrete
  2. Non-parameterized
  3. Supports multiple underlying memory regions. That is, the data represented by a Buffer may be backed by multiple, non-contiguous memory regions. This is inspired by the bytes::Buf trait.
  4. Internally maintains its representation of memory regions as a contiguous array of iovec
  5. Allows bytes_init to shrink, making it semantically a bytes_available count

Motivation

Many of the functions in the tokio-uring API either allocate, or require allocating a Vec externally, when operating over vectored or multipart memory regions - specifically the readv, writev, recvmsg and sendmsg variants. This is a shame, because these calls tend to be reached for precisely when trying to optimise performance (in my experience). The allocation occurs because the underlying uring ops require a contiguous iovec array to represent the operation.
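
As a rough illustration (not the crate's actual code), the hypothetical build_iovecs below shows why a fresh allocation is hard to avoid today: the kernel expects one contiguous array of iovec entries, even though the buffers themselves live in separate allocations.

fn build_iovecs(bufs: &mut [Vec<u8>]) -> Vec<libc::iovec> {
    bufs.iter_mut()
        .map(|b| libc::iovec {
            iov_base: b.as_mut_ptr() as *mut libc::c_void,
            iov_len: b.len(),
        })
        .collect()
}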

Candidate design

/// The unified owned buffer type
///
/// Note: There is a case for employing SSO for the Vecs here, to account for the very common case of a single memory region.
/// This can be achieved without exceeding cacheline storage for Buffer
///
/// ```
/// assert!(std::mem::size_of::<Buffer>() <= 64);
/// ```
pub struct Buffer {
    /// Holds the memory regions represented by the current buffer.
    /// Because this is held in contiguous memory, it can be passed directly
    /// to the kernel as the iovec array of a vectored operation.
    /// iov_base: holds the pointer to the start of the region
    /// iov_len: holds the current length of each segment.
    iovecs: Vec<libc::iovec>,
    /// Metadata associated with each buffer region, held in AoS
    /// (array of structures) form.
    /// Chosen to keep `Buffer` within a single cache line on systems with 64 B cache lines.
    state: Vec<BufferState>,
}

/// The record of state for each memory region.
pub(crate) struct BufferState {
    /// The maximum size of this memory region.
    total_bytes: usize,
    /// Destructor: called to release the memory region.
    dst: unsafe fn(libc::iovec, usize),
}

impl Drop for Buffer {
    fn drop(&mut self) {
        let Self { iovecs, state } = self;
        for i in 0..iovecs.len() {
            // SAFETY: each iovec/state pair was produced by a Buffer constructor,
            // so the destructor matches the allocation it releases.
            unsafe { (state[i].dst)(iovecs[i], state[i].total_bytes) }
        }
    }
}

impl From<Vec<u8>> for Buffer {
    fn from(buf: Vec<u8>) -> Self {
        // Note: Vec::into_raw_parts is nightly-only at the time of writing; stable code
        // would use ManuallyDrop plus as_mut_ptr()/len()/capacity() instead.
        let (base, iov_len, total_bytes) = buf.into_raw_parts();
        let iov = libc::iovec {
            iov_base: base as _,
            iov_len,
        };
        let state = BufferState::new(total_bytes, drop_buffer_vec);
        Buffer::new(vec![iov], vec![state])
    }
}

unsafe fn drop_buffer_vec(iovec: libc::iovec, total_bytes: usize) {
    // Reconstitute the original Vec<u8> so its normal Drop frees the allocation.
    drop(Vec::from_raw_parts(iovec.iov_base as *mut u8, iovec.iov_len, total_bytes));
}
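
The sketch above leans on two small constructors that aren't spelled out; something like the following is assumed (names and signatures are mine, not settled API):

impl BufferState {
    pub(crate) fn new(total_bytes: usize, dst: unsafe fn(libc::iovec, usize)) -> Self {
        Self { total_bytes, dst }
    }
}

impl Buffer {
    pub(crate) fn new(iovecs: Vec<libc::iovec>, state: Vec<BufferState>) -> Self {
        // Every region must carry its own state entry.
        debug_assert_eq!(iovecs.len(), state.len());
        Self { iovecs, state }
    }
}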

Notes

  • The destructor is, obviously, completely horrible, but it buys us type erasure.
  • It may be that BufferState should hold some additional metadata, to cope with more generalized backing types. If this is required, I'd expect it to be held as a *mut c_void, to maintain type erasure, and the destructor would take it as a parameter. I don't yet know if this is required, so I've left it out of the sketch.

Impact on API

This is a bit hand-wavy, because I haven't given all aspects due consideration. However, it does seem that some rationalization becomes available. Want to write data?

   let buf: Buffer = vec![0, 1, 2, 3].into();
   fd.write(buf).await;

Want to write vectored data?

   let mut buf: Buffer = vec![0, 1, 2, 3].into();
   buf.append(vec![4, 5, 6, 7].into());
   fd.write(buf).await;
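
For that to work, Buffer would need an append that takes ownership of the other buffer's regions without copying the bytes; a minimal sketch, assuming the field names from the candidate design above:

impl Buffer {
    /// Moves the regions (and their state) out of `other` and onto the end of `self`.
    /// `other` is left empty, so its Drop releases nothing.
    pub fn append(&mut self, mut other: Buffer) {
        self.iovecs.append(&mut other.iovecs);
        self.state.append(&mut other.state);
    }
}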

Request for feedback

@Noah-Kennedy @carllerche @FrankReh @mzabaluev @redbaron @nrc @rrichardson

I'm prepared to flesh this out more into a PR, provided I can get an indication this is not dead in the water before I start. This would represent a significant time investment, which I'm unwilling to commit to without initial feedback.

@ollie-etl
Collaborator Author

Something which does strike me with this design is the similarity to the existing registries. I strongly suspect these could be unified too, probably by parameterising a Rawbuffer over the held state.

   type Buffer = Rawbuffer<BufferState>;
   type Registry = Rawbuffer<RegistryState>;
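
A hedged sketch of what that parameterisation might look like (RegistryState here is just a placeholder of mine for whatever per-region metadata a registry keeps):

/// Shared region bookkeeping; only the per-region state type S differs
/// between owned buffers and registries.
pub struct Rawbuffer<S> {
    iovecs: Vec<libc::iovec>,
    state: Vec<S>,
}

/// Placeholder: e.g. a registered-buffer index or free-list link.
pub struct RegistryState;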

@FrankReh
Collaborator

FrankReh commented Feb 27, 2023

If this works, this is a great idea.

If you didn't provide a straw man proposal for a FixedBuf, is that because you don't know yet or because it's obvious? I would want the BufX I have in my BufRing code to also use it of course.

In the frequent case of a single allocation being the backing store, do you see ref and ref_mut being as efficient as before? Perhaps with the exception of a check, added to the accessor itself (ref or ref_mut), that the iovec length is 1?

What would it do to the API if the buffer were made of multiple backing stores, though? Doesn't it mean ref and ref_mut can't be asked to return a &[u8] or &mut [u8]?

Maybe an approach would be for those to panic if used on a multiple-backing-store buffer, and for there to be test (aka try) functions that return an Option<&[u8]>. But it would be up to the user of the library to either know they are only using single-backing-store buffers so they can safely use ref and ref_mut, or to be safe and use something like try_ref and try_ref_mut? Kind of like users having to know whether dividing by their variable is safe, or whether they should add their own test for zero.
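
Such a try_ref could be fairly small; a sketch, assuming the Buffer layout from the proposal (none of this is settled API):

impl Buffer {
    /// Returns the whole buffer as one slice, but only when exactly one region backs it.
    pub fn try_ref(&self) -> Option<&[u8]> {
        if self.iovecs.len() != 1 {
            return None;
        }
        let iov = &self.iovecs[0];
        // SAFETY: the iovec describes an initialized region owned by this Buffer.
        Some(unsafe { std::slice::from_raw_parts(iov.iov_base as *const u8, iov.iov_len) })
    }
}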

Would it subsume the slice struct too or would that be a second concrete type?

@FrankReh
Collaborator

Something which does strike me with this design is the similarity to the existing registries. I strongly suspect these could be unified too, probably by parameterising a Rawbuffer over the held state.

   type Buffer = Rawbuffer<BufferState>;
   type Registry = Rawbuffer<RegistryState>;

I don't follow. I thought you're thinking a concrete, non-parameterized type could do it all?

@ollie-etl
Collaborator Author

ollie-etl commented Feb 27, 2023

Something which does strike me with this design is the similarity to the existing registries. I strongly suspect these could be unified too, probably by parameterising a Rawbuffer over the held state.

   type Buffer = Rawbuffer<BufferState>;
   type Registry = Rawbuffer<RegistryState>;

I don't follow. I thought you're thinking a concrete, non-parameterized type could do it all?

Yes, badly phrased by me: I can unify the buffer types without parameterisation. It may also be possible to unify both Buffer and the buffer registries by parameterising the underlying state - given a buffer can now represent multiple memory regions, and a registry holds multiple memory regions, this is perhaps unsurprising.

@ollie-etl
Collaborator Author

ollie-etl commented Feb 27, 2023

If you didn't provide a straw man proposal for a FixedBuf, is that because you don't know yet or because it's obvious? I would want the BufX I have in my BufRing code to also use it of course.

I've not yet fleshed this out. It is certainly possible, and highly desirable; the only question is whether I need some additional metadata in the state to achieve it, i.e. whether some additional info is required on drop.

@ollie-etl
Collaborator Author

What would it do to the API if the buffer were made of multiple backing stores though. Doesn't it mean the ref and ref_mut can't be asked to return a &[u8] or &mut[u8]?

Looking at bytes::Buf for guidance here, there is an API call, chunks, which returns regions that are contiguous. I'd expect to also be able to return an iterator over such regions.
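
A possible shape for that iterator, assuming the Buffer fields from the proposal (not settled API):

impl Buffer {
    /// Iterate over the contiguous regions of the buffer, in order.
    pub fn chunks(&self) -> impl Iterator<Item = &[u8]> + '_ {
        self.iovecs.iter().map(|iov| {
            // SAFETY: each iovec describes an initialized region owned by this Buffer.
            unsafe { std::slice::from_raw_parts(iov.iov_base as *const u8, iov.iov_len) }
        })
    }
}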

@FrankReh
Collaborator

If you didn't provide a straw man proposal for a FixedBuf, is that because you don't know yet or because it's obvious? I would want the BufX I have in my BufRing code to also use it of course.

I've not yet fleshed this out. It is certainly possible; the only question is whether I need some additional metadata in the state to achieve it, i.e. whether some additional info is required on drop.

I would hope other metadata would not be needed.

I've wondered why something like a buf id was needed in the first place. Couldn't we say our registries have to be built from a contiguous block of memory, so that the buffer address being returned in the destructor would be enough to identify the portion of the region that was being returned to the registry?

I know that in my own case of using a buf id in the BufRing implementation, it felt wrong, but I was just following what had already been established in the crate. If we're talking about a better implementation for all, I would heavily lean towards requiring the registry to be able to use the buffer address itself as the identifier - even if it means we can't use a vec of vecs as a registry any more.
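
If the registry is one contiguous slab carved into fixed-size slots, recovering the slot from the address alone is simple arithmetic; a hypothetical helper:

fn slot_index(slab_base: *const u8, slot_size: usize, iov_base: *const u8) -> usize {
    // Assumes iov_base points into the slab and slots are laid out back to back.
    (iov_base as usize - slab_base as usize) / slot_size
}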

@FrankReh
Collaborator

I don't love the idea that every buffer has two allocations before even talking about the backing store buffer.

And why does the state vec contain various destructors? Do we lose flexibility by saying there is one destructor for the entire buffer, and it has to know how to deal with the iovec parts?

I hesitate to mention this, but would it be a reason to bring a tinyvec into the discussion so the most normal case of a single backing store is handled with no extra allocations?

@ollie-etl
Collaborator Author

ollie-etl commented Feb 27, 2023

I hesitate to mention this, but would it be a reason to bring a tinyvec into the discussion so the most normal case of a single backing store is handled with no extra allocations?

Yes: you'll see I made mention of the case for SSO. I'd have thought this was a very good optimization. We have 16 bytes left on the cache line, so it doesn't even need to rely on NonNull niche optimizations, although it could.
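
One way to spell that out would be an alternative layout of the same Buffer struct using the smallvec crate, shown here purely as an illustration (an inline capacity of 1 covers the single-region case with no extra heap allocation; exact sizes would still need checking against the cache-line budget):

use smallvec::SmallVec;

pub struct Buffer {
    /// Inline storage for the common single-region case; spills to the heap beyond that.
    iovecs: SmallVec<[libc::iovec; 1]>,
    state: SmallVec<[BufferState; 1]>,
}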

@nrc

nrc commented Mar 3, 2023

Supports multiple underlying memory regions. That is, the data represented by a Buffer may be backed by multiple, non-contiguous memory regions. This is inspired by the bytes::Buf trait.

I'm curious whether there are known use cases or concrete motivations for this (beyond just following bytes::Buf)?

Is the use of iovec to support vectored IO?

@ollie-etl
Collaborator Author

ollie-etl commented Mar 3, 2023

Is the use of iovec to support vectored IO?

Yes - motivated by writev, readv, sendmsg and recvmsg.
