Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup of ByteBufferAccumulator (approach 2) #11094

Closed
wants to merge 57 commits into from

Conversation

gregw
Copy link
Contributor

@gregw gregw commented Dec 24, 2023

Alternative to #11058 to fix #10541

 + Added `isRetained()` to `Retainable`
 + Made `Chunk` is-a `RetainableByteBuffer`
 + retain rather than copy chunks where possible
 + Added `isRetained()` to `Retainable`
 + Made `Chunk` is-a `RetainableByteBuffer`
 + retain rather than copy chunks where possible
 + Added `isRetained()` to `Retainable`
 + Made `Chunk` is-a `RetainableByteBuffer`
 + retain rather than copy chunks where possible
 + Added `isRetained()` to `Retainable`
 + Made `Chunk` is-a `RetainableByteBuffer`
 + retain rather than copy chunks where possible
 + Merged aggregator into accumulator
 + Deleted ByteBufferAggregator.java
 + Added `isRetained()` to `Retainable`
 + Made `Chunk` is-a `RetainableByteBuffer`
 + retain rather than copy chunks where possible
 + Merged aggregator into accumulator
 + Deleted ByteBufferAggregator.java
 + Added `isRetained()` to `Retainable`
 + Made `Chunk` is-a `RetainableByteBuffer`
 + retain rather than copy chunks where possible
 + Merged aggregator into accumulator
 + Deleted ByteBufferAggregator.java
 + Added `isRetained()` to `Retainable`
 + Made `Chunk` is-a `RetainableByteBuffer`
 + retain rather than copy chunks where possible
 + Merged aggregator into accumulator
 + Deleted ByteBufferAggregator.java
 + Added `isRetained()` to `Retainable`
 + Made `Chunk` is-a `RetainableByteBuffer`
 + retain rather than copy chunks where possible
 + Merged aggregator into accumulator
 + Deprecated ByteBufferAggregator.java
 + Deprecated ByteBufferCallbackAccumulator
 + Added `isRetained()` to `Retainable`
 + Made `Chunk` is-a `RetainableByteBuffer`
 + retain rather than copy chunks where possible
 + Merged aggregator into accumulator
 + Deprecated ByteBufferAggregator.java
 + Deprecated ByteBufferCallbackAccumulator
 + Added append and writeTo methods to RetainableByteBuffer
 + Made Aggregator and Accumulator subtypes of RetainableByteBuffer
@gregw gregw requested a review from lorban December 24, 2023 00:24
 + Added append and writeTo methods to RetainableByteBuffer
 + Made Aggregator and Accumulator subtypes of RetainableByteBuffer
 + Added append and writeTo methods to RetainableByteBuffer
 + Made Aggregator and Accumulator subtypes of RetainableByteBuffer
@lorban
Copy link
Contributor

lorban commented Jan 16, 2024

I do like the new Aggregator and Accumulator design, including the fact that they are RetainableByteBuffer implementations.

But the API should be discussed as I would like it to be rock-solid as it's very central to a lot of potential problems.

WIP on review feedback
@gregw
Copy link
Contributor Author

gregw commented Jan 17, 2024

@lorban I've acted on most of your feedback, but a couple of questions on some of it.
In general, I happy to reduce the API to the minimal to ensure we get it right... and then we need to document it well.

However, I'm only partially available over the next few weeks, so feel free to contribute to this branch. I don't totally agree with the remaining API changes you suggest... but I don't totally disagree either, so he who does the work....
Also I think BufferedContentSink needs a lot of review at least, if not bug fixes.

@gregw
Copy link
Contributor Author

gregw commented Mar 20, 2024

@sbordet @lorban Firstly, I think it is too late for this PR in 12.0.8, let's punt to next cycle but PLEASE give this some consideration early in that cycle so we can have time to land it. I think there were a few bugs I found/fixed in this, so I might try to factor those out into another PR for 12.0.8.

The idea of this PR is to move towards the model of having a fully functional buffer abstraction that hides details such as:

  • a read-only fixed ByteBuffer with no space
  • a read/write single ByteBuffer with finite space
  • a read/write single ByteBuffer that may grow to a limit
  • a read/write chain of zero copy buffers
  • an optimized for fill RBB that keeps the buffer in fill mode mostly.

Ultimately we should be able to deprecate all the inconsistent BufferUtil APIs for append/put etc. and the RBB as the primary API that we use. It can hide all the flipping details as well various implementations. There can even be RBB impls optimized for filling, so the buffer is left in fill mode and only flipped when it is used!

@gregw gregw requested review from sbordet and lorban March 20, 2024 12:18
Use RBB.append rather than BU.append
@sbordet
Copy link
Contributor

sbordet commented Mar 20, 2024

The idea of this PR is to move towards the model of having a fully functional buffer abstraction that hides details such as:

I disagree, seems like you want to make RBB to be the jack of all trades, and it is not necessary at all.
We just add a couple of well designed utility classes such as Aggregator and Accumulator and we don't need anything else.

We obviously do not want to allow users to append to a Chunk, and you do not want to add an API to it, only to throw UnsupportedOperationException when they use it.

@gregw
Copy link
Contributor Author

gregw commented Mar 20, 2024

The idea of this PR is to move towards the model of having a fully functional buffer abstraction that hides details such as:

I disagree, seems like you want to make RBB to be the jack of all trades, and it is not necessary at all. We just add a couple of well designed utility classes such as Aggregator and Accumulator and we don't need anything else.

This PR actually started as one just replacing the existing Aggregator/Accumulators with better implementations, which I put up for draft review. However, @lorban and I worked those a bit and soon realized that they didn't really solve the problems and that there is a better way, hence the current form of this PR.

We currently have an API impedance between several key APIs:

  • between chunks and RBB.
  • between RBB and BBs
  • between BB and RBBs
  • between RBBs and Content.Sinks

The impedence between these APIs results in:

  • needless copying of data
  • a plethora of intermediate wrappers/converter/utility classes
    This is resulting in needless copying in many parts of the code.
  • code readability/maintainability issues.

For example, we have tried to bridge the impedance from BB to RBB with multiple explicit aggregator and accumulator classes. It has not worked and too often developers write their own new aggregators/accumulator rather than using what is there. We now have multiple stale implementations. Even if we put in the effort to add even more new Aggregators/Accumulators that we now say have the "right" API, it doesn't work because they now suffer from the chunk to RBB impedence.

The solution is not to write even more API adaptors/wrappers/convertors etc. that try to bridge this API impedance. The solution is to remove the impedance in the first place so that chunks can smoothly flow into RBBs, that can be used for IO to sinks or streams without falling back to BB APIs. This is achieved by having a good core buffer abstraction that provides all the commonly used methods directly on it, without the need to fall back to BB APIs and then to use BufferUtil or or BB based utility classes.

We already have a good core read side buffer abstraction with RBB that is used in our low level network code. For some reason, we then break that abstraction with the nearly identical chunk API, only to then struggle to get it back to BB APIs by using various RBB based utility classes. If we just fleshed out the RBB abstraction a bit, then it can be RBBs all the way through with no impedance, no conversions, no wrappers, no copies (unless wanted).

I agree that it is less than desirable have read/write APIs that throw UnsupportedOperationExceptions on the writers when they are read-only. But many MANY APIs work this way, least not ByteBuffer itself as well as Collection and many other core APIs. Thus I think it is a strawman to say that having a read/write API for chunk is a showstopper. Its primary API is BB and that is already read/write API. If we really want to have a read-only buffer abstraction, then this PR is the first step towards that as it will reduce the direct usage of the BB API. Ultimately we can introduce RBB.Mutable if that is really what you want, but I think that would be a next step after this PR. We have to first reduce the direct BB and BufferUtil usages, which this PR does. We've done this with HttpFields and HttpURI, and it has been worthwhile. So I'm not opposed to doing it with RBB, but we cannot whilst we are so dependent on BB APIs

@gregw
Copy link
Contributor Author

gregw commented Mar 20, 2024

Perhaps in jetty-12.1 we introduce a read-only Buffer and read/write Buffer.Mutable APIs. We can then have @Deprecated interface RetainableByteBuffer extends Buffer.Mutable

@gregw
Copy link
Contributor Author

gregw commented Mar 21, 2024

@sbordet to understand the API impedance I am referring to, consider writing a buffered echo from a sink to a source.

Currently, there are several point of impedance:

  • the sink produces chunks, which can neither be buffered nor written to a sink, and we need to convert to the BB API, which has mostly been done with a copy rather than a reference. If we are to avoid a copy, then chunk needs to be wrapped as a RBB
  • Our core buffer abstraction is insufficient for aggregating the chunks, so we need to use a utility class with a different API.... but we have about 6 different versions of those... so rather than pick one, we tend to write a new one instead.
  • Once we have buffered/aggregated/accumulated the wrapped chunks in our utility class, then that can't be used with a sink so we have to convert back to BB (maybe causing yet another copy) before we can write.

With this PR, there is no such impedance as our core buffer abstraction has a sufficiently powerful API and chunks are RBBs, so we can accumulate with buffer.append(chunk) and we can write back with buffer.writeTo(response, true, callback). No extra APIs involved, just our core RBB abstaction.

@sbordet
Copy link
Contributor

sbordet commented Mar 22, 2024

@gregw wrote:

the sink produces chunks, which can neither be buffered nor written to a sink, and we need to convert to the BB API, which has mostly been done with a copy rather than a reference. If we are to avoid a copy, then chunk needs to be wrapped as a RBB

That is not accurate. Chunk is retainable, and in fact we already have Content.Source.asRetainableByteBuffer() that efficiently does what you want. It is currently not used (perhaps a hint that it is not necessary?).
There is no copy, hence no need to wrap the Chunks as RBBs.

Once we have buffered/aggregated/accumulated the wrapped chunks in our utility class, then that can't be used with a sink so we have to convert back to BB (maybe causing yet another copy) before we can write.

This was analyzed extensively in the past, and we decided it was a bad idea to have more than 1 API in Sink to perform writes.
The reason was that Response is-a Sink, but having more than 1 API made wrapping more complicated, allowed more confusing semantic (can I write using 2 different APIs alternatively, can 1 API call the other, etc.) and did not play well with external applications that rely on the much, much more widely used ByteBuffer APIs (e.g. `UTF_8.encode("hello"), etc.).

Maybe there is an impedance, but we need minimal impedance for users, not for us.
Users will use ByteBuffer APIs because that's what vastly more common.
Sink cannot have multiple write APIs, and the single one must be based on ByteBuffer.
We can be efficient and precise at using our own aggregators and accumulators, which surely need to improve, but given the above I fail to see the reason for all the rest yet.

Is there a compelling use case you have found in the code and that I missed that cannot be done without copying?
AFAIK, either we can efficiently retain, or we explicitly want to copy (e.g. Servlet output aggregation).

@gregw
Copy link
Contributor Author

gregw commented Mar 23, 2024

@sbordet wrote:
Chunk is retainable, and in fact we already have Content.Source.asRetainableByteBuffer() that efficiently does what you want. It is currently not used (perhaps a hint that it is not necessary?). There is no copy, hence no need to wrap the Chunks as RBBs.

Looking at the current code base is not a good indication of what is necessary/good. We have multiple accumulator/aggregators, mostly predating RBB, so of course they are not used with asRetainableByteBuffer. The API impedance is that you need to find an accumulator utility class and then notice that it's API is for RBB and then find another utility class that will wrap the chunk as a RBB. So that's two conceptual leaps and a needless wrapper class.

As you say, chunk is Retainable and only needs to slightest adaption to be used as a RBB. This PR recognises that similarity and makes them the same, so no adaption is needed. Currently we have have:

  • Chunk is a Retainable; and has a ByteBuffer; and is part of a sequence of Chunks with a last one.

This PR just changes that to

  • Chunk is a RetainableByteBuffer; and is part of a sequence of Chunks with a last one.

What is RetainableByteBuffer if it not a Retainable that has a ByteBuffer? Why isn't a Chunk a RBB? What is the purpose of that adaption? what semantically does it change?
It is not a read-only adaption, because Chunk uses BB which is a read/write API. If we want Chunk to be truly read-only, then we need to abstract away from BB and RBB is the first step in that direction. Edit: I note that we already have Chunk Chunk.asReadOnly(), so the current code recognizes that sometimes chunks are read/write!

This was analyzed extensively in the past, and we decided it was a bad idea to have more than 1 API in Sink to perform writes...

So we have previously identified that there is a problem, but were unable to find a fix it in the Sink API. This PR has found another way. It does not change the Sink API, but simply makes RBB Sink aware so it can efficiently use the existing Sink API.

Maybe there is an impedance, but we need minimal impedance for users, not for us. Users will use ByteBuffer APIs because that's what vastly more common. Sink cannot have multiple write APIs, and the single one must be based on ByteBuffer.

Again, this PR is not proposing changing the Sink API and BB remains as part of the core Sink abstraction. But that should not mean that every other core abstraction has to devolve down to a single call to some getByteBuffer() method, forcing us to copy into one great big buffer. Instead this PR makes RBB is Sink aware, so can efficiently manage writing to a sink without any explicit usage of BB. Similarly it make RBB self aware, so RBBs can be appended to RBBs.

Is there a compelling use case you have found in the code and that I missed that cannot be done without copying? AFAIK, either we can efficiently retain, or we explicitly want to copy (e.g. Servlet output aggregation).

The compelling use-case is that the current code base has:

  • org.eclipse.jetty.io.ChunkAccumulator
  • org.eclipse.jetty.io.ByteBufferPool.Accumulator
  • org.eclipse.jetty.test.client.transport.HttpClientDemandTest.Accumulator
  • org.eclipse.jetty.io.ByteBufferAccumulator
  • org.eclipse.jetty.io.ByteBufferCallbackAccumulator
  • org.eclipse.jetty.io.ByteBufferAggregator
  • org.eclispe.jetty.util.BufferUtil
  • org.eclipse.jetty.util.IO
  • org.eclipse.jetty.io.Content

and more... all with APIs doing essentially the same job of copying bytes from a source to a destination, but all with slightly different APIs and often doing unnecessary copies. Because of API impedance, developers have had to adapt between our core APIs and the result is a mishmash of adaptions and utilities. This issue was opened as we recognised that problem, but initially thought that all we needed to do was introduce a few "better" utility classes, but this time with "better" APIs. We tried that, but it didn't fix the problem.

The problem is that we should not need to adapt between our core abstractions in the first place. We should not need utilities and wrappers and adaptors to allow our input to efficiently be buffered and for our buffers to be efficiently written to our output. To quote Elon Musk, "the best part is no part". Rather than make a better utility/adaptor/wrapper, this PR removes the need for them in the first place.

You appear to be arguing that BBs are the only common API that we can use between Sources and Sinks. But BB is a bad API. We've tried working around that with conventions and BufferUtil, but you hate BufferUtil and constantly argue against its use. We now have the RBB abstraction that wraps BB, so why not use that to avoid direct usage of BB whenever possible and thus reduce the need for direct interaction with BufferUtil?

So some questions for you:

  • why shouldn't our core input abstraction (Source) be directly compatible with our core buffer abstraction (RBB)?
  • why shouldn't our core buffer abstraction (RBB) be directly compatible with our core output abstraction (Sink)?

@gregw
Copy link
Contributor Author

gregw commented Mar 26, 2024

@lorban @sbordet after the call today, I think we agreed that there is a conceptual difference between the read-only chunk and the read-write RBB.... but that currently in the code base there is zero actual semantic difference between the APIs in this regards, and even some mistakes with making chunks look like they could be mutable.

Thus I think the way forward is to make a real API distinction between mutable and immutable retainable buffers. Chunk would extend the immutable API, whilst the buffer pools would provide the mutable version.

This is a much bigger change and probably a 12.1.0 thing (which the current PR was borderline being anyway).

/**
* @return an immutable version of this Chunk
*/
default Chunk asReadOnly()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I think this method can only be deprecated for removal, as it is currently part of the API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good point

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... unless we make this PR a 12.1 thing.... in which case we can deprecate in 12.0.x and remove here.

@gregw
Copy link
Contributor Author

gregw commented Mar 27, 2024

Currently on the RBB interface there are only 2 methods that mutate the contents of the buffer:

    /**
     * Copies the contents of the given byte buffer at the end of this buffer.
     * @param bytes the byte buffer to copy from.
     * @return true if all bytes of the given buffer were copied, false otherwise.
     * @throws ReadOnlyBufferException if the buffer is read only
     * @see BufferUtil#append(ByteBuffer, ByteBuffer)
     */
    boolean append(ByteBuffer bytes) throws ReadOnlyBufferException;

    /**
     * Copies the contents of the given retainable byte buffer at the end of this buffer.
     * @param bytes the retainable byte buffer to copy from.
     * @return true if all bytes of the given buffer were copied, false otherwise.
     * @throws ReadOnlyBufferException if the buffer is read only
     * @see BufferUtil#append(ByteBuffer, ByteBuffer)
     */
    boolean append(RetainableByteBuffer bytes) throws ReadOnlyBufferException;

Both are clearly marked that they can throw ReadOnlyBufferException.

To avoid these two methods, we would need to split RBB into two and do some difficult renaming. Options include :

interface ImmutableRetainableByteBuffer extends Retainable
{
    ByteBuffer getByteBuffer();
}

interface RetainableByteBuffer extends ImmutableRetainableByteBuffer
{
    boolean(append(ImmutableRetainableByteBuffer);
}

I don't like this style of naming, because a mutable RetainableByteBuffer is-a ImmutableRetainableByteBuffer, but it is not mutable. So it would be better to have:

interface RetainableByteBuffer extends extends Retainable
{
    ByteBuffer getByteBuffer();
    
    interface Mutable extends RetainableByteBuffer
    {  
        boolean(append(RetainableByteBuffer);
    }
}

But now all our ByteBufferPools need to return RetainableByteBuffer.Mutable rather than just a RetainableByteBuffer. There will also be many other API changes, so at this point we may as well just go for a shorter name:

interface Buffer extends extends Retainable
{
    ByteBuffer getByteBuffer();
    
    interface Mutable extends Buffer
    {  
        boolean(append(Buffer);
    }
}

ByteBuffer pools would operate on Buffer.Mutables and Chunks would extend just Buffer

It is a lot of disruption just to avoid 2 methods, that are semantic cousins of the chunk.getByteBuffer().put(...) APIs.

@gregw
Copy link
Contributor Author

gregw commented Mar 27, 2024

I've had a go at the split to a RetainableByteBuffer.Mutable in fix/jetty-12/10541/byteBufferAccumulator3
It is not horrible, but there are a lot of changes.... and it is all a but meaningless as it is still possible to create an immutable RBB that contains a mutable BB and to create a RBB.Mutable that contains an immutable BB. We would need a lot more rigour to ensure that the correct type of buffer is always passed in (and there is no isImmutable method on a BB!)

I'll keep going with it, but it is a huge disruption for 2 append methods that are not different to the put methods on BB, that we have lived with forever!

@lorban
Copy link
Contributor

lorban commented Mar 28, 2024

My feeling on this is that if we want to go down the route of splitting RetainableByteBuffer into mutable/immutable interfaces, getByteBuffer() should go away, like the following:

interface Buffer extends Retainable
{
    void writeTo(Content.Sink, boolean, Callback);
    void putTo(nio.ByteBuffer);
    
    interface Mutable extends Buffer
    {  
        boolean append(nio.ByteBuffer);
        int fillFrom(EndPoint);
    }
}

That would require massive modifications as almost everywhere we rely on nio.ByteBuffer (everywhere but EndPoint and Content.Sink?) we would need to make modifications to use our Buffer API instead. I'm not convinced we should go down that route, at least not now. But I also don't think we can improve RetainableByteBuffer by splitting it into read-only/read-write while still exposing the nio.ByteBuffer.

@gregw
Copy link
Contributor Author

gregw commented Mar 28, 2024

My feeling on this is that if we want to go down the route of splitting RetainableByteBuffer into mutable/immutable interfaces, getByteBuffer()

I agree... except perhaps not go away entirely, but be rendered rarely used and considered an antipattern.

So all the commonly used get methods from BB should be echoed on RBB. All the commonly used put should be reflected in RBB.Mutable, but with name changes to reflect that a flip, then flop is not needed. i.e. put is renamed to append, and works much the same, except the content in put between the limit and the capacity. Finally, the commonly used methods on BufferUtil for append etc. should be available on RBB and those methods deprecated.

I actually don't think this will be too much work... but we will see. I'll keep experimenting in the 3 branch.

@gregw
Copy link
Contributor Author

gregw commented Mar 29, 2024

That would require massive modifications as almost everywhere we rely on nio.ByteBuffer (everywhere but EndPoint and Content.Sink?) we would need to make modifications to use our Buffer API instead.

I have experimented with that in the 3 branch. There are a lot of good improvements... but then there are think some classes like HttpParser that need massive modifications.

I'm not convinced we should go down that route, at least not now.

I agree. If we had started out that way, then I think it would have been a good way to go.... but now it feels like a jetty-13 thing at best.

But I also don't think we can improve RetainableByteBuffer by splitting it into read-only/read-write while still exposing the nio.ByteBuffer

Also agree. Unless we rigorously address all usages to ensure that we don't have read-only BB inside read-write RBBs (and versa vice), then it is kind of pointless to split the API. It could be done, but I just don't see the benefit at this stage.

Meanwhile, this PR, without the split API contains some significant improvements: both in code simplicity and in avoiding needless copies. The cost is 2 append methods on RBB that clearly indicate in their signatures that the RBB might contain a read-only BB. Which is no different from the situation we have today, but without the explicit throws clauses in the API.

I don't think we should let this existing issue be a road block to these improvements!

Comment on lines +148 to +156
/**
* Creates a copy of this RetainableByteBuffer that is entirely independent, but
* backed by the same memory space, i.e.: modifying the ByteBuffer of the original
* also modifies the ByteBuffer of the copy and vice-versa.
* @return A copy of this RetainableByteBuffer
*/
default RetainableByteBuffer copy()
{
return new AbstractRetainableByteBuffer(BufferUtil.copy(getByteBuffer()))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either the comment or the implementation is wrong. Comment implies shallow copy, but implementation is a deep copy!

@gregw
Copy link
Contributor Author

gregw commented Mar 29, 2024

@gregw said:

@lorban said:

But I also don't think we can improve RetainableByteBuffer by splitting it into read-only/read-write while still exposing the nio.ByteBuffer
Also agree. Unless we rigorously address all usages to ensure that we don't have read-only BB inside read-write RBBs (and versa vice), then it is kind of pointless to split the API. It could be done, but I just don't see the benefit at this stage.

I'm going to contradict myself here a little bit.. and maybe there is a mid ground that will keep @sbordet happy. If we moved the mutating methods to a RBB.Mutable interface, then the aggregators and accumulators could implement that. For users of normal RBB, we could have a RBB.asMutable method that provides only append methods for code that wants to use them rather than call getByteBuffer() and then use that API or BufferUtil. Internally, the BB could even be flipped to fill mode for efficiency, so every append need to flip to fill and then flip back to flush. We'd need some method to say we've finished mutating, so that the original RBB would see the flipped back results.

The cost would be some extra wrappers (or perhaps just a cast down) and the call to finish mutation... but it might be OK...

The Chunk API would still have asMutable() but it can be overridden to throw if ever called on a chunk.

I'll experiment with that approach.

@gregw
Copy link
Contributor Author

gregw commented Mar 31, 2024

I have essentially split this PR into:

@gregw gregw closed this Apr 5, 2024
@joakime joakime deleted the fix/jetty-12/10541/byteBufferAccumulator2 branch April 23, 2024 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Review ByteBufferAccumulator
4 participants