
POC: outbound protobuf zero-copy take 2 #7369

Open · wants to merge 1 commit into base: master
Conversation

@njhill (Contributor) commented Aug 27, 2020

Variation on #7350. Slightly bigger, but I think the interface changes are cleaner; it avoids "borrowing" a ByteBuffer and separately informing of bytes written. Not sure which approach is preferable.

@njhill njhill force-pushed the zero-out-2 branch 2 times, most recently from 16a62b2 to 5eb0dc6 Compare August 28, 2020 03:43
@ericgribkoff (Contributor) commented:

Hi @njhill. I see this and the "Random acts of garbage reduction" PR open, plus v1 of this PR still marked as a draft. Would you like me to find someone to review these, or is this still work-in-progress? Thanks!

@njhill (Contributor, Author) commented Aug 28, 2020

@ericgribkoff yes they are all ready for review... I un-drafted the other one. Thanks!

@ericgribkoff added the kokoro:run label Aug 28, 2020
@grpc-kokoro removed the kokoro:run label Aug 28, 2020
@ejona86 (Member) commented Aug 28, 2020

CC @voidzcy

@ejona86 (Member) left a comment

So there are two different types of data, and you can only trivially have zero-copy output for one type at a time: protobuf primitives and ByteStrings. This provides zero-copy for protobuf primitives, as they can be written directly to the output buffer when encoding.

But for ByteString zero-copy see CodedOutputStream.ByteOutputEncoder. That "allows" (not actually exposed) using the ByteString's buffers directly. It works in the opposite direction (similar to how Drainable is the opposite of InputStream). It would be possible to get zero-copy primitives as well, but it really needs some sort of buffer allocator and then you need buffer lifetimes. I think the allocator thing is why writeLazy exists in ByteOutput, as the "lazy" buffers are having their ownership passed to the ByteOutput. But the API is not quite enough to use Netty's buffer pool, I believe.

This seems pretty fair, although it will only benefit small messages. Zero-copy typically becomes more important for larger messages and then may need to consider segmentation. We would want to see some positive impact in TransportBenchmark. I wouldn't be surprised if most of the benefit is reduced GC.

I'd sort of be interested what would happen if we just added the WritableBuffer.write(InputStream) method and used heap buffers with Netty instead of direct buffers. (Direct buffers for large things makes a lot of sense. For small things it is less clear.)
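That WritableBuffer.write(InputStream) idea over heap buffers could be sketched with plain JDK types. This is only an illustration, not code from the PR; the class and method names below are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical sketch of a write(InputStream) helper that copies a stream
// straight into a heap-backed ByteBuffer's backing array, avoiding an
// intermediate byte[] copy.
public final class HeapBufferWrite {
  // Reads from the stream into the buffer's backing array, up to the
  // buffer's remaining capacity; returns the number of bytes written.
  static int write(InputStream in, ByteBuffer dest) throws IOException {
    int n = in.read(dest.array(), dest.arrayOffset() + dest.position(), dest.remaining());
    if (n > 0) {
      dest.position(dest.position() + n);
    }
    return Math.max(n, 0);
  }

  public static void main(String[] args) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(16); // heap buffer, not direct
    int n = write(new ByteArrayInputStream("hello".getBytes()), buf);
    System.out.println(n + " " + new String(buf.array(), 0, buf.position()));
  }
}
```

Note this only works for heap buffers (`hasArray()` true); a direct buffer would still need an intermediate copy, which is part of the trade-off being discussed.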

```java
 * @param dest the destination buffer to receive the bytes.
 * @throws BufferOverflowException if destination buffer has insufficient remaining bytes
 */
int drainTo(ByteBuffer dest) throws IOException;
```
Member commented:
Instead of calling the method "drain" would it be more apt to call it "read"? That would make the logic very similar to read(byte[] b, int off, int len).

I'm not wild about the BufferOverflowException and the transferring of all bytes. I see little reason to limit the API to reading all the bytes, and it means data is invalidated if an exception is thrown. While having to call read() multiple times is annoying, it is that way for a reason.
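The conventional contract being referred to, mirroring read(byte[] b, int off, int len), can be illustrated with a JDK-only sketch (names hypothetical, not from the PR): each call fills at most the buffer's remaining space, and the caller loops:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Sketch of a partial-read contract: fills at most dest.remaining() bytes
// and returns the count, or -1 at end of stream, like
// InputStream.read(byte[], off, len).
public final class PartialRead {
  static int read(InputStream in, ByteBuffer dest) throws IOException {
    byte[] tmp = new byte[dest.remaining()];
    int n = in.read(tmp);
    if (n > 0) {
      dest.put(tmp, 0, n);
    }
    return n;
  }

  public static void main(String[] args) throws IOException {
    InputStream in = new ByteArrayInputStream(new byte[10]);
    ByteBuffer small = ByteBuffer.allocate(4); // deliberately too small
    int total = 0;
    int n;
    while ((n = read(in, small)) != -1) { // caller loops until EOF
      total += n;
      small.clear(); // reuse the small buffer for the next chunk
    }
    System.out.println(total); // prints 10
  }
}
```

The point is that a too-small destination buffer is handled by looping rather than by throwing, at the cost of an extra call per chunk.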

Contributor (Author) commented:

I don't disagree, and this is actually what I started with, for exactly the reasons you described. But I was thinking that the target impl doesn't actually work that way, so this has the side effect that it will only be used when beneficial. And at least it's kind of similar to both ReadableBuffer#readBytes(ByteBuffer), which requires the target buffer to be large enough to hold the entire source, and the Drainable interface, which has to drain the entire InputStream.

Having ProtoInputStream fall back to use ByteArrayInputStream in the case that the provided ByteBuffer isn't big enough would be worse than the current path, and this would not be obvious from the interface. WDYT?

It's also why I changed the name to drainTo; I agree read would be more appropriate if it just filled the ByteBuffer.

Member commented:

I was thinking that the target impl doesn't actually work that way so this has the side effect that it will only be used when beneficial

"Beneficial" is in the eye of the beholder. Yes, with the current code that is true. If this was in an API we could readily change, I'd agree. But since this is a public API that 1) will last a long time and 2) have multiple users I'd much rather favor the more generic method definition. I'm optimizing the API for longevity while making sure it allows optimizing the implementation.

As things change, my preference order would be:

  1. Any old code path is fast
  2. Any old code path works
  3. Any old code path fails (and this one is not actually acceptable)

Since drainTo(ByteBuffer) would require KnownLength, I'm not as fond of it. It could be useful to drain a compressed message to a ByteBuffer or some such.

Interfaces generally don't define guaranteed performance levels. List.get() may be fast or slow. We will consider performance when making changes, but at times we may need to hurt the performance of one particular method for some other goal. So I'm okay with the varying level of performance. If you care about performance then you are benchmarking, and such a regression would show up on a benchmark.

Contributor (Author) commented:

If this was in an API we could readily change, I'd agree.

How about just moving it to internal package then (alongside WritableBuffer)?

Probably this is clear, but I meant that performance-wise the only implementation of this would actually involve more copying rather than less in the case that the stream didn't fit in the buffer, i.e. worse than it is now, having the opposite of this PR's intended effect.

@njhill (Contributor, Author) commented Aug 29, 2020

Thanks @ejona86, that's interesting... I did look fleetingly at ByteOutputEncoder but disregarded it when I saw it was buffering. I see what you mean about the primitive vs. byte chunk cases though. Not based on much, but I would guess large strings are more common than large ByteStrings, and this approach should be better for those. In any case it's still at least as "zero-copy" as the inbound case... just that we could do even better for outbound with ByteStrings.

I don't think writeLazy is really related to ownership transfer as such; it just means that the buffer is immutable and so safe to hang on to for reading later.

It would be possible to get zero-copy primitives as well, but it really needs some sort of buffer allocator and then you need buffer lifetimes.

I think this could be done without buffer lifetimes with a new CodedOutputStream impl; unfortunately it can't be subclassed externally :( Otherwise, maybe a ByteOutput-based version could be used selectively based on the proportion of the message made up of large ByteStrings, or if the allocated ByteBuffer isn't large enough. Maybe I'll try to add that.
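As a rough illustration of what "used selectively based on the proportion of the message" might mean (entirely hypothetical; the threshold and names are not from the PR), one could imagine a heuristic comparing the large-ByteString portion against the total message size:

```java
// Hypothetical heuristic for choosing between encoding paths. Assumes the
// caller can cheaply compute the total serialized size and the bytes
// attributable to large ByteString fields; the 50% cutoff is arbitrary.
public final class PathChooser {
  // Returns true if large blobs dominate the message, in which case a
  // ByteOutput-style (buffer-sharing) encoder would be preferred over
  // encoding primitives directly into the output buffer.
  static boolean preferByteOutput(long totalSize, long largeBlobBytes) {
    return largeBlobBytes * 2 > totalSize; // blobs are the majority of bytes
  }

  public static void main(String[] args) {
    System.out.println(preferByteOutput(1000, 600)); // true
    System.out.println(preferByteOutput(1000, 100)); // false
  }
}
```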

it will only benefit small messages

Up to 1MiB, right? Which doesn't seem too bad. And for larger messages it should be no worse than the status quo.

```diff
@@ -54,5 +64,5 @@
    * Releases the buffer, indicating to the {@link WritableBufferAllocator} that
    * this buffer is no longer used and its resources can be reused.
    */
-  void release();
+  void close();
```
Contributor commented:

Changes in this interface and MessageFramer aren't directly related to the goal of this PR. Also, I'd vote for close() as the mirror of ReadableBuffer.

Contributor (Author) commented:

@voidzcy could you clarify which changes specifically you're talking about in each? Obviously write(InputStream) is part of it. The other two methods in this interface are there to make it compatible with OutputStream so that impls can implement both if useful, which NettyWritableBuffer now does (used here).
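The shape being described, a buffer type that doubles as an OutputStream so serializers can write to it directly, can be sketched with plain JDK types. All names here are hypothetical; this is not NettyWritableBuffer's actual code, but it shows why the int-to-byte cast and close() come along with OutputStream compatibility:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

// Hypothetical WritableBuffer-style class that also extends OutputStream.
public final class OutputStreamBuffer extends OutputStream {
  private final ByteBuffer buf;

  OutputStreamBuffer(int capacity) {
    this.buf = ByteBuffer.allocate(capacity);
  }

  // OutputStream.write(int) takes an int, so the cast to byte is required.
  @Override public void write(int b) {
    buf.put((byte) b);
  }

  @Override public void write(byte[] b, int off, int len) {
    buf.put(b, off, len);
  }

  // close() rather than release(), mirroring OutputStream/ReadableBuffer.
  @Override public void close() {
    buf.clear();
  }

  int readableBytes() {
    return buf.position();
  }

  public static void main(String[] args) throws IOException {
    OutputStreamBuffer b = new OutputStreamBuffer(8);
    b.write('h');
    b.write(new byte[] {'i'}, 0, 1);
    System.out.println(b.readableBytes()); // prints 2
  }
}
```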

Contributor commented:

I mean changes like s/release/close/g, casting int to byte, as well as adding the len > 0 check to the existing write-to-array method in MessageFramer, are sort of excessive with regard to the goal of this PR. They are irrelevant to the things intended to be added. Also, close() would be preferable to release() as it's more abstract, and ReadableBuffer also uses close().

Contributor (Author) commented:

close() would be preferable compared to release() as it's more abstract and ReadableBuffer also uses close()

You seem to be arguing for both sides of this in the same paragraph? :)

changes like s/release/close/g, casting int to byte

I just explained these in the preceding comment; they are in fact linked to this PR.

adding the len > 0 check

Sure, I can remove this if you prefer.

```java
int writable = bytebuf.writableBytes();
int written = bytebuf.writeBytes(stream, writable);
if (written == writable && stream.available() != 0) {
  throw new BufferOverflowException();
}
```
Contributor commented:

Do we really need to check writable capacity at this level? Since the WritableBuffer is a wrapper backed by some real buffer and all the operations are delegated to the underlying buffer, we should just let the underlying buffer handle this case, and such an exception should be thrown from underneath.
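The delegation being suggested can be sketched with plain JDK types (names hypothetical, not the PR's code): the wrapper does no explicit capacity check and lets the backing buffer's own bounds checking surface the exception:

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Sketch of a thin wrapper that delegates bounds checking to the backing
// buffer: ByteBuffer.put already throws BufferOverflowException when the
// source doesn't fit, so the wrapper adds no check of its own.
public final class DelegatingBuffer {
  private final ByteBuffer buf;

  DelegatingBuffer(int capacity) {
    this.buf = ByteBuffer.allocate(capacity);
  }

  void write(byte[] src) {
    buf.put(src); // no explicit capacity check; the ByteBuffer enforces it
  }

  public static void main(String[] args) {
    DelegatingBuffer b = new DelegatingBuffer(2);
    try {
      b.write(new byte[4]); // too big for a 2-byte buffer
    } catch (BufferOverflowException e) {
      System.out.println("overflow surfaced from the backing buffer");
    }
  }
}
```

One caveat with the actual Netty snippet above: ByteBuf can grow on write, so whether the underlying buffer throws at all depends on how it was allocated, which is presumably why the explicit check exists.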
