change objects in recvBuffer queue from interface to concrete type to reduce allocs #1029
Conversation
FTR, the remaining alloc sources appear to be more difficult to remove; for the rest of them, it appears that:
But the profile with this change shows ~15% cumulative CPU time in transport.(*recvBufferReader).Read and 22% in write syscalls.
Without intending to sound pedantic, the source of improvement here is the removal of the interface. The commit message and PR description should both be amended to reflect the true source of improvement. EDIT: Apologies, the ...
@tamird thanks, but I think the removal of the interface ends up having the same benefit as going from copy-by-ref to copy-by-value. I was thrown off by this, as changing the pure ...
Yes, that's correct, and that's my point: simply changing ... The bottom line: this improvement is due to the specialization of ...
I see, updated the title.
Cool, thanks. Please update the commit message as well.
Force-pushed 31ab052 to 0974ac3
done, thanks!
@apolcyn what is the state of this PR?
Force-pushed e0c0ea9 to 7c0c116
Force-pushed 7c0c116 to b9b4d17
Just updated, I think this should still be good. cc @mehrdada for how this incorporates your recent change to ...
transport/transport.go (Outdated)
@@ -105,18 +102,19 @@ func (b *recvBuffer) load() {
 	if len(b.backlog) > 0 {
 		select {
 		case b.c <- b.backlog[0]:
-			b.backlog[0] = nil
+			b.backlog[0].data = nil
b.backlog[0] = recvMsg{} is more robust if we add fields to the struct in the future.
Also, do we ever close the channel in recvMsg?
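For illustration, a minimal self-contained sketch of the two variants under discussion (the recvMsg/recvBuffer shapes here are simplified assumptions, not the actual grpc-go source):

```go
package main

// Simplified stand-ins for the types discussed above (assumption: the real
// structs carry more fields and synchronization).
type recvMsg struct {
	data []byte
	err  error
}

type recvBuffer struct {
	c       chan recvMsg
	backlog []recvMsg
}

func (b *recvBuffer) load() {
	if len(b.backlog) > 0 {
		select {
		case b.c <- b.backlog[0]:
			// Variant from the diff: drop only the data reference.
			//   b.backlog[0].data = nil
			// Suggested variant: zero the whole element, so any fields added
			// to recvMsg later are released as well.
			b.backlog[0] = recvMsg{}
			b.backlog = b.backlog[1:]
		default:
		}
	}
}

func main() {
	b := &recvBuffer{
		c:       make(chan recvMsg, 1),
		backlog: []recvMsg{{data: []byte("payload")}},
	}
	b.load()
	<-b.c // receives the queued message; the backlog slot is now zeroed
}
```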
+1, assuming perf is not worse.
@mehrdada done for the first comment, sorry for the delay.
As for "do we ever close the channel in recvMsg?": I don't think we explicitly close this channel, but I think that's OK here, since the blocking read can unblock when the context is Done.
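A rough sketch of that point (assumed shape, not the actual recvBufferReader code): the read selects on the context alongside the buffer channel, so cancellation unblocks a pending receive even though nothing ever closes the channel.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

type recvMsg struct {
	data []byte
	err  error
}

// read blocks until a message arrives or the context is done; the channel
// itself is never closed.
func read(ctx context.Context, c <-chan recvMsg) ([]byte, error) {
	select {
	case <-ctx.Done():
		return nil, ctx.Err()
	case m := <-c:
		return m.data, m.err
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()

	c := make(chan recvMsg, 1)
	_, err := read(ctx, c) // nothing is ever sent; returns once the context expires
	fmt.Println(err)       // context deadline exceeded
}
```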
LGTM. Thanks!
I took your branch and rebased it (locally) and the tests still passed. Merging.
This removes the next largest remaining source of total allocs and appears to make a slight ~2.5-3% QPS improvement after #940, #1010, and #1028.
Based on the server protobuf streaming QPS test memory profile with the combined changes, shown in #1028, the allocation in:
looks to be from allocating recvMsg structs onto the heap and passing their references onto the buffer in
https://github.com/grpc/grpc-go/blob/master/transport/transport.go#L316
Changing the recvBuffer channel to pass the struct onto the buffer by value, on top of the combined changes in #940, #1010, and #1028, the memory profile looks like:
(same testing environment as in #1028)
(total allocation goes from ~3.7GB to ~2.5GB)
QPS roughly goes from ~816K (with the combined memory reductions so far) to ~840K on sequential runs.
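For later readers, a small self-contained sketch of the underlying effect (a toy benchmark with hypothetical item/recvMsg names, not the PR's code): storing the struct behind an interface boxes it onto the heap on every send, while a channel of the concrete struct type copies it by value.

```go
package main

import (
	"fmt"
	"testing"
)

type item interface{ isItem() }

type recvMsg struct {
	data []byte
	err  error
}

func (recvMsg) isItem() {}

// Interface-typed queue: every send converts recvMsg to item, which boxes the
// struct onto the heap.
func benchInterfaceQueue(b *testing.B) {
	b.ReportAllocs()
	c := make(chan item, 1)
	m := recvMsg{data: make([]byte, 16)}
	for i := 0; i < b.N; i++ {
		c <- m
		<-c
	}
}

// Concrete-typed queue: the struct is copied by value into the channel buffer,
// with no per-send allocation.
func benchConcreteQueue(b *testing.B) {
	b.ReportAllocs()
	c := make(chan recvMsg, 1)
	m := recvMsg{data: make([]byte, 16)}
	for i := 0; i < b.N; i++ {
		c <- m
		<-c
	}
}

func main() {
	fmt.Printf("interface queue: %d allocs/op\n", testing.Benchmark(benchInterfaceQueue).AllocsPerOp())
	fmt.Printf("concrete queue:  %d allocs/op\n", testing.Benchmark(benchConcreteQueue).AllocsPerOp())
}
```

On a typical Go toolchain this reports roughly one allocation per send for the interface-typed queue and zero for the concrete one, which is consistent with the drop in total allocations described above.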