
change objects in recvBuffer queue from interface to concrete type to reduce allocs #1029

Merged: 2 commits merged into grpc:master on Jun 26, 2017

Conversation

@apolcyn (Contributor) commented Dec 21, 2016

This removes the next largest remaining source of total allocs and appears to make a slight ~2.5-3% QPS improvement after #940, #1010, and #1028.

Based on the memory profile of the server protobuf streaming QPS test with the combined changes (shown in #1028), the allocation in:

1342.06MB 35.49% 35.49%  1342.06MB 35.49%  google.golang.org/grpc/transport.(*Stream).write

looks to be from allocating recvMsg structs on the heap and passing their references onto the buffer in
https://github.com/grpc/grpc-go/blob/master/transport/transport.go#L316

Changing the recvBuffer channel to pass the struct onto the buffer by value, on top of the combined changes in #940, #1010, and #1028, the memory profile looks like:

(same testing environment as in #1028)

2475.08MB of 2501.20MB total (98.96%)
Dropped 88 nodes (cum <= 12.51MB)
      flat  flat%   sum%        cum   cum%
 1330.56MB 53.20% 53.20%  1330.56MB 53.20%  golang.org/x/net/http2.parseDataFrame
  463.51MB 18.53% 71.73%   928.51MB 37.12%  google.golang.org/grpc.encode
  456.01MB 18.23% 89.96%   456.01MB 18.23%  github.com/golang/protobuf/proto.(*Buffer).enc_len_thing
     214MB  8.56% 98.52%      214MB  8.56%  google.golang.org/grpc/transport.(*http2Server).handleData
       9MB  0.36% 98.88%   465.01MB 18.59%  google.golang.org/grpc.protoCodec.Marshal
       2MB  0.08% 98.96%   934.51MB 37.36%  google.golang.org/grpc.(*Server).processStreamingRPC

(total allocation goes from ~3.7GB to ~2.5GB)

QPS roughly goes from ~816K (with the combined memory reductions so far) to ~840K on sequential runs.
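
To make the shape of the change concrete, here is a minimal sketch of the specialized buffer and write path. Type and field names follow the linked transport.go, but the field set and method bodies are reconstructed for illustration and are not the exact diff:

```go
package transport

import "sync"

// recvMsg is the concrete message type carried by the receive queue.
type recvMsg struct {
	data []byte
	err  error
}

// Previously the queue was declared over the item interface (chan item,
// []item) and write called s.buf.put(&m); taking the address forces each
// recvMsg to escape to the heap, which is the allocation attributed to
// transport.(*Stream).write in the profile above.
//
// Specializing the queue to the concrete type lets messages be copied by
// value into the channel and backlog, with no per-message allocation.
type recvBuffer struct {
	mu      sync.Mutex
	c       chan recvMsg
	backlog []recvMsg
}

func newRecvBuffer() *recvBuffer {
	return &recvBuffer{c: make(chan recvMsg, 1)}
}

// put hands a message to the reader, spilling into the backlog if the
// channel is already occupied.
func (b *recvBuffer) put(r recvMsg) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.backlog) == 0 {
		select {
		case b.c <- r:
			return
		default:
		}
	}
	b.backlog = append(b.backlog, r)
}

// Stream is reduced to the one field needed for this sketch.
type Stream struct {
	buf *recvBuffer
}

func (s *Stream) write(m recvMsg) {
	s.buf.put(m) // copy by value; previously s.buf.put(&m) onto a chan item queue
}
```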

@apolcyn (Contributor, author) commented Dec 21, 2016

FTR, the remaining alloc sources appear to be more difficult to remove:
1330.56MB 53.20% 53.20% 1330.56MB 53.20% golang.org/x/net/http2.parseDataFrame, from within the Go http2 library, for new structs allocated per data frame read.

463.51MB 18.53% 71.73% 928.51MB 37.12% google.golang.org/grpc.encode from https://github.com/grpc/grpc-go/blob/master/rpc_util.go#L295

456.01MB 18.23% 89.96% 456.01MB 18.23% github.com/golang/protobuf/proto.(*Buffer).enc_len_thing, undetermined but within the proto library.

214MB 8.56% 98.52% 214MB 8.56% google.golang.org/grpc/transport.(*http2Server).handleData from https://github.com/grpc/grpc-go/blob/master/transport/http2_server.go#L415

9MB 0.36% 98.88% 465.01MB 18.59% google.golang.org/grpc.protoCodec.Marshal from new slices in https://github.com/grpc/grpc-go/pull/1010/files#diff-365652330e79ea705fe1e6d0a561ee13R65 and https://github.com/grpc/grpc-go/pull/1010/files#diff-365652330e79ea705fe1e6d0a561ee13R73

But a CPU profile with this change shows ~15% of cumulative CPU time in transport.(*recvBufferReader).Read and 22% in write syscalls.

@tamird (Contributor) commented Dec 22, 2016

Without intending to sound pedantic, the source of improvement here is the removal of the item interface, more than the change from pointer to value. If you had kept the item interface and changed recvMsg to implement it by value, you would not have observed a change in memory allocations.

The commit message and PR description should both be amended to reflect the true source of improvement here.

EDIT: Apologies, the item interface was not removed - but recvMsg no longer implements it. With that addendum, the above is still relevant.

@apolcyn (Contributor, author) commented Dec 22, 2016

@tamird thanks, but I think removing the interface ends up having the same benefit as going from copy-by-reference to copy-by-value.

I was thrown off by this because changing the plain recvMsg{} struct to implement the item interface by value was tried first, but when looking at the memory profiler, the same total allocation that used to be in transport.(*Stream).write had simply moved to the Go runtime function that converts structs to interfaces (runtime.convT2E, I think it was). Looking into this, I found that converting a Go struct to an interface causes a heap allocation of the struct and stores its pointer in the new interface value, so the problem of passing a pointer to an allocated object was still there.
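
A standalone micro-benchmark (hypothetical, not part of this PR) that shows the effect described above: sending a struct through an interface-typed channel boxes it onto the heap (runtime.convT2E in the Go versions of that era), while a channel of the concrete type just copies the value:

```go
package boxing

import "testing"

// item and recvMsg mirror the shapes used in transport; payload stands in
// for a received data frame.
type item interface{}

type recvMsg struct {
	data []byte
	err  error
}

var payload = make([]byte, 64)

// Sending through a chan item boxes the struct into the interface,
// costing one heap allocation per send.
func BenchmarkInterfaceChannel(b *testing.B) {
	c := make(chan item, 1)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		c <- recvMsg{data: payload}
		<-c
	}
}

// Sending through a chan recvMsg copies the value into the channel's
// buffer: zero allocations per send.
func BenchmarkConcreteChannel(b *testing.B) {
	c := make(chan recvMsg, 1)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		c <- recvMsg{data: payload}
		<-c
	}
}
```

Running go test -bench . -benchmem on a sketch like this reports roughly 1 alloc/op for the interface channel and 0 allocs/op for the concrete one, which matches how the allocation moved from (*Stream).write to runtime.convT2E when only the pointer was dropped.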

@tamird (Contributor) commented Dec 22, 2016

Yes, that's correct, and that's my point - simply changing &recvMsg{} to recvMsg{} without also specializing the recvBuffer to use the concrete type rather than the interface does not improve memory allocation for the reason you mention.

The bottom line: this improvement is due to the specialization of recvBuffer from item (an interface) to recvMsg (a concrete type).

@apolcyn changed the title from "put recvMsg onto buffer by value rather than reference" to "change objects in recvBuffer queue from interface to concrete type to reduce allocs" on Dec 22, 2016
@apolcyn (Contributor, author) commented Dec 22, 2016

I see, updated title

@tamird (Contributor) commented Dec 22, 2016

Cool, thanks. Please update the commit message as well.

@apolcyn force-pushed the remove_alloc_in_transport_write branch from 31ab052 to 0974ac3 on December 22, 2016 at 01:52
@apolcyn (Contributor, author) commented Dec 22, 2016

done, thanks!

@dfawley (Member) commented May 19, 2017

@apolcyn what is the state of this PR?

@apolcyn force-pushed the remove_alloc_in_transport_write branch 2 times, most recently from e0c0ea9 to 7c0c116 on May 19, 2017 at 23:21
@apolcyn force-pushed the remove_alloc_in_transport_write branch from 7c0c116 to b9b4d17 on May 19, 2017 at 23:24
@apolcyn (Contributor, author) commented May 19, 2017

Just updated, I think this should still be good.

cc @mehrdada for how this incorporates your recent change to transport.go

@@ -105,18 +102,19 @@ func (b *recvBuffer) load() {
 	if len(b.backlog) > 0 {
 		select {
 		case b.c <- b.backlog[0]:
-			b.backlog[0] = nil
+			b.backlog[0].data = nil
@mehrdada (Member) commented May 19, 2017

b.backlog[0] = recvMsg{} is more robust if we add fields to the struct in the future.

Also, do we ever close the channel in recvMsg?

A Member replied:
+1, assuming perf is not worse.

@apolcyn (Contributor, author) replied:

@mehrdada done for the first comment, sorry for the delay.

> Also, do we ever close the channel in recvMsg?

I don't think we explicitly close this channel, but I think that's OK here, since the blocking read can unblock when the context is Done.
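
For reference, a sketch of what load looks like with the suggested zero-value reset applied (reconstructed from the hunk above and the recvBuffer sketch in the description; the merged code may differ slightly):

```go
// load moves the oldest backlogged message into the channel, if any.
func (b *recvBuffer) load() {
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.backlog) > 0 {
		select {
		case b.c <- b.backlog[0]:
			// Reset the slot to the zero value so the backing array drops
			// its reference to the data, and so this stays correct if
			// recvMsg gains more fields later.
			b.backlog[0] = recvMsg{}
			b.backlog = b.backlog[1:]
		default:
		}
	}
}
```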

@mehrdada (Member) left a comment:

LGTM. Thanks!

@dfawley (Member) commented Jun 26, 2017

I took your branch and rebased it (locally) and the tests still passed. Merging.

@dfawley merged commit eb7b130 into grpc:master on Jun 26, 2017
@menghanl added the 1.5 and Type: Performance (performance improvements: CPU, network, memory, etc.) labels on Jul 18, 2017
The lock bot locked the conversation as resolved and limited it to collaborators on Jan 18, 2019