Performance Best Practices #852

Open
gidxl03 opened this issue Sep 15, 2021 · 11 comments

@gidxl03

gidxl03 commented Sep 15, 2021

The general section of https://grpc.io/docs/guides/performance/ led me to believe that to scale my application, I needed to use many gRPC channels and only one gRPC stream per channel between my long-lived client and servers. This interpretation didn't match my understanding of a stream being a point-to-point path via 'sub-channels' from client to server, and my understanding of a channel as a Y-shaped collection of one sub-channel from client to each server. Nor does it match https://grpc.io/blog/grpc-on-http2/, which shows a single channel with many streams.

Streams, however, cannot .... They also might increase performance at a small scale but can reduce scalability due to load balancing and complexity, so they should only be used when they provide substantial performance

I now suspect that the author is recommending the use of a 'Unary RPC' API such as

service MyUnaryService {
  rpc myOperation(MyRequest) returns (MyResponse);
}

over a 'Streaming RPC' API where both the MyRequest and MyResponse messages are streamed

service MyStreamedService {
  rpc myOperation(stream MyRequest) returns (stream MyResponse);
}

I find this confusing because the paragraph is headed with "Use streaming RPCs" for long-lived connections. My original interpretation was that I should use streaming RPCs for long-lived connections but avoid using more than one 'Stream' on my 'Streaming RPC'. I interpreted the word "Streams" (plural) to be more than one instance of class io.grpc.internal.Stream, such as that created from pseudo code

myStub = MyStreamedServiceGrpc.newStub(myChannel);
myStream = myStub.myOperation(myResponseStreamObserver);

which to my knowledge
a) causes the client to use a new Stream (I see that io.grpc.internal.ClientCallImpl's startInternal method invokes ClientStreamProvider's newStream method),
and from my own recent reading elsewhere ...
b) the client must normally ensure that it does not create more than 100 'myStream' instances per channel.

I currently cache just one instance of myStream and invoke myStream.onNext(myRequest) many times, but I now suspect that I may not need to use a Streaming RPC API at all, and that if I do, I should cache up to N instances of myStream (where N is the number of servers and should be < 100) and do my own load balancing so that the load is shared over each myStream.
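For concreteness, a minimal grpc-java sketch of that single-cached-stream pattern, using the hypothetical MyStreamedService above (the target address and observer bodies are illustrative assumptions, not real application code):

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.stub.StreamObserver;

public class SingleStreamClient {
  public static void main(String[] args) {
    // Hypothetical target; a real deployment would use its own address.
    ManagedChannel myChannel =
        ManagedChannelBuilder.forTarget("dns:///my-backend:443").usePlaintext().build();
    MyStreamedServiceGrpc.MyStreamedServiceStub myStub =
        MyStreamedServiceGrpc.newStub(myChannel);

    StreamObserver<MyResponse> myResponseStreamObserver = new StreamObserver<MyResponse>() {
      @Override public void onNext(MyResponse response) { /* handle one response */ }
      @Override public void onError(Throwable t) { /* the stream failed */ }
      @Override public void onCompleted() { /* the server closed the stream */ }
    };

    // One long-lived stream, cached and reused for every outgoing message.
    StreamObserver<MyRequest> myStream = myStub.myOperation(myResponseStreamObserver);
    myStream.onNext(MyRequest.newBuilder().build());  // invoked many times in practice
  }
}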

I am wondering, therefore, if the documentation should be updated to disambiguate between a 'streaming RPC API' and a gRPC stream such as io.grpc.internal.ClientStream. E.g. the text "They also might increase performance" could be replaced with "Streaming RPC APIs also might increase performance".

Something like the following text would have been clearer to me:
Use keepalive pings ... (as is)
Use unary RPCs for short- or long-lived connections unless the data passed in each RPC is large, or performance results show that streaming RPCs are necessary to give the application more control over the underlying stream.
Use streaming RPCs for long-lived connections when the client is capable of creating multiple streams and providing its own load balancing so that each server is appropriately loaded. In future, the client will no longer need to do this ....

@veblush
Contributor

veblush commented Sep 22, 2021

Bidirectional streaming is the right choice if you want to implement a chat app, because it fits how messages are delivered. But since it keeps using the same backend once established, you may want to take a different approach if you have specific load-balancing needs.

@chalin Could you take a look at ambiguities mentioned in the issue?

@ejona86 Could you answer how gRPC can be used in this case? (assuming gRPC-Java is being used here)

@ejona86
Member

ejona86 commented Sep 22, 2021

The general section of https://grpc.io/docs/guides/performance/ led me to believe that to scale my application, I needed to use many gRPC channels and only one gRPC stream per channel between my long-lived client and servers.

Why do you say that? I have to believe it is because of the paragraph starting:

(Special topic) Each gRPC channel uses 0 or more HTTP/2 connections and each connection usually has a limit on the number of concurrent streams.

However, it was saying you might have a performance issue, and I don't see any mention that you are actually having such a performance issue. Only use multiple channels when you can show it helps performance. (Whether it helps in a specific case depends on lots of details: how you are load balancing, your network, and so on.)

This interpretation didn't match my understanding of a stream being a point-to-point path via 'sub-channels' from client to server, and my understanding of a channel as a Y-shaped collection of one sub-channel from client to each server.

That is a possible way channels work. That is what happens when using the round_robin load balancing policy, and pretty much every load balancing policy except pick_first. But it does depend on your deployment. It is rarer to need to "use multiple channels" when using round_robin.
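For reference, a minimal grpc-java sketch of selecting round_robin on a channel (the target address is a placeholder):

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class RoundRobinExample {
  public static void main(String[] args) {
    // round_robin keeps one subchannel (connection) per backend and
    // spreads RPCs across all of them.
    ManagedChannel channel = ManagedChannelBuilder
        .forTarget("dns:///my-backend:443")  // placeholder target
        .defaultLoadBalancingPolicy("round_robin")
        .usePlaintext()
        .build();
  }
}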

Nor does it match https://grpc.io/blog/grpc-on-http2/ which shows a single channel with many streams.

Nothing disagrees. Channels can have multiple connections and connections can have multiple streams. But it depends on your setup whether channels will have multiple connections and the limit of streams permitted on a connection.

I find this confusing because the paragraph is headed with "Use streaming RPCs" for long-lived connections.

It said nothing about connections. Maybe you are using the word a bit imprecisely, but when we speak of connections we are talking about TCP connections. The "Use streaming RPCs" section sums up clearly with "Use streams to optimize the application, not gRPC." That is the heart of the matter.

I interpreted the word "Streams" (plural) to be more than one instance of class io.grpc.internal.Stream, such as that created from pseudo code

Streams in that context is just "streaming RPCs" essentially. It is saying "when to use the streaming feature."

and from my own recent reading elsewhere ...
b) the client must normally ensure that it does not create more than 100 'myStream' instances per channel.

This depends on your environment. The 100-RPC limit is seen pretty frequently on reverse proxies, but a backend can also set a limit of its own. A 100-RPC limit is common, but it isn't guaranteed; backend-to-backend communication generally doesn't have that limit. It is really something specific to your deployment.
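As an illustration of a backend setting its own limit, a minimal grpc-java sketch (the port, the limit value, and the service implementation are all hypothetical):

import io.grpc.Server;
import io.grpc.netty.NettyServerBuilder;

public class LimitedServer {
  public static void main(String[] args) throws Exception {
    // Advertises SETTINGS_MAX_CONCURRENT_STREAMS = 100 on each client connection.
    Server server = NettyServerBuilder.forPort(50051)
        .maxConcurrentCallsPerConnection(100)
        .addService(new MyStreamedServiceImpl())  // hypothetical service implementation
        .build()
        .start();
    server.awaitTermination();
  }
}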

I currently cache just one instance of myStream and invoke many times myStream.onNext(myRequest) but I now suspect that I may not need to use a Streaming RPC API at all, but that if I do, I should cache up to N instances of myStream ... where N is the number of servers and should be < 100 and that I should do my own load-balancing so that the load is shared over each myStream.

No. This is exactly the opposite of what the "Use streaming RPCs" section discussed. You aren't helping your application at all by this complexity.

With just what you've described, I don't see anything wrong with using "normal" unary RPCs (no streaming).
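For comparison, the unary path needs no stream management at all; a minimal sketch using the issue's hypothetical MyUnaryService (target address is a placeholder):

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class UnaryClient {
  public static void main(String[] args) {
    ManagedChannel channel =
        ManagedChannelBuilder.forTarget("dns:///my-backend:443").usePlaintext().build();
    // Each call is an independent RPC, so the channel's load-balancing policy
    // can spread calls across backends with no extra client logic.
    MyUnaryServiceGrpc.MyUnaryServiceBlockingStub stub =
        MyUnaryServiceGrpc.newBlockingStub(channel);
    MyResponse response = stub.myOperation(MyRequest.newBuilder().build());
  }
}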

I am wondering, therefore, if the documentation should be updated to disambiguate between a 'streaming RPC API' and a gRPC stream such as io.grpc.internal.ClientStream.

The documentation was not referencing an internal class of a specific implementation. That class is internal, and the documentation is cross-language.

@ejona86
Member

ejona86 commented Sep 22, 2021

@veblush, chalin isn't officially maintaining the site any more. He can still show up and help out, but we need to be more self-service these days.

@ejona86
Member

ejona86 commented Sep 22, 2021

@ananda1066 can take a look instead.

@chalin
Collaborator

chalin commented Sep 22, 2021

@veblush, chalin isn't officially maintaining the site any more. He can still show up and help out, but we need to be more self-service these days.

Right, my contributions are mainly to infrastructure these days. Thanks for pointing that out, @ejona86.

@ananda1066
Contributor

I agree with @ejona86. The tips are fairly general and will not apply to every case; we leave it up to the user to determine which practices improve performance for them and to tune these general practices to their case. This guide simply provides some general guidance to help the user determine the right trade-offs for their specific use case.

We don't use stronger language or define exactly when to use unary vs streaming RPCs because this again depends on the use case; we just offer the general performance benefits and drawbacks of using streaming RPCs.

@gidxl03
Author

gidxl03 commented Sep 27, 2021

Hi all, many thanks for the quick responses. Do please close the ticket if you don't see any merit in the suggested doc change. Here are responses to some of the questions asked by ejona86:

The general section of https://grpc.io/docs/guides/performance/ led me to believe that to scale my application, I needed to use many gRPC channels and only one gRPC stream per channel between my long-lived client and servers.

Why do you say that?

It was because of my incorrect interpretation of the word 'streams'.

I find this confusing because the paragraph is headed with "Use streaming RPCs" for long-lived connections.

It said nothing about connection.

My bad, I should have quoted "Use streaming RPCs when handling a long-lived logical flow of data" ... and my use case involves long-lived server-to-server communication of thousands of HTTP/2 DATA frames per second.

I interpreted the word "Streams" (plural) to be more than one instance of class io.grpc.internal.Stream, such as that created from pseudo code

Streams in that context is just "streaming RPCs" essentially. It is saying "when to use the streaming feature."

I think this is the heart of the matter, and your answer in the above sentence provides the information about the meaning of the word "streams" that I did not glean from the current doc. FYI, I referenced an internal class just to show what I had interpreted the doc to mean by a "stream". I really thought the doc was saying not to use multiple streams unless perf tests show that you need to. This is why I opened the ticket: to disambiguate between a 'streaming RPC API' and a 'gRPC stream'.

I currently cache just one instance of myStream and invoke myStream.onNext(myRequest) many times, but I now suspect that I may not need to use a Streaming RPC API at all, and that if I do, I should cache up to N instances of myStream (where N is the number of servers and should be < 100) and do my own load balancing so that the load is shared over each myStream.

No. This is exactly the opposite of what the "Use streaming RPCs" section discussed. You aren't helping your application at all by this complexity.

I intentionally didn't provide details of my use case because, as highlighted by ananda1066, I understand that the doc has a very wide scope and is introductory in nature. Leaving the doc change behind, I can't resist going off topic and mentioning my use case, because your comment "You aren't helping your application at all by this complexity." concerns me.
Use Case:

  • 4 grpc-java and grpc-c++ clients sending to 10 servers with lb-policy=round_robin.
  • each client sends thousands of application messages per second (I understand that the app messages are sent as DATA frames)
  • there is an nginx proxy
  • we are seeing perf issues with lost data, currently debugging using channelz
  • my understanding is that we will have to deal with some complexity to support this use case, particularly because load balancing: support creating additional connections when MAX_CONCURRENT_STREAM is reached grpc#21386 is still open.
    We are seeing a lot of INTERNAL_ERROR, which I now understand to be caused by our failing to limit the number of streams to SETTINGS_MAX_CONCURRENT_STREAMS as per https://datatracker.ietf.org/doc/html/rfc7540#section-6.5.2. Given that the grpc-java 1.35.0 API does not support getting max-concurrent-streams dynamically, we are using 'N' streams on a single channel, where the constant 'N' will be some number less than 100. We are currently implementing a simple algorithm to load-balance the clients' usage of the N streams (sketched below). I would be grateful if you would advise whether or not I'm adding unnecessary complexity (it doesn't seem too difficult).
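A minimal sketch of that stream pool, assuming grpc-java and the hypothetical MyStreamedService from earlier (the pool size and synchronization strategy are illustrative, not our production code):

import io.grpc.stub.StreamObserver;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** A pool of N long-lived streams on one channel, round-robined per message. */
public class StreamPool {
  private final List<StreamObserver<MyRequest>> streams = new ArrayList<>();
  private final AtomicInteger next = new AtomicInteger();

  // N must stay below the smallest SETTINGS_MAX_CONCURRENT_STREAMS on the path.
  public StreamPool(MyStreamedServiceGrpc.MyStreamedServiceStub stub,
                    StreamObserver<MyResponse> responseObserver, int n) {
    for (int i = 0; i < n; i++) {
      streams.add(stub.myOperation(responseObserver));
    }
  }

  public void send(MyRequest request) {
    StreamObserver<MyRequest> s =
        streams.get(Math.floorMod(next.getAndIncrement(), streams.size()));
    synchronized (s) {  // individual StreamObservers are not thread-safe
      s.onNext(request);
    }
  }
}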

@ejona86
Member

ejona86 commented Oct 4, 2021

4 grpc-java and grpc-c++ clients sending to 10 servers with lb-policy=round_robin.

Since there are more servers than clients, using streaming will make it difficult to load balance, as you've noticed. Given what you said before, it seems that there is no necessary ordering of requests, or any benefit to making sure certain requests go to the same backend as other requests. So unary is ideal in this situation.

We are seeing a lot of INTERNAL_ERROR...

Go ahead and file a bug in the grpc-java repository and CC me. We'll discuss more there and I'll respond precisely. Many of the conclusions you have made are incorrect, but I can see how you may have arrived at them. And MAX_CONCURRENT_STREAMS may cause you to take alternative approaches. I don't see a strong reason for you to use streams, and there are probably other workarounds, depending on the true source of the issue.

@gidxl03
Author

gidxl03 commented Oct 5, 2021

4 grpc-java and grpc-c++ clients sending to 10 servers with lb-policy=round_robin.

Since there are more servers than clients, using streaming will make it difficult to load balance, as you've noticed. Given what you said before, it seems that there is no necessary ordering of requests, or any benefit to making sure certain requests go to the same backend as other requests. So unary is ideal in this situation.

I should have included the fact that in my use case, each client

  • has about 500 data-streams that need to be propagated via gRPC and ordering within those 500 data-streams should be preserved. There is no ordering of one data-stream versus another. Therefore I now realize that the Unary API was never an option for my use case.
  • is creating N streams over 1 channel, so with 4 clients the server/nginx will see 4 channels and 4×N streams.
  • is not limited on the server to one thread per gRPC stream: we are delegating from the nio-netty thread directly to a custom executor that processes messages for each data-stream-id consecutively and everything else in parallel (a sketch follows this list). FYI, the same executor is used in our Kafka consumer, where a topic-partition is analogous to our usage of a long-lived gRPC stream: in both cases, ordering needs to be maintained on only a subset of the content.
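A minimal sketch of that per-key executor, with hypothetical names (a real implementation would also handle shutdown, back-pressure, etc.):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Preserves ordering per data-stream-id while running different ids in parallel. */
public class KeyedSerialExecutor {
  private final ExecutorService[] lanes;

  public KeyedSerialExecutor(int parallelism) {
    lanes = new ExecutorService[parallelism];
    for (int i = 0; i < parallelism; i++) {
      // One single-threaded lane per slot: tasks on the same lane run in order.
      lanes[i] = Executors.newSingleThreadExecutor();
    }
  }

  public void execute(String dataStreamId, Runnable task) {
    // The same id always hashes to the same lane, so its tasks stay ordered;
    // different ids usually land on different lanes and run concurrently.
    lanes[Math.floorMod(dataStreamId.hashCode(), lanes.length)].execute(task);
  }
}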

We are seeing a lot of INTERNAL_ERROR...

Go ahead and file a bug in the grpc-java repository and CC me. We'll discuss more there and I'll respond precisely. Many of the conclusions you have made are incorrect, but I can see how you may have arrived at them. And MAX_CONCURRENT_STREAMS may cause you to take alternative approaches. I don't see a strong reason for you to use streams, and there are probably other workarounds, depending on the true source of the issue.

Many thanks for the offer to investigate further, but I've no reason to believe there is a gRPC bug (just a documentation improvement). According to the HTTP/2 RFC, the INTERNAL_ERROR is expected when a client exceeds the server limit. I have assumed that enhancement 21386 is just a more sophisticated implementation of what I am doing, where the client manages a 'pool of streams' using round-robin/least-loaded/etc., is aware of its channel's MAX_CONCURRENT_STREAMS limit, and opens a replacement stream with back-off throttling when an old stream closes in error.

But if you still think I should open a bug, can you give me a title so that I can fill in the details? Thanks again.

@ejona86
Member

ejona86 commented Oct 5, 2021

has about 500 data-streams that need to be propagated via gRPC and ordering within those 500 data-streams should be preserved.

Okay, then yeah, streaming looks much more appropriate. In Java it wouldn't be too hard to make a Channel implementation that wraps 5-10 ManagedChannels and round-robins over them for each newCall() (which is per-stream). But that's more invasive in C++. Your approach can be fine, assuming you limit the lifetime of the streams to re-balance over time.
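A minimal sketch of such a wrapping Channel (the delegate count and class name are illustrative):

import io.grpc.CallOptions;
import io.grpc.Channel;
import io.grpc.ClientCall;
import io.grpc.ManagedChannel;
import io.grpc.MethodDescriptor;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Wraps several ManagedChannels and round-robins each new RPC across them. */
public class RoundRobinChannel extends Channel {
  private final List<ManagedChannel> delegates;
  private final AtomicInteger next = new AtomicInteger();

  public RoundRobinChannel(List<ManagedChannel> delegates) {
    this.delegates = delegates;
  }

  @Override
  public <ReqT, RespT> ClientCall<ReqT, RespT> newCall(
      MethodDescriptor<ReqT, RespT> method, CallOptions options) {
    // Each newCall() is one stream, so rotating here spreads streams
    // (and therefore long-lived RPCs) across the underlying channels.
    int i = Math.floorMod(next.getAndIncrement(), delegates.size());
    return delegates.get(i).newCall(method, options);
  }

  @Override
  public String authority() {
    return delegates.get(0).authority();
  }
}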

According to the HTTP/2 RFC, the INTERNAL_ERROR is expected when a client exceeds the server limit.

But gRPC does not exceed the limit. Instead of exceeding the limit, gRPC queues locally. You can name the issue something like "INTERNAL error when creating more streams than MAX_CONCURRENT_STREAMS", which is what you currently think is going on. The title doesn't matter too much though, as I can change it. Please provide the INTERNAL error message you see, as it has useful information.

@gidxl03
Author

gidxl03 commented Oct 6, 2021

This explains why it took our C++ guy a bit longer to do the same wrapping of ManagedChannels :). I'm very glad to hear that we are on track; in fact we already had a story in the backlog to 'limit the lifetime of the streams to re-balance over time', since the server side could scale up after the clients have connected, and the new servers wouldn't get any work unless a new stream is created. This appears to be the most complex part; I should have added it to the list of items I'm expecting from enhancement 21386.

Finally, I now understand your thinking around the bug and will start preparing the data. Here's an example where I got an INTERNAL_ERROR while the system was idle (easier to study). I now know that we have an AWS load balancer between our clients and servers, and the LB has a 'hard limit' of 128 MAX_CONCURRENT_STREAMS, meaning that AWS support cannot increase that number. The LB was closing idle streams every minute, so on the server we set .permitKeepAliveTime(10, SEC) and .permitKeepAliveWithoutCalls(true), and on the client we set .keepAliveTime(30, SEC) and .keepAliveWithoutCalls(true). When the system is idle, and with 'io.grpc.netty' logging at DEBUG level (neat work, whoever did that!), the client still sees this on one of its two LB IP addresses:
OUTBOUND PING: ack=false bytes=1111
INBOUND PING: ack=true bytes=1111
--29.844 sec later ----
INBOUND RST_STREAM: streamId=9 errorCode=2
application logs 'RST_STREAM closed stream. HTTP/2 error code: INTERNAL_ERROR'

A minute +123ms later, a stream to the other LB IP closes:
OUTBOUND PING: ack=false bytes=1111
INBOUND PING: ack=true bytes=1111
INBOUND HEADERS: streamId=3 headers=GrpcHttp2ResponseHeaders[:status:502,... grpc-status: 14, grpc-message: unavailable]
INBOUND WINDOW_UPDATE: streamId=3 windowSizeIncrement=2147418111
application logs stream 'closed due to UNAVAILABLE: unavailable'
OUTBOUND RST_STREAM: streamId=3 errorCode=8

This cycle continues indefinitely. I'm thinking that we should resolve this issue before studying issues under traffic; I will keep digging and create that bug ticket as required. Thanks again.
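For reference, the keepalive settings mentioned above correspond to this minimal grpc-java sketch (the port, target, and service implementation are placeholders):

import io.grpc.ManagedChannel;
import io.grpc.Server;
import io.grpc.netty.NettyChannelBuilder;
import io.grpc.netty.NettyServerBuilder;
import java.util.concurrent.TimeUnit;

public class KeepaliveConfig {
  // Server side: permit client keepalive pings as often as every 10s,
  // even when there are no active calls.
  static Server buildServer() {
    return NettyServerBuilder.forPort(50051)  // placeholder port
        .permitKeepAliveTime(10, TimeUnit.SECONDS)
        .permitKeepAliveWithoutCalls(true)
        .addService(new MyStreamedServiceImpl())  // hypothetical service impl
        .build();
  }

  // Client side: send keepalive pings every 30s, even when the channel is idle.
  static ManagedChannel buildChannel() {
    return NettyChannelBuilder.forTarget("dns:///my-lb:443")  // placeholder target
        .keepAliveTime(30, TimeUnit.SECONDS)
        .keepAliveWithoutCalls(true)
        .build();
  }
}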
