gRPC Java service stops forwarding requests to handler and instead automatically cancels the request #11112

Closed
SoftMemes opened this issue Apr 18, 2024 · 6 comments

Comments

@SoftMemes

What version of gRPC-Java are you using?

1.62.2

What is your environment?

Observed on a deployment running eclipse-temurin:21 as the base image, deployed on GKE.

What did you expect to see?

I have a gRPC bidirectional streaming service that keeps the connection open for an extended period of time (10 minutes). This works as intended, but after the server has been running for a while, it enters a state where I stop receiving inbound calls.

What did you see instead?

I have an interceptor that logs all requests and I observe a call to interceptCall, but I never see the log output from my actual method handler. Instead, a while later, I observe a call to the onCancel() callback of a call listener attached to the call.

I have other gRPC services (such as a health check) that do continue to operate correctly.

Steps to reproduce the bug

Unfortunately I do not have a clean repro, but I would appreciate any suggestions as to how to troubleshoot this further. I've only seen this in production, and only after an instance has been running for an extended period of time. Once this "zombie" state is entered, the service does not recover until forcefully restarted.

@ejona86 (Member) commented Apr 19, 2024

> I have an interceptor that logs all requests and I observe a call to interceptCall, but I never see the log output from my actual method handler.

For bidi, if the interceptor sees the RPC but not the service handler, that means an interceptor is preventing it from getting to the handler. gRPC isn't really involved once the interceptors start running; there's some small amount of stub code, but it is just an adapter.

For unary and server-streaming, the service handler is delayed until it gets the single message and half close. So that can explain certain cases of an interceptor seeing something and the handler not.
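
To make the distinction concrete, here is a minimal sketch (plain grpc-java rather than grpc-kotlin; the class and logger names are illustrative) of a logging interceptor that also wraps the call listener, so you can see which lifecycle events actually arrive before the handler would run:

```java
import io.grpc.ForwardingServerCallListener.SimpleForwardingServerCallListener;
import io.grpc.Metadata;
import io.grpc.ServerCall;
import io.grpc.ServerCallHandler;
import io.grpc.ServerInterceptor;
import java.util.logging.Logger;

// Illustrative logging interceptor: logs when the call reaches the interceptor
// chain and when the listener sees message/half-close/cancel events. This helps
// distinguish "the call never left the interceptor chain" from "the handler was
// waiting on a message or half-close that never arrived".
public final class CallLifecycleLoggingInterceptor implements ServerInterceptor {
  private static final Logger log = Logger.getLogger("rpc-lifecycle");

  @Override
  public <ReqT, RespT> ServerCall.Listener<ReqT> interceptCall(
      ServerCall<ReqT, RespT> call, Metadata headers, ServerCallHandler<ReqT, RespT> next) {
    String method = call.getMethodDescriptor().getFullMethodName();
    log.info("interceptCall: " + method);
    ServerCall.Listener<ReqT> delegate = next.startCall(call, headers);
    return new SimpleForwardingServerCallListener<ReqT>(delegate) {
      @Override
      public void onMessage(ReqT message) {
        log.info("onMessage: " + method);
        super.onMessage(message);
      }

      @Override
      public void onHalfClose() {
        log.info("onHalfClose: " + method);
        super.onHalfClose();
      }

      @Override
      public void onCancel() {
        log.info("onCancel: " + method);
        super.onCancel();
      }
    };
  }
}
```

Seeing which of these fire, and when, narrows down where the RPC is stalling relative to the interceptor chain and the handler.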

> I have a gRPC bidirectional streaming service that keeps the connection open for an extended period of time (10 minutes).

You probably want keepalive enabled.
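
As a reference, a minimal sketch of enabling keepalive on both sides; the host, port, and interval values below are illustrative, not recommendations:

```java
import io.grpc.BindableService;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.Server;
import io.grpc.netty.NettyServerBuilder;
import java.util.concurrent.TimeUnit;

public final class KeepaliveSketch {
  // Client side: ping periodically so a dead connection is noticed even while
  // a long-lived bidi stream is otherwise idle.
  public static ManagedChannel newChannel(String host, int port) {
    return ManagedChannelBuilder.forAddress(host, port)
        .keepAliveTime(30, TimeUnit.SECONDS)     // send a keepalive ping after 30s of inactivity
        .keepAliveTimeout(10, TimeUnit.SECONDS)  // treat the connection as dead if no ack within 10s
        .keepAliveWithoutCalls(true)             // keep pinging even with no active RPC
        .build();
  }

  // Server side: send its own pings and permit client pings at the rate above.
  public static Server newServer(int port, BindableService service) {
    return NettyServerBuilder.forPort(port)
        .keepAliveTime(60, TimeUnit.SECONDS)
        .keepAliveTimeout(20, TimeUnit.SECONDS)
        .permitKeepAliveTime(20, TimeUnit.SECONDS)  // allow client pings as often as every 20s
        .permitKeepAliveWithoutCalls(true)
        .addService(service)
        .build();
  }
}
```

Note that the server will close connections from clients that ping more aggressively than permitKeepAliveTime allows, so the client and server settings need to be coordinated.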

@SoftMemes (Author)

> For bidi, if the interceptor sees the RPC but not the service handler, that means an interceptor is preventing it from getting to the handler. gRPC isn't really involved once the interceptors start running; there's some small amount of stub code, but it is just an adapter.

Specifically, I am using grpc-kotlin with stubs generated from my protos based on AbstractCoroutineServerImpl. This may be one for grpc-kotlin, but is it possible that some resource starvation would lead to my handler never being scheduled?

> You probably want keepalive enabled.

Keepalive is on, thank you!

@ejona86 (Member) commented Apr 23, 2024

I can't speak to grpc-kotlin. I don't know how they handle the coroutines. I'd be surprised if it was all that different from normal Java for the initial call, though.

@sergiitk added the "Waiting on reporter" label (there was a request for more information without a response, or an answer or advice has been provided) Apr 25, 2024
@ejona86 (Member) commented May 2, 2024

Seems like we've answered all that we can. It seems best to ask grpc-kotlin folks if they know anything that could be impacting you. If it turns out we might be able to help you more, comment, and the issue can be reopened.

@ejona86 closed this as completed May 2, 2024
@SoftMemes (Author)

For anyone who ends up here with a similar problem, we did hunt this down to a resource leak in the end.

Running on a 2-CPU VM, we ended up with two threads busy-looping and allocating, which also caused the GC to run constantly. With that, it appears the thread pools were starved and requests never reached the handler before the client gave up and cancelled.
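
For anyone debugging something similar: a thread dump or a profiler is the usual way to find the busy threads, but a quick in-process check is also possible with the standard ThreadMXBean. The sketch below is illustrative, with an arbitrary sampling window and threshold:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Illustrative: sample per-thread CPU time twice and report threads that burned
// most of the interval on CPU; busy-looping threads show up at the top.
public final class BusyThreadSampler {
  public static void main(String[] args) throws InterruptedException {
    ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    long[] ids = threads.getAllThreadIds();

    long[] before = new long[ids.length];
    for (int i = 0; i < ids.length; i++) {
      before[i] = threads.getThreadCpuTime(ids[i]); // -1 if unavailable
    }

    Thread.sleep(5_000); // sampling window

    for (int i = 0; i < ids.length; i++) {
      long after = threads.getThreadCpuTime(ids[i]);
      if (before[i] < 0 || after < 0) {
        continue;
      }
      long cpuMillis = (after - before[i]) / 1_000_000;
      if (cpuMillis > 2_500) { // spent more than half the window on CPU
        ThreadInfo info = threads.getThreadInfo(ids[i]);
        String name = (info != null) ? info.getThreadName() : String.valueOf(ids[i]);
        System.out.println("busy thread: " + name + " used " + cpuMillis + "ms CPU in 5s");
      }
    }
  }
}
```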

@ejona86 (Member) commented May 3, 2024

Ah, makes sense. Yes, if the serverBuilder.executor() or channelBuilder.executor() thread pools are exhausted, RPC events would be delayed.
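
For reference, a minimal sketch of giving the server a dedicated, bounded executor (the pool size and service are illustrative). This doesn't prevent starvation if application threads hog the CPU, but it gives you a pool you can size and monitor instead of the default shared cached pool:

```java
import io.grpc.BindableService;
import io.grpc.Server;
import io.grpc.ServerBuilder;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative: hand gRPC a dedicated executor so RPC callbacks run on a pool
// under application control. Blocking or CPU-heavy work should still be kept
// off these threads (or moved to its own pool).
public final class ServerWithDedicatedExecutor {
  public static Server start(int port, BindableService service) throws IOException {
    ExecutorService grpcExecutor = Executors.newFixedThreadPool(16);
    return ServerBuilder.forPort(port)
        .executor(grpcExecutor)
        .addService(service)
        .build()
        .start();
  }
}
```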

@ejona86 removed the "Waiting on reporter" label May 3, 2024