gRPC Java service stops forwarding requests to handler and instead automatically cancels the request #11112
Comments
For bidi, if the interceptor sees the RPC but not the service handler, that means an interceptor is preventing it from getting to the handler. gRPC isn't really involved once the interceptors start running; there's some small amount of stub code, but it is just an adapter. For unary and server-streaming, the service handler is delayed until it gets the single message and half close. So that can explain certain cases of an interceptor seeing something and the handler not.
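To make the distinction above concrete, here is a minimal sketch (class and log messages are illustrative, not from this issue) of an interceptor that logs both `interceptCall` and the later listener events, which helps tell "the interceptor saw the RPC" apart from "the handler actually ran":

```java
import io.grpc.ForwardingServerCallListener;
import io.grpc.Metadata;
import io.grpc.ServerCall;
import io.grpc.ServerCallHandler;
import io.grpc.ServerInterceptor;

public final class CallLoggingInterceptor implements ServerInterceptor {
  @Override
  public <ReqT, RespT> ServerCall.Listener<ReqT> interceptCall(
      ServerCall<ReqT, RespT> call,
      Metadata headers,
      ServerCallHandler<ReqT, RespT> next) {
    String method = call.getMethodDescriptor().getFullMethodName();
    System.out.println("interceptCall: " + method);
    ServerCall.Listener<ReqT> listener = next.startCall(call, headers);
    return new ForwardingServerCallListener.SimpleForwardingServerCallListener<ReqT>(listener) {
      @Override
      public void onHalfClose() {
        // For unary/server-streaming, the service handler runs around this
        // point, after the single message and half-close have arrived.
        System.out.println("onHalfClose: " + method);
        super.onHalfClose();
      }

      @Override
      public void onCancel() {
        System.out.println("onCancel: " + method);
        super.onCancel();
      }
    };
  }
}
```

If you see `interceptCall` but neither `onHalfClose` nor your handler's own logging, the call is stalling between the interceptor chain and handler dispatch.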
You probably want keepalive enabled.
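For reference, a sketch of enabling client-side keepalive on a channel (host, port, and timings are illustrative, not from this issue). Note that `keepAliveTime` should stay above the server's `permitKeepAliveTime`, or the server may respond with a GOAWAY (`too_many_pings`):

```java
import java.util.concurrent.TimeUnit;

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public final class KeepaliveChannel {
  public static ManagedChannel create() {
    return ManagedChannelBuilder.forAddress("example.com", 443)
        .keepAliveTime(30, TimeUnit.SECONDS)     // send a PING when the transport is idle
        .keepAliveTimeout(10, TimeUnit.SECONDS)  // close the connection if the PING isn't ACKed
        .keepAliveWithoutCalls(true)             // keep pinging even with no active RPC
        .build();
  }
}
```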
Specifically, I am using grpc-kotlin with stubs generated from my protos based on AbstractCoroutineServerImpl. This may be one for grpc-kotlin, but is it possible that some resource starvation could lead to my handler never being scheduled?
Keepalive is on, thank you!
I can't speak to grpc-kotlin. I don't know how they handle the coroutines. I'd be surprised if it was all that different from normal Java for the initial call, though.
Seems like we've answered all that we can. It's best to ask the grpc-kotlin folks whether they know of anything that could be impacting you. If it turns out we might be able to help you more, comment and the issue can be reopened.
For anyone who ends up here with a similar problem: we did hunt this down to a resource leak in the end. Running on a 2-CPU VM, we ended up with two threads busy-looping and allocating, which also kept the GC busy. With that, the thread pools were starved and requests never reached the handler before the client gave up and cancelled.
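A related mitigation worth knowing about (a sketch with assumed port and a hypothetical `MyServiceImpl`): giving the server its own bounded executor separates RPC dispatch from other application work. It would not save a fully CPU-starved 2-CPU VM like the one above, but it makes this class of starvation easier to bound and observe:

```java
import java.io.IOException;
import java.util.concurrent.Executors;

import io.grpc.Server;
import io.grpc.ServerBuilder;

public final class DedicatedExecutorServer {
  public static Server start() throws IOException {
    // Size the pool explicitly instead of relying on the default shared cached pool.
    int threads = Math.max(4, Runtime.getRuntime().availableProcessors() * 2);
    return ServerBuilder.forPort(8980)
        .executor(Executors.newFixedThreadPool(threads))
        .addService(new MyServiceImpl())  // hypothetical generated service impl
        .build()
        .start();
  }
}
```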
Ah, makes sense. Yes, if the
What version of gRPC-Java are you using?
1.62.2
What is your environment?
Observed on deployment running on eclipse-temurin:21 as base image, deployed on GKE.
What did you expect to see?
I have a gRPC bidirectional streaming service that keeps the connection open for an extended period of time (10 minutes). This works as intended, but after the server has been running for a while it enters a state where I stop receiving inbound calls.
What did you see instead?
I have an interceptor that logs all requests and observe a call to interceptCall, but I never see the log output from my actual method handler. Instead, a while later, I observe a call to the onCancel() callback of a call listener attached to the call.
I have other gRPC services (such as a health check) that do continue to operate correctly.
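For completeness, cancellation can also be observed from inside the handler itself rather than via a call-listener interceptor. A sketch with hypothetical `Request`/`Response` message types, using the stub-level `ServerCallStreamObserver`:

```java
import io.grpc.stub.ServerCallStreamObserver;
import io.grpc.stub.StreamObserver;

public StreamObserver<Request> chat(StreamObserver<Response> responseObserver) {
  ServerCallStreamObserver<Response> serverObserver =
      (ServerCallStreamObserver<Response>) responseObserver;
  // Must be registered before the handler method returns.
  serverObserver.setOnCancelHandler(() ->
      System.out.println("client cancelled the stream"));
  return new StreamObserver<Request>() {
    @Override public void onNext(Request req) { /* handle inbound message */ }
    @Override public void onError(Throwable t) { /* client error/cancel */ }
    @Override public void onCompleted() { responseObserver.onCompleted(); }
  };
}
```

Of course, in the failure mode described here this handler-side hook never fires either, since the handler is never invoked, which is what makes the interceptor-level logging the more useful signal.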
Steps to reproduce the bug
Unfortunately I do not have a clean repro, but would appreciate any suggestions as to how to troubleshoot this further. I've only seen this in production, and only after an instance has been running for an extended period of time. Once this "zombie" state is entered, the service does not recover until it is forcefully restarted.