Deadlock in server transport with multiple ports #6641
Providing more details:
#6614 attempted a fix without moving multi-port handling into NettyServer, but concluded to move forward with multi-port NettyServer instead. |
It looks so, sorry for the confusion. |
The problem was that if the await() does not happen on the event loop, then it is not supposed to cause a deadlock.
Yup, and we also need to loop in the NettyServer shutdown path as well if we create multiple ports on the bootstrap. |
Problem

Setting some context to make sure we understand it correctly:
Server deadlock can be reproduced when we force the event loops to be the same, use multiple ports, and prolong ServerImpl.start() between the two NettyServer.start() calls, e.g.
The server will hang there, with only one of the ports successfully bound, and the client won't be able to establish a connection, error:
Solutions

Just moving multi-port to the NettyServer layer won't help with the issue. To prevent the deadlock, we may have these options:
|
The first thing I want to do is to remove support for multiple InternalServers from ServerImpl and have that be handled by NettyServer. (I'm not bothered by multiple InternalServers in general, but multiple during a single server.start() call is a problem. Later, I expect to allow multiple, but with each having a separate start() invocation.) This is so that errors during bind can be properly cleaned up, which is not possible today. (Either we leak a port, or the user can't call
(1) wouldn't be that bad and would make sense if ts.start() was able to be aborted. In principle that is a good idea, but we'd need to look at the complexity it brings.
For (2), it sounds like it would just change this line to be
grpc-java/netty/src/main/java/io/grpc/netty/NettyServer.java, lines 229 to 233 in 3811ef3
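As a rough sketch of option (2), the blocking await on the bind result could become a completion callback. Here CompletableFuture stands in for Netty's ChannelFuture, and bindAsync() is a hypothetical helper, not the actual grpc-java or Netty API:

```java
import java.util.concurrent.CompletableFuture;

public class NonBlockingBindSketch {

    // Hypothetical async bind; stands in for Netty's bootstrap.bind(...)
    // returning a ChannelFuture. Not the actual grpc-java API.
    static CompletableFuture<Void> bindAsync(int port) {
        return CompletableFuture.completedFuture(null);
    }

    // Instead of blocking on the bind result (awaitUninterruptibly()) while
    // holding ServerImpl.lock, register a callback that fires on completion.
    static CompletableFuture<String> start(int port) {
        return bindAsync(port).handle((ok, err) ->
            err == null ? "bound " + port : "bind failed: " + err.getMessage());
    }

    public static void main(String[] args) {
        System.out.println(start(8080).join()); // prints "bound 8080"
    }
}
```

The caller never parks a thread waiting on the event loop, which is the property the deadlock analysis requires.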
That sounds fair, but we'd like to avoid it when we can. I'll note for the single-port case, when that callback fires
I'll also note that it looks like this assignment is not thread-safe:
If NettyServer is handling the multiple ports, then we can do all the binds as one "atomic" operation on the event loop. This depends on a behavior of
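A minimal sketch of that "atomic" approach, using a single-threaded executor in place of the real Netty event loop and a hypothetical bind() helper (none of this is the actual grpc-java code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AtomicBindSketch {

    // Hypothetical stand-in for a real bootstrap.bind(port) call.
    static int bind(int port) {
        return port; // pretend the bind succeeded; return the bound port
    }

    // Perform all binds inside one event-loop task, so no other event-loop
    // task (e.g. transportCreated()) can interleave between the binds, and
    // the caller never blocks on the event loop while holding a lock.
    static CompletableFuture<List<Integer>> bindAll(ExecutorService eventLoop,
                                                    List<Integer> ports) {
        return CompletableFuture.supplyAsync(() -> {
            List<Integer> bound = new ArrayList<>();
            for (int port : ports) {
                bound.add(bind(port));
            }
            return bound;
        }, eventLoop);
    }

    public static void main(String[] args) {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        // The caller observes completion via a callback, not a blocking await.
        bindAll(eventLoop, List.of(8080, 8081, 8082))
            .thenAccept(bound -> System.out.println("bound ports: " + bound))
            .join(); // only for the demo; real code would not block here
        eventLoop.shutdown();
    }
}
```

Because all binds run in one task, a failure partway through can be cleaned up inside that same task, before any other event-loop work observes the half-bound state.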
Let's just start by moving the multi-port logic into NettyServer, though. |
I'll note that in certain advanced fd-passing schemes, it is useful to pass an already-bound server socket fd. In those schemes, the fd already exists and listen() has been called and may have clients already making connections to it. The only thing missing is some code to accept() connections. In those approaches it is best that we don't drop connections when we start. Now, that sort of thing doesn't happen much with Java, but still. Systemd socket activation is the most clearly defined usage of this, but it is essentially the same principle used for "hot restarts" of servers (e.g., Envoy). It is also useful when spawning subprocesses that communicate over unix domain sockets. |
#7674 fixed the issue. |
ServerImpl.start() calls NettyServer.start() while holding ServerImpl.lock. NettyServer.start() awaits a submitted runnable in the event loop. However, this pending runnable may never be executed, because the event loop might be executing some other task, like ServerListenerImpl.transportCreated(), that is trying to acquire ServerImpl.lock, causing a deadlock. This is a deadlock for the multiple-port server transport use case, with the same deadlock mechanism as #6601.
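The mechanism above can be simulated with plain java.util.concurrent primitives: a single-threaded executor stands in for the Netty event loop and a ReentrantLock for ServerImpl.lock (all names here are illustrative, not the actual grpc-java code). A timeout is used so the demo reports the hang instead of blocking forever:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockSketch {

    static String simulate() throws Exception {
        // Stand-ins: the single event-loop thread and ServerImpl.lock.
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        ReentrantLock serverLock = new ReentrantLock();

        serverLock.lock(); // ServerImpl.start() acquires ServerImpl.lock...
        try {
            // The event loop is busy with a task (like transportCreated())
            // that needs ServerImpl.lock, so its thread blocks here.
            eventLoop.submit(() -> {
                serverLock.lock();
                serverLock.unlock();
            });

            // ...then NettyServer.start() submits its bind runnable and awaits
            // it. The runnable is queued behind the blocked task and never runs.
            Future<?> bind = eventLoop.submit(() -> { /* bind() would go here */ });
            try {
                bind.get(500, TimeUnit.MILLISECONDS); // real code awaits forever
                return "no deadlock";
            } catch (TimeoutException e) {
                return "deadlock: bind task never ran";
            }
        } finally {
            serverLock.unlock(); // lets the simulation shut down cleanly
            eventLoop.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(simulate()); // prints "deadlock: bind task never ran"
    }
}
```

Releasing the lock before awaiting, or never awaiting event-loop work while holding the lock, breaks the cycle; that is what the fixes discussed above aim for.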