Deadlock on start gRPC server #6601
Comments
Thank you @avgustinmm for the full thread dump. It seems to be a valid race.
dapengzhang0 added a commit that referenced this issue on Jan 16, 2020
A race condition was reported by a user in #6601: `ServerImpl.start()` calls `NettyServer.start()` while holding `ServerImpl.lock`. `NettyServer.start()` submits a runnable to the event loop and awaits it. However, this pending runnable may never be executed, because the event loop might be executing some other task, such as `ServerListenerImpl.transportCreated()`, that is trying to acquire `ServerImpl.lock`, causing a deadlock. This PR resolves the particular issue reported in #6601 for a server with a single port, but `NettyServer` (https://github.com/grpc/grpc-java/blob/v1.26.0/netty/src/main/java/io/grpc/netty/NettyServer.java#L251) and `ServerImpl` (https://github.com/grpc/grpc-java/blob/v1.26.0/core/src/main/java/io/grpc/internal/ServerImpl.java#L184) in general still carry the same potential risk of deadlock, which needs a further fix.
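The lock/await interaction described in this commit message can be reproduced in isolation. The following is a minimal sketch, not gRPC code: `StartDeadlockSketch`, `startTimesOut`, and the two runnables are hypothetical stand-ins, a plain single-threaded executor plays the role of the event loop, and a timeout replaces the unbounded await so that the program terminates. The "await outside the lock" branch only illustrates why the ordering matters; it is not necessarily how the PR fixes the issue.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class StartDeadlockSketch {

    /**
     * Simulates a server start on the calling thread.
     * Returns true if the awaited "bind" task did not complete within the timeout.
     */
    public static boolean startTimesOut(boolean awaitWhileHoldingLock, long timeoutMillis) {
        final Object lock = new Object(); // hypothetical stand-in for ServerImpl.lock
        // Single-threaded executor standing in for the Netty/Vert.x event loop.
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        // Stand-in for ServerListenerImpl.transportCreated(): it needs the lock.
        Runnable transportCreated = () -> {
            synchronized (lock) {
                // register the new transport
            }
        };
        // Stand-in for the runnable that NettyServer.start() submits and awaits.
        Runnable bindTask = () -> { };
        try {
            Future<?> bind;
            synchronized (lock) {
                // A connection callback is queued ahead of the bind task.
                eventLoop.submit(transportCreated);
                bind = eventLoop.submit(bindTask);
                if (awaitWhileHoldingLock) {
                    // The event loop is stuck in transportCreated, so bind cannot run.
                    return timesOut(bind, timeoutMillis);
                }
            }
            // Lock released: the callback can finish and the bind task can run.
            return timesOut(bind, timeoutMillis);
        } finally {
            eventLoop.shutdown(); // queued tasks still run once the lock is free
        }
    }

    private static boolean timesOut(Future<?> task, long timeoutMillis) {
        try {
            task.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("await inside lock times out:  " + startTimesOut(true, 200));
        System.out.println("await outside lock times out: " + startTimesOut(false, 2000));
    }
}
```

With the await inside the `synchronized` block, the event-loop thread is blocked on the monitor and the bind task stays queued behind it, which is the same shape as the reported hang. Without a timeout, that wait would never end.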
dapengzhang0 added a commit to dapengzhang0/grpc-java that referenced this issue on Jan 22, 2020
dapengzhang0 added a commit that referenced this issue on Jan 22, 2020
What version of gRPC-Java are you using?
1.26.0
What is your environment?
Linux, Alpine
openjdk version "1.8.0_171"
OpenJDK Runtime Environment (IcedTea 3.8.0) (Alpine 8.171.11-r0)
OpenJDK 64-Bit Server VM (build 25.171-b11, mixed mode)
Netty - 4.1.44.Final
Vertx & Vertx-gRPC - 3.8.4
What did you expect to see?
Proper start of gRPC Server
What did you see instead?
Start sometimes hangs in a deadlock
Steps to reproduce the bug
int_threaddump.txt
I suppose it's a race condition involving the synchronization in gRPC (ServerImpl), the await in NettyServer.start, and the Vert.x/Netty event loops (probably single-threaded). It could probably happen any time someone starts a gRPC server while a new client connection to that server is opened concurrently.
In my case I stopped and then started the gRPC server, but I'm not sure whether that is related.
Analysis
In the thread dump I see the following two threads, which seem to stay in this state forever:

"vert.x-eventloop-thread-0" #39 prio=10 os_prio=0 tid=0x000055711e379000 nid=0x2d waiting for monitor entry [0x00007fb72abc8000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at io.grpc.internal.ServerImpl$ServerListenerImpl.transportCreated(ServerImpl.java:379)
    - waiting to lock <0x00000000c559f1a0> (a java.lang.Object)
    at io.grpc.netty.NettyServer$1.initChannel(NettyServer.java:224)
    - locked <0x00000000c559bfd8> (a io.grpc.netty.NettyServer)
    at io.netty.channel.ChannelInitializer.initChannel(ChannelInitializer.java:129)

"vert.x-worker-thread-12" #214 prio=10 os_prio=0 tid=0x000055711f2d1800 nid=0x418 in Object.wait() [0x00007fb720656000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:502)
    at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:252)
    - locked <0x00000000c589baa0> (a io.netty.util.concurrent.PromiseTask)
    at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:35)
    at io.grpc.netty.NettyServer.start(NettyServer.java:269)
    at io.grpc.internal.ServerImpl.start(ServerImpl.java:184)
    - locked <0x00000000c559f1a0> (a java.lang.Object)
    at io.grpc.internal.ServerImpl.start(ServerImpl.java:90)
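From these thread dumps and the code, I think the problem is a deadlock between these two threads: the worker thread holds the ServerImpl lock (<0x00000000c559f1a0>) while it waits in NettyServer.start, and the event-loop thread that would have to run the awaited task is blocked waiting for that same lock.
I'm attaching a file with thread dumps of the whole JVM.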
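The "waiting to lock" and owner relation that the dump shows for the event-loop thread can also be queried from inside a running JVM with the standard `java.lang.management` API. The sketch below is only an illustration: `BlockedThreadReport` and `blockedThreads` are hypothetical names, and the demo thread in `main` exists just to produce one BLOCKED entry.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

class BlockedThreadReport {

    /** Describes every thread that is currently BLOCKED on a monitor, with the monitor's owner. */
    public static List<String> blockedThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // false, false: no locked-monitor or synchronizer lists are needed here.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                report.add(info.getThreadName() + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }

    public static void main(String[] args) {
        final Object lock = new Object();
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // nothing to do; the point is the wait for the monitor
            }
        }, "demo-waiter");
        synchronized (lock) {
            waiter.start();
            // Spin until the demo thread is parked on the monitor we hold.
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.yield();
            }
            blockedThreads().forEach(System.out::println);
        }
    }
}
```

Only the event-loop side of this hang is a BLOCKED thread. The worker thread is in `Object.wait()` on a promise, so as far as I can tell a monitor-cycle check such as `ThreadMXBean.findDeadlockedThreads()` would not flag this situation; reading the owner of the contended lock, as above or in the dump, is what connects the two threads.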