New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
netty: fix a race for channelz at server transport creation #6610
Conversation
c03404c
to
98b11bd
Compare
98b11bd
to
a4077bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you briefly describe the race. At least the components racing would be very helpful within the commit message. Yes, someone can look at the issue for more info, but honestly right now I don't see a deadlock nor race that isn't already a problem (for example, the future.await() after the bind() can deadlock if called from a Netty thread).
throw new RuntimeException("Interrupted while registering listen socket to channelz", ex); | ||
} | ||
|
||
InternalInstrumented<SocketStats> listenSocket = new ListenSocket(channel); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make a comment here that this requires the channel to have completed bind()ing.
Okay. I'm seeing now there is a loop with ServerImpl.lock from start() and transportCreated(). I think the lock in start() is to avoid callbacks from being seen too early. Hmm... and it is also used for things like getPort and shutdown(), although honestly we don't really support calling shutdown/getPort before start returns, so we could potentially call ts.start() after releasing the lock within ServerImpl.start() |
Socket socket = new Socket(); | ||
socket.connect(ns.getListenSocketAddress(), /* timeout= */ 8000); | ||
socket.close(); | ||
countDownLatch.await(); // SocketStats won't be available until binding complete. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Binding is complete when start() returns, so I don't think this comment is accurate.
assertThat(((InetSocketAddress) ns.getListenSocketAddress()).getPort()).isGreaterThan(0); | ||
|
||
Socket socket = new Socket(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works, but is a roundabout way of making sure the event loop has completed the runnable. Would it be better to Create a single EventLoop and pass that as the boss+worker EventLoopGroup to the NettyServer constructor? Then you can submit() your own Runnable and await its completion. We could use that same EventLoop for all the tests, really, if you'd prefer a @BeforeClass
/@AfterClass
to manage its lifetime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This roundabout is just copied from the test above it. Actually ns.getListenSocketStats()
is not required to be called in the event loop. I'm not sure submit() is the most correct way to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one above is actually checking the worker channel options. So creating a connection appropriate and necessary there.
I agree ns.getListenSocketStats()
isn't required to be done in the event loop, but I wasn't suggesting to call it from the event loop. I was suggesting:
eventLoop.submit(new Runnable() {
@Override public void run() {}
}).await();
InternalInstrumented<SocketStats> listenSocket = ns.getListenSocketStats();
It is just a synchronization barrier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with fixed event loop pool.
A race condition was reported by user in #6601:
ServerImpl.start()
callsNettyServer.start()
while holdingServerImpl.lock
.NettyServer.start()
awaits a submitted runnable in eventloop. However, this pending runnable may never be executed because the eventloop might be executing some other task, likeServerListenerImpl.transportCreated()
, that is trying to acquireServerImpl.lock
causing a deadlock.This PR resolves the particular issue reported in #6601 for server with a single port, but
NettyServer
(https://github.com/grpc/grpc-java/blob/v1.26.0/netty/src/main/java/io/grpc/netty/NettyServer.java#L251) andServerImpl
(https://github.com/grpc/grpc-java/blob/v1.26.0/core/src/main/java/io/grpc/internal/ServerImpl.java#L184) in general still have the same potential risk of deadlock, which need further fix.