Netty takes 5 mins 30 seconds to close the channel in case of timeout #13888
Comments
If the closeFuture takes 5 minutes to complete it means that closure was not detected for 5 minutes. I suspect you will need to implement some sort of "keep alive" messages in your protocol to detect "dead" connections fast. There is nothing that we can do here.
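To make the "keep alive" suggestion concrete, here is a minimal JDK-only sketch of the bookkeeping behind dead-connection detection. `IdleTracker` and its methods are hypothetical names for illustration, not Netty or ODL API, and the 30-second timeout is an assumption. In a real Netty pipeline, `io.netty.handler.timeout.IdleStateHandler` is the usual way to get this kind of idle notification.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: remembers when bytes were last read on one connection. */
class IdleTracker {
    private final long timeoutNanos;
    private long lastReadNanos;

    IdleTracker(long timeout, TimeUnit unit, long nowNanos) {
        this.timeoutNanos = unit.toNanos(timeout);
        this.lastReadNanos = nowNanos;
    }

    /** Call whenever anything arrives, including a keep-alive reply. */
    void onRead(long nowNanos) {
        lastReadNanos = nowNanos;
    }

    /** True when nothing has been read for longer than the timeout. */
    boolean isDead(long nowNanos) {
        return nowNanos - lastReadNanos > timeoutNanos;
    }

    public static void main(String[] args) {
        // Timestamps are injected so the example is deterministic.
        IdleTracker tracker = new IdleTracker(30, TimeUnit.SECONDS, 0L);
        System.out.println(tracker.isDead(TimeUnit.SECONDS.toNanos(29))); // false
        System.out.println(tracker.isDead(TimeUnit.SECONDS.toNanos(31))); // true
    }
}
```

A periodic task would call `isDead(System.nanoTime())` and close the connection itself when it returns true, rather than waiting for the operating system to report the failure.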
Hi @normanmaurer, Could it be related to the event thread pool and the number of concurrent channels it can simultaneously handle? If yes, is there any configuration parameter that I can modify so that event thread pool can take less time to handle these concurrent channels? Also we do have "keep-alive" messages to detect dead connections in our protocol. In our scenario:
Maybe check if you block the event loop.
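To illustrate why a blocked event loop would delay close notifications, here is a small JDK-only model. A single-threaded executor stands in for one event loop thread; this is not Netty code, and `BlockedLoopDemo` and `queueDelayMillis` are hypothetical names. The point is only that a cheap task queued behind a blocking one cannot run until the blocking one returns.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class BlockedLoopDemo {
    /** Returns how long (ms) a trivial task waited behind a task that blocks for blockMillis. */
    static long queueDelayMillis(long blockMillis) {
        // One thread, one queue: a stand-in for a single event loop.
        ExecutorService loop = Executors.newSingleThreadExecutor();
        try {
            long start = System.nanoTime();
            Runnable blocking = () -> {
                try {
                    Thread.sleep(blockMillis); // e.g. a handler doing blocking I/O
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            };
            loop.execute(blocking);
            // The "cheap" task only records when it finally got to run.
            Callable<Long> stamp = System::nanoTime;
            Future<Long> ranAt = loop.submit(stamp);
            long finished = ranAt.get(blockMillis + 5000, TimeUnit.MILLISECONDS);
            return TimeUnit.NANOSECONDS.toMillis(finished - start);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            loop.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("cheap task waited ~" + queueDelayMillis(200) + " ms");
    }
}
```

If many channels share a few event loop threads and any handler or listener blocks, the delays add up in the same way. A thread dump taken while the slowdown is happening would show whether the event loop threads are stuck in application code.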
Expected behavior
In our company, we use ODL Netconf library, which in turn uses IO Netty library, to connect to a NETCONF protocol based network element (NE).
Issue-1: NioSocketChannel.closeFuture() takes 5 mins 30 seconds to complete.
ODL Netconf library calls io.netty.bootstrap.Bootstrap.connect() to create a new NioSocketChannel to connect to the NE. After connecting to 400 NEs we lost connectivity to all of them. We periodically reattempted to connect to these NEs. New attempts timed out because these NEs were still not reachable.
If an attempt to connect to an NE times out, we wait up to 60 seconds for its Channel.closeFuture() to complete, which indicates that all the resources were released and the Channel was closed. After that, we reattempt to connect to the same NE. We expect Channel.closeFuture() to complete within 60 seconds.
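The bounded wait described above can be sketched with plain JDK futures. This is a model under assumptions, not the ODL Netconf or Netty code: a `CompletableFuture<Void>` stands in for the channel's close future, and `CloseWait.closedWithin` is a hypothetical helper. With Netty itself, `ChannelFuture.await(long, TimeUnit)` returns a boolean in a similar way, and it must not be called from an event loop thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class CloseWait {
    /** Returns true if the stand-in close future completed within the given limit. */
    static boolean closedWithin(CompletableFuture<Void> closeFuture, long timeout, TimeUnit unit) {
        try {
            closeFuture.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            return false; // still not closed when the limit expired
        } catch (ExecutionException e) {
            return true; // completed, although with a failure
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> alreadyClosed = CompletableFuture.completedFuture(null);
        CompletableFuture<Void> neverClosed = new CompletableFuture<>();
        System.out.println(closedWithin(alreadyClosed, 60, TimeUnit.SECONDS));    // true
        System.out.println(closedWithin(neverClosed, 100, TimeUnit.MILLISECONDS)); // false
    }
}
```

Logging the `false` case with a timestamp per NE would show whether all 400 close futures are late by the same amount or whether the delay grows with queue position.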
Issue-2: With 400 NEs in comm-loss, channel for a new NE takes 2 mins 30 seconds to get created.
Further, while 400 NEs are in communication loss, a new NE can still be discovered in our software. The discovery process also calls io.netty.bootstrap.Bootstrap.connect() to create a new NioSocketChannel for the new NE. We expect the NioSocketChannel for the new NE to be created within 30 seconds.
Actual behavior
Issue-1: NioSocketChannel.closeFuture() takes 5 mins 30 seconds to complete.
For 400 NEs in communication loss, we observed that the closeFuture took around 5 mins 30 seconds to complete. This delayed our next attempt to reconnect to the NE. Also when the network connectivity for comm-loss NE was restored, we had to wait for 5 mins 30 seconds to reconnect. This delayed our communication recovery process for the NE.
Issue-2: With 400 NEs in comm-loss, channel for a new NE takes 2 mins 30 seconds to get created.
While 400 NEs were in communication loss, a new NE was discovered in our software, and we observed that its NioSocketChannel was created only after 2 mins 30 seconds. Since we wait for 30 seconds, our code timed out and we were not able to connect to the new NE.
We raised both of these issues with the ODL Netconf library team. They replied that their code is a wrapper around the IO Netty code and that it is the IO Netty library that takes a long time to close timed-out NioSocketChannels or create new NioSocketChannels.
Since both of these issues occur when 400 or more NEs are in communication loss, we believe they may be related, so I've included both of them here. Can you please take a look at both of these issues?
Steps to reproduce
Minimal yet complete reproducer code (or URL to code)
Our company code calls ODL Netconf library which in turn calls IO Netty library. Currently, I don't have access to ODL Netconf library code that calls io.netty.bootstrap.Bootstrap.connect(). If it is a mandatory requirement, then please let me know. I can request ODL Netconf library team to share their code snippets.
Netty version
4.1.104.Final
JVM version (e.g. java -version)
openjdk version "11.0.2" 2019-01-15
OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)
OS version (e.g. uname -a)
CentOS Linux 7 (Core)
Linux rtxvdvlp405.net.local 3.10.0-1160.102.1.el7.x86_64 #1 SMP Tue Oct 17 15:42:21 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux