After intentional shutdown of a node of the cluster, the other nodes are still attempting to reconnect to the shutdown node #26319
Comments
This is also reproducible with 5.1.5. The problem with this behavior is that while the reconnection attempts are in progress, the cluster members become effectively unresponsive, e.g. any call to IMap.get() hangs.
Let's say we have cluster members A and B, as in the reproducer above. When member A shuts down, member B receives A's shutdown request and performs repartitioning, migrations, etc. So far, this is as expected. However, right after that, member B starts trying to reconnect to A.
This especially hurts in our production environment, where we have a rolling update mechanism and aren't able to serve requests during this time window.
Bug
I encountered an issue with the behavior of a Hazelcast cluster. After intentionally shutting down one of the cluster nodes, I noticed that the remaining nodes logged the following warnings:
WARN […] com.hazelcast.internal.server.tcp.TcpServerConnectionErrorHandler - Removing connection to endpoint […] Cause => java.io.IOException {Connection refused to address /[…]}, Error-Count: 5
WARN […] com.hazelcast.internal.cluster.impl.MembershipManager - […] Member […] is suspected to be dead for reason: No connection
The remaining nodes are detecting the failure of the shut-down node. However, despite the intentional shutdown, the other nodes are still attempting to reconnect to the shut-down node.
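For context on the `Connection refused` cause in the first warning: once a process has closed its listening socket, a new TCP connect to that address normally fails right away with a `java.net.ConnectException`. Below is a minimal sketch of that situation using only the JDK. It does not use Hazelcast at all, and the class and method names (`RefusedConnectionDemo`, `isRefusedAfterClose`) are made up for illustration.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class; not part of Hazelcast.
class RefusedConnectionDemo {

    // Opens a listener on an ephemeral loopback port, closes it (the "shutdown"),
    // then tries to connect to the same port again.
    // Returns true if that connect attempt is refused.
    static boolean isRefusedAfterClose() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        int port;
        try (ServerSocket listener = new ServerSocket(0, 1, loopback)) {
            port = listener.getLocalPort();
        } // listener is closed here, so nothing accepts connections on this port

        try (Socket socket = new Socket()) {
            // Assumption: a 2 second connect timeout is plenty on loopback.
            socket.connect(new InetSocketAddress(loopback, port), 2000);
            return false; // unexpected: something is listening on the port
        } catch (ConnectException e) {
            return true; // the connect attempt was refused
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("refused after close: " + isRefusedAfterClose());
    }
}
```

This only shows where the `IOException` in the warning can come from. It says nothing about why the remaining members keep retrying after a graceful shutdown, which is the actual question of this issue.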
I tested it on Hazelcast versions 5.3.2, 5.0.2 and 4.0.3, and it always produces the same logs.
Expected behavior
I expect that when a node is intentionally shut down, the other nodes do not attempt to reconnect to it.
How to reproduce
I created a test to reproduce the error.