FR: Adding ping/pong redundancy #1414
Comments
When you say:
I assume that you mean that the data exchange from client to server continues unabated while a ping from server to client goes missing. Indeed, WebSockets runs over TCP, with strong ordering guarantees, making it impossible for the two effects that you describe to happen on the same half of the connection. I suppose it's "client to server" and "server to client" in that order because the situation you're describing happens when:
In this scenario, even if the server noticed and closed the connection, the client still wouldn't notice, as we assume that connectivity is lost! The only way to fix that is to run heartbeats in the client.
As you may know, browsers do NOT expose an API for protocol-level ping/pongs. As a consequence, you have no choice but to do it at the application level. This is the real reason why many services have to design application-level heartbeats.
If I guessed your scenario accurately, there's nothing we can do to fix it on the server side.
If your scenario is different and there's a chance we can do something on the server side, please explain why, considering the guarantees of TCP, your proposed solution changes something. The way you formulated it sounds like you're proposing a shitty version of TCP because you don't realize that TCP already handles retries for you. If TCP doesn't succeed, we will not succeed :-)
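For illustration, an application-level heartbeat of the kind described above could be sketched roughly as follows. This is a minimal sketch, not part of this library: `HeartbeatMonitor`, `heartbeat`, the `send` callable and the `{"type": "ping"}` message shape are all hypothetical names chosen for the example, and the transport is left abstract so the same idea applies to a browser client or a Python one.

```python
import asyncio
import time


class HeartbeatMonitor:
    """Tracks when the peer was last heard from (hypothetical helper)."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout      # seconds of silence tolerated
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = clock()

    def record_message(self):
        # Call this whenever ANY message arrives from the peer,
        # including replies to application-level pings.
        self.last_seen = self.clock()

    def is_alive(self):
        return self.clock() - self.last_seen < self.timeout


async def heartbeat(send, monitor, interval):
    """Send an application-level ping every `interval` seconds.

    `send` is an assumed coroutine function that writes one message
    to the connection. Raises ConnectionError once the peer has been
    silent for longer than the monitor's timeout.
    """
    while monitor.is_alive():
        await send('{"type": "ping"}')
        await asyncio.sleep(interval)
    raise ConnectionError("peer silent for too long")
```

The receive loop would call `monitor.record_message()` for every incoming message, and the side running `heartbeat()` would treat the `ConnectionError` as its signal to tear down the connection and, where the application allows it, reconnect.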
Thanks for the comprehensive response. It sounds like you have diagnosed my issue pretty accurately. Indeed, I did not realise that TCP has retransmission logic built into it which removes the need for the feature.
However, I am still not fully understanding how a server -> client ping can be dropped but client -> server data can continue to be streamed... It seems I need to do some reading into the fundamentals of the TCP protocol to fully understand the bidirectional nature of the connection.
It sounds like I either need to trust the existing heartbeat mechanism to diagnose lost connections or implement application-layer heartbeats. Given some of my clients are browser based, this might end up being the solution.
Thanks for your input 👍
Here's a neat StackOverflow answer from 7 years ago that says exactly what I said above: https://stackoverflow.com/questions/35820885/why-do-many-websocket-libraries-implement-their-own-application-level-heartbeats
And here's a great blog post that goes into the details of a real-life encounter with this issue: https://making.close.com/posts/reliable-websockets/
This blog post answers your question of "how do I create a half-broken TCP connection?" It has to do with the two-way closing handshake in TCP: if you break it at the wrong point, the connection can remain in a non-functional state and timeouts are very long.
I should add this to the discussion of heartbeats and/or the FAQ. Marking as a doc issue.
As always, thanks @aaugustin. I learned, and I keep learning, from your writings. 🙏 Sorry for the noise. 👍
I've been on a bit of a journey to get here (here's the origin).
I am writing a server application which receives client connections that can, on occasion, come from poor network environments. I have observed that the websocket ping/pong mechanism (heartbeat) is sometimes not a fully reliable indicator of the status of the connection. Specifically, in poor network conditions a ping will sometimes not be acknowledged with the requisite pong, possibly because the client experiences packet loss and never receives the ping.
The consequence of missing only the pong is that, whilst the heartbeat has been missed, the rest of the data exchange continues unabated until the ping timeout is violated some seconds later, abruptly ending the exchange despite an otherwise healthy connection. Unfortunately, due to the requirements of my application, reconnection is prohibited and the process will need to initialise a new instance. This means that dropping a connection prematurely is costly.
Currently it seems the heartbeat logic in this library offers no way to add mechanisms seen in other similar connection types, such as a retry or backoff, whereby if a heartbeat is missed the server immediately tries again up to X times, and only closes the connection if all of those attempts fail.
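To make the proposal concrete, the retry behaviour could be sketched roughly like this. It is only an illustration under stated assumptions, not this library's API: `ping_with_retries` is a hypothetical helper, and `ping` is an assumed coroutine function that sends one ping and returns once the matching pong arrives. Note that, as the reply elsewhere in this thread points out, TCP already retransmits lost segments, so a retry at this level may not add anything in practice.

```python
import asyncio


async def ping_with_retries(ping, attempts=3, timeout=5.0):
    """Return True if any of `attempts` pings is answered in time.

    `ping` is an assumed coroutine function: it sends one ping and
    returns when the corresponding pong has been received.
    """
    for _ in range(attempts):
        try:
            # Give each individual ping `timeout` seconds to be answered.
            await asyncio.wait_for(ping(), timeout)
            return True
        except asyncio.TimeoutError:
            # Missed heartbeat: try again immediately.
            continue
    # Every attempt timed out; the caller would close the connection.
    return False
```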
Would adding such functionality to this library (disabled by default of course) violate other websocket protocols or assumed logic? It seems to me that adding this logic would solve a few edge use cases whilst not explicitly breaking existing uses and add some redundancy in these poor network scenarios.
I have read about other users running into similar problems, which they mitigate by adding their own custom heartbeat at the application layer. This seems wrong to me, both because it duplicates an existing fundamental pattern and because the protocol level and the application level can diverge (e.g. the application reporting that the connection is valid whilst the protocol reports that it has timed out).
Let me know your thoughts!