Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(websocketshard): deal with zombie connection caused by 4009 #7581
fix(websocketshard): deal with zombie connection caused by 4009 #7581
Changes from 2 commits
a2326a7
aab64b8
25ac5ea
4847898
52b0abd
ac2beb3
2cb93a6
792f38d
78f84f9
b1a061f
e80a867
78b15bc
c154ceb
47b9587
d3dfbac
fb3ab02
edd617d
130a62f
043fe3e
196ec16
f26a03d
60f6425
9ec75ff
a69dd35
8493928
1856ac2
170edfa
d4cbd64
c05a062
c023b18
37759f0
8dbb5b0
836d934
2f4a7e6
f0770f8
b2fd715
9b5f5f6
b57f27f
97a2e5f
7194f49
f74985a
7453895
a3b94b2
7a97d2f
4a9e393
0daf304
5877246
7b108a4
be04819
88fb5f6
931fac5
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 400 ms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after several days of testing, I found out that it takes around 200-400ms to close the ws.
👆 The first date here is logged just before closing, the second one is logged in onClose method.
The difference of these 2 dates is 315ms. This why I kept it 400ms [max]
another example 👇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's with your internet (or rather, with your provider's internet). It's highly dependant on location and how fast the packets are routed to Discord and back, so yeah, this is an unreliable thing.
If anything, you should use
events.once
to wait for a closing event.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, We can make it a option "6000"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure what to feel about this, since it's an object managed by a third-party package, and also a private property.
Also, typo on
destory
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there is no proper answer from ws package on "why" readyState is stuck at
CLOSING
forever. I just had to directly destroy the socket for them and emit the event manually. Even tho the code forws.terminate()
is pretty much similar.All these changes and codes are in production right now and I have seen 0 downtimes. except a few 4009 where some session cannot be resumed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more referring to the fact that because it's private, the creators of the
ws
package may change it without prior notice, e.g. making the field a true-private one.In case that happens, I believe we should be prepared for it.
wdyt @vladfrangu?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imo, the proper fix for a 4009 is to not care anymore about the connection. Remove all listeners, terminate the connection and start a-new, no waiting or anything. If we get in a state where we emit a 4009, no point to wait for something that won't happen 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But as far I have seen with my bot, There are many times where the bot resumes the session after a 4009 with this pr changes ie 0 downtimes, but if we do what you said, > Remove all listeners, terminate the connection, and start a-new.
I think this will cause the bot to have tiny downtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you'll always have downtime, what matters is trying to resume afterwards
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we should directly clear everything and make a new connection for 4009 without even trying to resume the session even once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also want to ask if anyone knows why only one of my shards has this issue? and it[4009] only appears on it. The rest of my 15 shards are fine.
If it was my connection shouldn't it be random and appear on random shards?
@kyranet @vladfrangu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. I said you should clean the ws listeners (this.clearConnectionListeners or w/e its called), call this.connection.terminate() and simply not care if it works or doesn't and re-schedule the shard to connect. Thus, you get to attempt a resume, if it works 🎉, if it doesn't oh well, identify