fix(WebSocketShard): Add RESUMED and READY timeout #8759

gamer0mega · 2022-10-16T15:05:44Z

Please describe the changes this PR makes and why it should be merged:
This PR should solve the issue #8486, where the connection gets stuck on IDENTIFY or RESUME.

It adds new private properties of WebSocketShard: readyDispatchTimeout and resumedDispatchTimeout, as well as 2 more private methods: setReadyDispatchTimeout and setResumedDispatchTimeout.

It also removes one .unref() call on a timeout, because the process would exit while waiting on that timeout if it belonged to the only shard.
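
To illustrate the idea, here is a minimal sketch of a dispatch timeout and of what .unref() changes. It is not the code from this PR: the `DispatchTimeout` class, its `keepProcessAlive` option, and the 15-second delay are hypothetical names and values chosen for the example. Only `setTimeout`, `clearTimeout`, `timeout.unref()` and `timeout.hasRef()` are real Node.js APIs.

```js
// Hypothetical helper, not the discord.js implementation.
class DispatchTimeout {
  constructor(delayMs, onExpire) {
    this.delayMs = delayMs;
    this.onExpire = onExpire;
    this.timer = null;
  }

  // Start (or restart) waiting for the expected dispatch.
  start({ keepProcessAlive = true } = {}) {
    this.clear();
    this.timer = setTimeout(() => {
      this.timer = null;
      this.onExpire();
    }, this.delayMs);
    // An unref'd timer does not keep the Node.js event loop alive, so a
    // process with nothing else pending can exit before the callback runs.
    if (!keepProcessAlive) this.timer.unref();
    return this;
  }

  // Call this when the dispatch (READY or RESUMED) actually arrives.
  clear() {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
  }
}

// 15 seconds is an assumed value for the example.
const readyDispatchTimeout = new DispatchTimeout(15_000, () => {
  console.log('READY not received in time, reconnecting');
});

readyDispatchTimeout.start();
console.log(readyDispatchTimeout.timer.hasRef()); // true

readyDispatchTimeout.start({ keepProcessAlive: false });
console.log(readyDispatchTimeout.timer.hasRef()); // false

readyDispatchTimeout.clear();
```

With a single shard, the unref'd variant may be the only pending work in the process, which matches the exit described above.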

I agree that the names could be confusing (readyTimeout and readyDispatchTimeout), feel free to change them before merging.

I ran a bot with 4 shards for 2 days and all the shards are still up and working fine, whereas before they stopped reconnecting after a few hours of running.

Status and versioning classification:

Code changes have been tested against the Discord API, or there are no code changes
I know how to update typings and have done so, or typings don't need updating
This PR changes the library's interface (methods or parameters added)

packages/discord.js/src/client/websocket/WebSocketShard.js

Makes RESUMED timeout delay itself if there were events received in the last 20 seconds
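
One way to read the note above is as a sliding watchdog: when the timer fires, it checks how long ago the last event arrived and postpones itself if the connection is still active. The sketch below shows only that decision. The function name, the return shape, and the assumption that the delay runs until the window has elapsed since the last event are hypothetical; only the 20-second window comes from the note.

```js
// Hypothetical helper: decide what a RESUMED watchdog should do when it fires.
const ACTIVITY_WINDOW_MS = 20_000;

function resumedTimeoutAction(now, lastEventAt, windowMs = ACTIVITY_WINDOW_MS) {
  const idleFor = now - lastEventAt;
  if (idleFor < windowMs) {
    // Events are still arriving, so check again once the window measured
    // from the last event has fully elapsed.
    return { action: 'delay', delayMs: windowMs - idleFor };
  }
  // Nothing was received for a whole window: treat the resume as stuck.
  return { action: 'reconnect', delayMs: 0 };
}

console.log(resumedTimeoutAction(30_000, 25_000)); // { action: 'delay', delayMs: 15000 }
console.log(resumedTimeoutAction(50_000, 25_000)); // { action: 'reconnect', delayMs: 0 }
```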

l1v0n1

п

DraftProducts · 2022-12-26T22:28:05Z

This stuff looks great!
What did you test it on? @gamer0mega
I realized that the issue can get worse when the network speed is low or not proportional to the number of servers supported.
Did you compare the session start tokens used? If so, is there a significant difference?

I will test this fix in production on 400 shards to see if it reduces or fixes the problem on my side 👍

gamer0mega · 2022-12-27T10:27:56Z

> This stuff looks great!
> What did you test it on? @gamer0mega
> I realized that the issue can get worse when the network speed is low or not proportional to the number of servers supported.
> Did you compare the session start tokens used? If so, is there a significant difference?
>
> I will test this fix in production on 400 shards to see if it reduces or fixes the problem on my side 👍

I didn't test it on a large bot, because the test bot was a rewrite made while moving my main bot off eris, which is not supported anymore. I tested it with 4 shards on a combination of mobile data, a mobile hotspot, and an unstable Linux Wi-Fi driver. That setup is very unstable, and I found this issue quickly because the test bot randomly died and never connected back.

I had the 4 shards spread across 2 processes to make sure the fix worked. I always looked through the debug logs: seeing a READY was rare, most of the time it got RESUMED, and after 10 days all the shards were still up and working when I had to restart my PC. You can check the session starts yourself; before the fix it died on the second or even the first reconnect, so I was unable to check that at all.

DraftProducts · 2022-12-27T12:35:40Z

Ok, good job 🔥
I added this patch in production along with the patch made in collaboration with @legendhimself.
After 10+ hours, I don't seem to have had any other issues caused by this PR.

Edit: #9001 has been done for that 🎉

Also, what do you think about this fix, 2e1c68e?

For me it had a good impact on zombie connections, but it greatly increased the number of new logins.

According to the Discord API documentation (https://discord.com/developers/docs/topics/gateway#sending-heartbeats), we should change the reset value to false on L607 to allow a RESUME on a zombie connection.
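
The Gateway documentation linked above says that a client which does not get a heartbeat ACK should close the connection with any close code other than 1000 or 1001 and then try to resume. A minimal sketch of that decision follows. The function names, the `resetSession` flag, and the 4000 close code used in the example calls are hypothetical and are not taken from discord.js.

```js
// Closing with 1000 or 1001 ends the session, so a RESUME is not possible
// afterwards (per the Discord Gateway documentation).
const SESSION_ENDING_CLOSE_CODES = new Set([1000, 1001]);

function canResumeAfterClose(closeCode) {
  return !SESSION_ENDING_CLOSE_CODES.has(closeCode);
}

// Hypothetical helper: plan the recovery after a heartbeat was never acknowledged.
function planZombieRecovery({ sessionId, closeCode }) {
  const resume = Boolean(sessionId) && canResumeAfterClose(closeCode);
  return {
    closeCode,
    // Resetting the session forces a fresh IDENTIFY, which costs a session start.
    resetSession: !resume,
    next: resume ? 'RESUME' : 'IDENTIFY',
  };
}

console.log(planZombieRecovery({ sessionId: 'abc', closeCode: 4000 }));
// { closeCode: 4000, resetSession: false, next: 'RESUME' }
console.log(planZombieRecovery({ sessionId: 'abc', closeCode: 1000 }));
// { closeCode: 1000, resetSession: true, next: 'IDENTIFY' }
```

Keeping the session on a zombie close is what avoids the extra logins mentioned above, at the cost of relying on the RESUME actually completing.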

legendhimself · 2023-01-13T14:37:17Z

I don't think we need this PR anymore since #8989 has fixed #8486.
Correct me if I am wrong, but we don't need the READY timeout since we already have a timeout on the ShardingManager.

Moreover, we shouldn't handle RESUMED and READY event timeouts ourselves, since if any connection goes dead it will be picked up by the zombie connection code within a few seconds (closeTimeout).

gamer0mega · 2023-01-14T22:14:06Z

> I don't think we need this PR anymore since #8989 has fixed #8486. Correct me if I am wrong, but we don't need the READY timeout since we already have a timeout on the ShardingManager.
>
> Moreover, we shouldn't handle RESUMED and READY event timeouts ourselves, since if any connection goes dead it will be picked up by the zombie connection code within a few seconds (closeTimeout).

It can get stuck on RESUMING, where it should never heartbeat until RESUMED, so if it gets stuck there is no way to detect it
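
The argument above can be sketched as a toy model, not discord.js code. It assumes, as the comment states, that no heartbeats are sent while the shard is RESUMING, so a check based on heartbeat ACKs never runs in that state. The status names, the `detectStall` function, and all durations are hypothetical.

```js
// Toy model: which watchdog, if any, notices that a shard is stuck?
function detectStall({ status, msSinceLastAck, msSinceStateChange }, limits) {
  // Heartbeat-based detection only runs while heartbeats are being sent
  // (assumption from the comment above: none are sent while RESUMING).
  const heartbeating = status === 'READY';
  if (heartbeating && msSinceLastAck > limits.ackMs) return 'zombie-heartbeat';

  // A dispatch timeout covers the states that wait for READY or RESUMED.
  const waiting = status === 'IDENTIFYING' || status === 'RESUMING';
  if (waiting && limits.dispatchMs !== null && msSinceStateChange > limits.dispatchMs) {
    return 'dispatch-timeout';
  }
  return null;
}

const stuck = { status: 'RESUMING', msSinceLastAck: 600_000, msSinceStateChange: 600_000 };

// Without a dispatch timeout nothing fires, however long the shard waits.
console.log(detectStall(stuck, { ackMs: 45_000, dispatchMs: null })); // null

// With a dispatch timeout the stuck RESUMING state is detected.
console.log(detectStall(stuck, { ackMs: 45_000, dispatchMs: 15_000 })); // dispatch-timeout
```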

Qjuh · 2023-01-15T08:07:05Z

> > I don't think we need this PR anymore since #8989 has fixed #8486. Correct me if I am wrong, but we don't need the READY timeout since we already have a timeout on the ShardingManager.
> > Moreover, we shouldn't handle RESUMED and READY event timeouts ourselves, since if any connection goes dead it will be picked up by the zombie connection code within a few seconds (closeTimeout).
>
> It can get stuck on RESUMING, where it should never heartbeat until RESUMED, so if it gets stuck there is no way to detect it

Can it really get stuck there though? None of the errors in the issue you linked had that as their cause; all were related to reconnecting over and over and then getting stuck because of that.

legendhimself · 2023-01-15T12:32:58Z

> It can get stuck on RESUMING, where it should never heartbeat until RESUMED, so if it gets stuck there is no way to detect it

@gamer0mega If it gets stuck on resuming, you need a better host for the bot. This is highly unlikely unless the host has a bad internet connection. Resuming -> Resumed depends on the connection from our server to Discord's server; if the RESUMED didn't happen even with a good connection (which is highly unlikely), it's a Discord issue.

gamer0mega · 2023-01-15T12:34:21Z

> > It can get stuck on RESUMING, where it should never heartbeat until RESUMED, so if it gets stuck there is no way to detect it
>
> @gamer0mega If it gets stuck on resuming, you need a better host for the bot. This is highly unlikely unless the host has a bad internet connection. Resuming -> Resumed depends on the connection from our server to Discord's server; if the RESUMED didn't happen even with a good connection (which is highly unlikely), it's a Discord issue.

It is not a problem with the network: the socket gets disconnected, the library doesn't know that, and it keeps RESUMING for eternity.

legendhimself · 2023-01-15T12:44:52Z

@gamer0mega
Can you provide me with a repro?
This repro should work even with a good connection, as you just mentioned "It is not a problem with the network".

gamer0mega · 2023-01-30T10:24:51Z

> @gamer0mega
> Can you provide me with a repro?
> This repro should work even with a good connection, as you just mentioned "It is not a problem with the network".

By that, I meant that the library should reconnect even if my networking sucks, and not get stuck.

P.S. You disagree? That's funny then: it's impossible to have hosting with no network issues. Eventually your network can go down for a few minutes, and if the library doesn't reconnect, some shards will stay down.

Jiralite · 2023-03-18T15:44:49Z

Superseded by #9099.

gamer0mega added 2 commits October 16, 2022 17:49

fix(WebSocketShard): add RESUMED and READY timeout

cf2d38c

fix(WebSocketShard): Add RESUMED and READY timeout

3a8c8ea

github-actions bot added the packages:discord.js label Oct 16, 2022

github-actions bot requested review from iCrawl, kyranet, SpaceEEC and vladfrangu October 16, 2022 15:05

Jiralite added gateway semver:patch labels Oct 16, 2022

Jiralite added this to the discord.js v14.7 milestone Oct 16, 2022

fix(typings): Bring back a semicolon

fc8ac70

gamer0mega force-pushed the main branch from bbc6b25 to fc8ac70 October 16, 2022 15:56

gamer0mega added 2 commits October 17, 2022 01:30

Merge branch 'main' into main

403fb54

Merge branch 'discordjs:main' into main

7d9e6f2

SpaceEEC approved these changes Oct 23, 2022

kyranet previously requested changes Oct 27, 2022

gamer0mega added 4 commits October 28, 2022 20:37

fix(WebSocketShard): Remove .unref() of timeouts

03285ec

fix(WebSocketShard): Properly handle RESUMED timeout

919cab7

Makes RESUMED timeout delay itself if there were events received in the last 20 seconds

fix(WebSocketShard): Remove .unref() of timeouts

4b6a433

fix(WebSocketShard): Properly handle RESUMED timeout

28e6851

Makes RESUMED timeout delay itself if there were events received in the last 20 seconds

gamer0mega requested review from kyranet and removed request for vladfrangu and iCrawl November 20, 2022 11:51

iCrawl requested a review from vladfrangu November 25, 2022 17:36

iCrawl approved these changes Nov 25, 2022

Merge branch 'main' into main

95810cf

kyranet modified the milestones: discord.js v14.7, discord.js v14.8 Nov 29, 2022

l1v0n1 reviewed Dec 4, 2022

DraftProducts suggested changes Dec 27, 2022

Update packages/discord.js/src/client/websocket/WebSocketShard.js

96307e0

Co-authored-by: DraftMan <contact@draftman.fr>

DraftProducts approved these changes Jan 7, 2023

Jiralite modified the milestones: discord.js v14.8, discord.js 14.9 Feb 17, 2023

Jiralite closed this Mar 18, 2023
