fix(WebSocketShard): Zombie connection fix #8989

Merged
merged 2 commits into from Jan 1, 2023

legendhimself
Contributor

@legendhimself legendhimself commented Dec 28, 2022

Fixes #8486, fixes #8984 and supersedes #8981

Please describe the changes this PR makes and why it should be merged:

Status and versioning classification:

  • Code changes have been tested against the Discord API, or there are no code changes
  • I know how to update typings and have done so, or typings don't need updating

@legendhimself
Contributor Author

legendhimself commented Dec 29, 2022

Previous Report

image

Before, the uptime was fairly poor because of the unref that we had on the WsCloseTimeout.

Current Report

image

The logs and tests showed fewer reconnects and better uptime, at least on my bot with 60+ shards.
@DraftProducts has also tested the changes on his bot with 350+ shards, which had frequent zombie connections and ran out of logins on most days. He will post more details on this PR soon.

Also, thanks to @shoxcy for providing me with shawarma during testing.

@kyranet
Member

kyranet commented Dec 29, 2022

Blocking for 2 days at @legendhimself's request so they can test the PR for more time. Will unblock afterwards.

@DraftProducts
Contributor

For now, I have excellent results.
Previously, I was losing multiple session start tokens from the bucket every 5 minutes, which led to multiple bot token resets per day (3 to 5 times, with a bucket of 2000 logins), zombie connections, and an offline bot status on others.
I haven't noticed any re-logins or zombie connections since 2 am, when I merged our fix into production.

@DraftProducts
Contributor

I hear that you were waiting for more information. Here are my remaining logins, taken from the session_start_limit field of the GET /gateway/bot response (without any token reset today).

{
  "url": "wss://gateway.discord.gg",
  "shards": 368,
  "session_start_limit": {
    "total": 2000,
    "remaining": 1995,
    "reset_after": 79901986,
    "max_concurrency": 16
  }
}

That shows we used only 5 logins today, without any zombie connections. For me the problem is solved, and this PR should be merged as soon as possible for other devs 🎉
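The arithmetic behind that reading of the response can be sketched as follows. This is only an illustration: the helper name `summarizeSessionStartLimit` is hypothetical, the object is copied from the response above, and `reset_after` is taken to be in milliseconds, as in the Discord Gateway documentation.

```javascript
// Response body copied from the comment above.
const gatewayBot = {
  url: "wss://gateway.discord.gg",
  shards: 368,
  session_start_limit: {
    total: 2000,
    remaining: 1995,
    reset_after: 79901986,
    max_concurrency: 16,
  },
};

// Hypothetical helper: derives how many session starts were consumed
// and how long until the limit resets (reset_after assumed to be ms).
function summarizeSessionStartLimit(limit) {
  const used = limit.total - limit.remaining;
  const resetAfterHours = limit.reset_after / 3_600_000;
  return { used, resetAfterHours };
}

const summary = summarizeSessionStartLimit(gatewayBot.session_start_limit);
console.log(
  `${summary.used} session starts used, limit resets in about ` +
    `${summary.resetAfterHours.toFixed(1)} hours`
);
```

Polling this value over a day is one simple way to spot a bot that is burning logins on repeated re-identifies.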

@legendhimself
Contributor Author

legendhimself commented Dec 31, 2022

Day 2,

Logins today

image

Stats Report

image

Uptime is shown above. No restarts so far. By this time, some of the shards would usually have silently restarted because of the unref on the close timeout, but now, as the login count above shows, everything is running fine with this fix.
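For readers unfamiliar with `unref`, here is a minimal sketch of what it does to a Node.js timer. It is not the discord.js code: `scheduleCloseTimeout` and its `unref` option are hypothetical names for illustration. The point is that an unref'd timer does not keep the event loop alive, so if nothing else is pending, the process can exit before the timeout callback ever runs.

```javascript
// Hypothetical helper, for illustration only (not discord.js's implementation).
function scheduleCloseTimeout(onTimeout, ms, { unref = false } = {}) {
  const timer = setTimeout(onTimeout, ms);
  // unref() tells Node.js not to keep the process alive just for this timer.
  if (unref) timer.unref();
  return timer;
}

const refd = scheduleCloseTimeout(() => {}, 10_000);
const unrefd = scheduleCloseTimeout(() => {}, 10_000, { unref: true });

// hasRef() reports whether the timer keeps the event loop alive.
const refdHasRef = refd.hasRef();
const unrefdHasRef = unrefd.hasRef();
console.log({ refdHasRef, unrefdHasRef });

// Clean up so this example itself exits promptly.
clearTimeout(refd);
clearTimeout(unrefd);
```

With the `unref` removed, the close timeout is a regular timer, so its callback is guaranteed to fire while the process is otherwise idle.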

This PR is safe to merge unless someone else wants to test the fixes first. I think it can be unblocked and merged now.

legendhimself and others added 2 commits December 31, 2022 09:31
- Fix backport discordjs#7626 missing changes
- Reverted the pull request discordjs#8956
- Removed unref of wsCloseTimeout
- We are resuming the connection for zombie instead of starting a new

Co-authored-by: DraftMan <nicovanaarsen@gmail.com>
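The last item in the commit message, resuming a zombie connection rather than starting a new session, can be sketched roughly as follows. This is not the code from this PR: the class `ShardSketch` and its method names are hypothetical, and the Identify payload is abbreviated (a real one also carries intents and connection properties). Only the opcode numbers, 2 for Identify and 6 for Resume, are taken from the Discord Gateway documentation.

```javascript
// Hypothetical sketch; NOT discord.js's WebSocketShard.
const GatewayOpcodes = { Identify: 2, Resume: 6 };

class ShardSketch {
  constructor(token) {
    this.token = token;
    this.sessionId = null; // set once the gateway reports the session is ready
    this.sequence = null; // last dispatch sequence number seen
    this.lastHeartbeatAcked = true;
  }

  onReady(sessionId) {
    this.sessionId = sessionId;
  }

  onDispatch(seq) {
    this.sequence = seq;
  }

  onHeartbeatAck() {
    this.lastHeartbeatAcked = true;
  }

  // Called once per heartbeat interval. Returns true when the previous
  // heartbeat was never acknowledged, i.e. the connection looks like a zombie.
  beat() {
    if (!this.lastHeartbeatAcked) return true;
    this.lastHeartbeatAcked = false; // heartbeat "sent", now waiting for the ACK
    return false;
  }

  // Payload to send after reconnecting: Resume when a session is known,
  // which does not consume a session start; otherwise a fresh Identify.
  reconnectPayload() {
    if (this.sessionId !== null && this.sequence !== null) {
      return {
        op: GatewayOpcodes.Resume,
        d: { token: this.token, session_id: this.sessionId, seq: this.sequence },
      };
    }
    return { op: GatewayOpcodes.Identify, d: { token: this.token } };
  }
}

const shard = new ShardSketch("example-token");
shard.onReady("session-123");
shard.onDispatch(42);
shard.beat(); // first heartbeat goes out
const isZombie = shard.beat(); // no ACK arrived in between
console.log(isZombie, shard.reconnectPayload().op);
```

Because a Resume reuses the existing session, it does not count against `session_start_limit`, which is consistent with the low login usage reported in the comments above.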
Member

@iCrawl iCrawl left a comment

Looking good


Successfully merging this pull request may close these issues.

  • bot doesn't reconnect after reidentifying
  • Bot randomly exiting process or going offline/unresponsive
7 participants