Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(Rest): better handling of global rate limit and invalid request tracking #4711

Merged
merged 8 commits into from Mar 31, 2021

Conversation

ahnolds
Copy link
Contributor

@ahnolds ahnolds commented Aug 15, 2020

Please describe the changes this PR makes and why it should be merged:

Currently, there is no support for preemptively avoiding the global rate limit - any requests in-flight to Discord at the time the limit is reached will simply result in 429s. This PR adds a configurable preemptive global rate limit option restGlobalRateLimit (defaulting to zero, which maintains the current behavior). Additionally, it makes the following related enhancements:

  • Report the real timeout when the global rate limit is hit
  • Handle rate sublimits for individual requests more cleanly
  • Remove unnecessary double wait from some rate limit handling code paths

This PR also adds tracking for the number of invalid REST requests (those that result in a 401, 403, or 429 status code) to aid in avoiding the 10k invalid requests in 10 minutes temporary ban. The library user can configure a warning period to through the new option invalidRequestWarningInterval (defaulting to zero, which emits no warnings), which causes an INVALID_REQUEST_WARNING event to be emitted with the number of invalid requests made in the 10 minute period and the amount of time left in the period. That is, if invalidRequestWarningInterval = 500, then an event will be emitted at the 500th, 1000th, 1500th, and so on invalid request in the 10 minute period.

Status

  • Code changes have been tested against the Discord API, or there are no code changes
  • I know how to update typings and have done so, or typings don't need updating

Semantic versioning classification:

  • This PR changes the library's interface (methods or parameters added)
    • This PR includes breaking changes (methods removed or renamed, parameters moved or removed)
  • This PR only includes non-code changes, like changes to documentation, README, etc.

src/rest/RequestHandler.js Outdated Show resolved Hide resolved
src/rest/RequestHandler.js Outdated Show resolved Hide resolved
@ahnolds
Copy link
Contributor Author

ahnolds commented Aug 19, 2020

@NotSugden I made the changes you suggested (in a previous version of my changes I had been using the variables outside of the block scope, but since that's no longer the case, I agree that let makes more sense than var). Do you have any other suggestions/comments, or can you approve the review/merge this in?

@NotSugden
Copy link
Contributor

@NotSugden I made the changes you suggested (in a previous version of my changes I had been using the variables outside of the block scope, but since that's no longer the case, I agree that let makes more sense than var). Do you have any other suggestions/comments, or can you approve the review/merge this in?

i don't have anymore suggestions for a review, and its not up to me what gets merged.

src/rest/RequestHandler.js Outdated Show resolved Hide resolved
Copy link
Contributor

@papaia papaia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid colliding with the global global

src/rest/RequestHandler.js Outdated Show resolved Hide resolved
src/rest/RequestHandler.js Outdated Show resolved Hide resolved
src/rest/RequestHandler.js Outdated Show resolved Hide resolved
src/rest/RequestHandler.js Outdated Show resolved Hide resolved
Copy link
Contributor

@Koyamie Koyamie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR does not fix the rate limit issue.

I tried it with a ~650k servers bot using 320 shards.

10k limit was reached even with this handling.

Requests were not stopped when reaching the 10k bad requests limits and were spamming the API even being banned.

@ahnolds
Copy link
Contributor Author

ahnolds commented Sep 9, 2020

@Koyamie

This PR does not fix the rate limit issue.

There are certainly plenty of cases that this PR doesn't catch, but that's true of all discord.js rate limit handling now. For example if a bot restarts, it loses all rate limit state information, but Discord still knows how close to a rate limit you are. There is also no tracking for rate sub-limits like the 2/10 minute limit on channel renames/topic changes. This is a best effort fix that should work for most (non-sharded, see below) bots.

I tried it with a ~650k servers bot using 320 shards.

I could be wrong, but my understanding of sharding was that each shard is normally handled by a separate process (a child process of the original bot process if you use ShardingManager). In this case, each shard will have its own separate copy of discord.js in memory, and hence its own separate info on the global rate limit state. AFAIK (I don't have a sharded bot to test with) the global rate limit applies across all shards (e.g. 20 requests/second from each of 3 shards = 60 requests would exceed the default 50/s global rate limit). That means you'd have to pass back the request counts to the parent process and it would be responsible for tracking the rate limit. Not only is that much more complex, it would also incur a potentially big performance hit since you'd have to make an inter-process call from the child process to essentially ask for permission to make the request from the parent process at every API call. I'm not opposed to adding this functionality in, but a) I'm not going to be the person to write it and b) I don't think it's worth holding up this PR while someone else does write it.

I'd suggest a workaround instead: simply set the restGlobalRateLimit setting to something lower than the real global rate limit. You could set it to global_rate_limit/num_shards to guarantee avoiding the global rate limit, although with 320 shards, that may not be practical (IDK what the increased global rate limit for bots in >200k guilds is). You can tune this number to some amount that lets each shard make a reasonable number of requests while still avoiding the global limit unless many shards each reach their limit at the same time.

10k limit was reached even with this handling.
Requests were not stopped when reaching the 10k bad requests limits and were spamming the API even being banned.

This PR doesn't attempt to do anything to prevent you from hitting the 10k bad request limit, it just provides a mechanism for you to be aware that you are approaching the limit via the INVALID_REQUEST_WARNING event (if you configure the invalidRequestWarningInterval setting). That's because different bots will want to handle this case differently. Some might want to shut down operations entirely until the 10 minute period is up to avoid any chance of hitting the global rate limit, others might want to simply restrict operations to a set of functionality less likely to generate invalid requests, and still others might just YOLO and hope they don't actually hit the limit. Since its so bot-dependent, I deemed it best to simply add functionality to let bot developers handle it as they think appropriate.

@hedgehog1029
Copy link

by the way, retryAfter is a value in seconds, not milliseconds. the current version of discord.js has a (major!) bug where it passes retryAfter directly into Util.delayFor which takes a number of milliseconds, and it looks like you've accidentally carried forward that wrong assumption here. should only be a couple easy changes to multiply by 1000!

@tipakA
Copy link
Contributor

tipakA commented Nov 9, 2020

That is completely wrong for the current version of the lib.
The retryAfter is in fact given in seconds, but only on v8 of the API, which discord.js does not use (yet). Discord.js v12 uses API v7, which is almost exactly same as API v6, which uses miliseconds for retryAfter.

@hedgehog1029
Copy link

@tipakA while it seems like you're correct according to the OLD docs, what tipped me off to this was running a test case (which you obviously need a globally-ratelimited bot to reproduce, so you'll have to trust me):

const { Client } = require("discord.js");

const c = new Client();
c.token = yourTokenVariable;

c.on("debug", (msg) => {
        console.log(msg);

        const handler = c.rest.handlers.get("/gateway/bot");
        console.log(`429 retryAfter: ${handler.retryAfter}`);
});

c.on("rateLimit", (rl) => console.log(`Ratelimit on '${rl.route}' => ${rl.timeout}`));

c.api.gateway.bot.get().then((res) => {
        console.dir(res);
}).catch((err) => {
        console.error(err);
});

running this I was greeted with the lovely sight:

services@vmi465179:~/Sirona$ node ReportStatus.js 
429 hit on route /gateway/bot
429 retryAfter: 353
429 hit on route /gateway/bot
429 retryAfter: 353
429 hit on route /gateway/bot
429 retryAfter: 352
429 hit on route /gateway/bot
429 retryAfter: 352
^C

so honestly seems like Discord messed up and changed it.

@ahnolds
Copy link
Contributor Author

ahnolds commented Nov 14, 2020

I'll be honest, I gave up on anyone merging this like a month ago. If someone who actually is a maintainer for discord.js wants to weigh in that they would want this in a future version, please let me know - I'm happy to update the PR for Discord API v8/discord.js v13(?).

I continue to think this would be a super helpful feature, but I have no intention of updating this code myself again and again just to keep having a well thought out PR ignored by those with the authority to review and merge it. No offense, obviously you all have lives and other things to do besides merge PRs all day 😃

Copy link
Member

@SpaceEEC SpaceEEC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These small nits aside, this is looking pretty good

src/rest/RequestHandler.js Outdated Show resolved Hide resolved
src/util/Constants.js Outdated Show resolved Hide resolved
src/rest/RequestHandler.js Outdated Show resolved Hide resolved
@iCrawl
Copy link
Member

iCrawl commented Jan 27, 2021

This needs a rebase.

@ahnolds
Copy link
Contributor Author

ahnolds commented Jan 29, 2021

This needs a rebase.

Since this is actually moving, I'm happy to make some changes to bring it up to the current state. Do you actually want it rebased onto master @iCrawl, or should I just merge master into my branch? Both probably equivalently easy for me, I'm just in favor of whatever doesn't screw up this Pull Request :)

@iCrawl
Copy link
Member

iCrawl commented Jan 29, 2021

Either way is fine as long as the conflicts get resolved.

@ahnolds
Copy link
Contributor Author

ahnolds commented Jan 30, 2021

Rebase complete

@advaith1
Copy link
Contributor

@ahnolds is this pr compatible with api v8?

The Retry-After header is now based in seconds instead of milliseconds (e.g. 123 means 123 seconds)
The X-RateLimit-Precision header is no longer respected. X-RateLimit-Reset and X-RateLimit-Reset-After are always returned at millisecond precision (e.g. 123.456 instead of 124)

(also there's a conflict)

@ahnolds
Copy link
Contributor Author

ahnolds commented Feb 18, 2021

is this pr compatible with api v8?

Probably not right now - when I submitted it, discord.js was still working with v6/v7. The changes should be relatively minor (potentially just one division by 1000). I'll try to verify and update this week/weekend.

ahnolds and others added 5 commits February 18, 2021 09:23
Add customizable restGlobalRateLimit to pre-emptively monitor global rate limit
Report the real tiemout when the global rate limit is hit
Handle rate sublimits for individual requests
Remove unnecessary double wait from some rate limit handling code paths
Emit a warning periodically as the the number of invalid (401, 403,
429) rest requests increases to allow clients to avoid the 10k invalid
requests in 10 minutes temporary ban
Declare variables with let rather than var (in a previous version these variables had been used outside of block scope).

Co-authored-by: Sugden <28943913+NotSugden@users.noreply.github.com>
Co-authored-by: Tristan Guichaoua <33934311+tguichaoua@users.noreply.github.com>
Avoid colliding with the global `global`

Co-authored-by: Papaia <43409674+Papaia@users.noreply.github.com>
@ahnolds
Copy link
Contributor Author

ahnolds commented Feb 18, 2021

Should be good now. On an unrelated note, I'll have another (very small) PR coming in shortly - looks like 5ac3b57 broke channel renames since a rename doesn't set the type property in the inbound data and the GuildChannel.type property is a string rather than the int that Discord expects.

@tipakA
Copy link
Contributor

tipakA commented Feb 18, 2021

#5251 is already a thing.

src/rest/RequestHandler.js Outdated Show resolved Hide resolved
src/rest/RequestHandler.js Outdated Show resolved Hide resolved
src/rest/RequestHandler.js Outdated Show resolved Hide resolved
src/rest/RequestHandler.js Outdated Show resolved Hide resolved
Co-authored-by: Sugden <28943913+NotSugden@users.noreply.github.com>
Co-authored-by: Vlad Frangu <kingdgrizzle@gmail.com>
@kyranet kyranet requested a review from vladfrangu March 28, 2021 16:51
src/rest/RequestHandler.js Outdated Show resolved Hide resolved
src/rest/RequestHandler.js Show resolved Hide resolved
src/rest/RequestHandler.js Outdated Show resolved Hide resolved
Copy link
Member

@vladfrangu vladfrangu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you tested all these changes and no unexpected behavior happened, lgtm i suppose

Copy link
Member

@kyranet kyranet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm generally not happy with this PR's idea because the lack of unit testing (which -next REST's has, but isn't ready yet), but if it's tested and works perfectly, LGTM.

partials: [],
restWsBridgeTimeout: 5000,
restRequestTimeout: 15000,
restGlobalRateLimit: 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this not default to 50?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two reasons:

  • I thought it best to maintain the current behavior when I originally wrote this
  • That would be a bad value to use for a bot with the mystery higher global rate limit in >150k servers

I certainly don't object if the d.js team thinks it would be better to set a default of 50.

@iCrawl iCrawl merged commit 9d2d606 into discordjs:master Mar 31, 2021
ckohen added a commit to ckohen/discord.js-modules that referenced this pull request Aug 11, 2021
ckohen added a commit to ckohen/discord.js-modules that referenced this pull request Aug 11, 2021
ckohen added a commit to ckohen/discord.js-modules that referenced this pull request Oct 1, 2021
ckohen added a commit to ckohen/discord.js-modules that referenced this pull request Oct 9, 2021
ckohen added a commit to ckohen/discord.js-modules that referenced this pull request Oct 9, 2021
ckohen added a commit to ckohen/discord.js-modules that referenced this pull request Oct 19, 2021
ckohen added a commit to ckohen/discord.js-modules that referenced this pull request Oct 25, 2021
ckohen added a commit to ckohen/discord.js-modules that referenced this pull request Oct 26, 2021
Ported from discordjs/discord.js#4711

refactor(SequentialHandler): use timers/promises

refactor(SequentialHandler): move global checks to each request

Co-authored-by: Vlad Frangu <kingdgrizzle@gmail.com>

fix: separate queue for sub limits

refactor(SequentialHandler): address PR comments

Co-authored-by: Vlad Frangu <kingdgrizzle@gmail.com>

refactor(SequentialHandler): better sublimit handling

refactor: address pr comments

Co-Authored-By: Vlad Frangu <kingdgrizzle@gmail.com>
Co-Authored-By: Antonio Román <kyradiscord@gmail.com>
Co-Authored-By: Noel <icrawltogo@gmail.com>

test: add sublimit tests

refactor(Util): only check route once

refactor: rename globalLimit

Co-authored-by: Vlad Frangu <kingdgrizzle@gmail.com>

fix: address pr comments

Co-authored-by: Vlad Frangu <kingdgrizzle@gmail.com>

refactor(Utils): use enum

Co-authored-by: Vlad Frangu <kingdgrizzle@gmail.com>

Update packages/rest/src/lib/handlers/SequentialHandler.ts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet