
Provide a way to sleep until rate limit resets #19

Open

brettcannon opened this issue Mar 27, 2017 · 10 comments · May be fixed by #111

Comments

@brettcannon
Collaborator

Because this is tough to get right and doesn't work in all cases, I am not going to add this until someone shows a need for it and an approach reasonable for the common case is found to warrant adding the code.

@brettcannon
Collaborator Author

One approach as a context manager (to help delineate the code waiting for quota):

import asyncio
import datetime


class _RateLimitManager:

    """A context manager which waits until there is some request quota.

    This context manager is **not** thread-safe.
    """

    def __init__(self, gh: "GitHubAPI", need: int) -> None:
        self._gh = gh
        self._need = need

    def update_rate_limit(self) -> None:
        """Update the rate limit in a blocking fashion."""
        # Left as a stub in this sketch; it would synchronously fetch
        # /rate_limit and refresh self._gh.rate_limit.
        pass

    async def __aenter__(self) -> None:
        """Wait until there is at least the requested amount of quota."""
        if self._gh.rate_limit is None:
            # First time using the GitHubAPI instance, so get accurate rate
            # limit details.
            self.update_rate_limit()
        # A worst-case scenario is when the rate limit is reached, the limit is
        # N requests, there are N+1 sleeping coroutines, and all N+1 coroutines
        # wake up in FIFO order and begin waiting for responses before the first
        # request completes with the new rate limit details. To prevent this
        # worst case from occurring, the last step is short-circuited by doing
        # a **blocking** check for the current rate limit. That makes sure
        # there are no potential race conditions in getting the new rate limit
        # when the reset epoch has passed.
        while self._gh.rate_limit.remaining < self._need:
            now = datetime.datetime.now(datetime.timezone.utc)
            if self._gh.rate_limit.reset_datetime > now:
                wait = self._gh.rate_limit.reset_datetime - now
                await asyncio.sleep(wait.total_seconds())
            else:
                self.update_rate_limit()

    async def __aexit__(self, exc_type, exc, tb):
        pass
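
For illustration, usage of this (hypothetical) context manager might look like the following, where "owner/repo" is a placeholder:

async def close_stale_issue(gh):
    # Reserve quota for a single request before making it.
    async with _RateLimitManager(gh, need=1):
        await gh.patch("/repos/owner/repo/issues/19", data={"state": "closed"})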

@brettcannon
Collaborator Author

brettcannon commented Mar 27, 2017

Could add a method like await gh.need_quota(method, *args, **kwargs) which simply postpones calling the method provided until quota for a single call is available.
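
A rough sketch of what that could look like, written here as a standalone helper rather than a method (need_quota is hypothetical and not part of gidgethub):

import asyncio
import datetime


async def need_quota(gh, method, *args, **kwargs):
    """Postpone calling *method* until quota for a single request is available."""
    rate_limit = gh.rate_limit
    if rate_limit is not None and rate_limit.remaining < 1:
        now = datetime.datetime.now(datetime.timezone.utc)
        wait = (rate_limit.reset_datetime - now).total_seconds()
        if wait > 0:
            await asyncio.sleep(wait)
    return await method(*args, **kwargs)

# Example: result = await need_quota(gh, gh.getitem, "/rate_limit")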

@brettcannon brettcannon changed the title Provide a way to sleep until rate limit resets. Provide a way to sleep until rate limit resets Mar 28, 2017
@Mariatta
Member

With the automerge feature, miss-islington now encounters rate limit issues 😞

So this feature will be appreciated.

@tacaswell

A slightly less elegant approach would be to add one more piece of state to the GithubAPI class: a user-injectable rate limit manager coroutine, plus keeping track of the number of requests that are "out".

I propose the following:

  • add a requests_in_fight counter
  • add a user-injected coroutine to the GithubAPI instance to decide how to throttle requests
  • add another method to GithubAPI called _managed_request that updates the "in-flight" counter and calls the above coroutine

The requests_in_flight counter is a bit redundant with the rate limit, but I think it is needed. In _make_request the rate limit is pre-emptively decremented, but I think that leaves open the issue of making N of N requests before any come back; the first one comes back saying N-1 requests remain, you try to make N requests -> a whole bunch of those are over the rate limit and get bounced.

Making the logic injectable via a coroutine makes it easy for users to experiment with different strategies for deciding when / how long to sleep based on the rate limit. For example, I can imagine that in some cases you would prefer to throttle yourself but maintain a constant rate of requests rather than exhausting the limit in bursts and having long silent periods.
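
A minimal sketch of what such a user-injected throttle coroutine could look like (the hook signature and the in-flight counter are hypothetical, not part of gidgethub):

import asyncio
import datetime


async def steady_rate_throttle(gh, requests_in_flight):
    """Spread the remaining quota evenly over the time left until the reset."""
    rate_limit = gh.rate_limit
    if rate_limit is None:
        return  # No rate limit data yet; let the first request through.
    now = datetime.datetime.now(datetime.timezone.utc)
    seconds_left = (rate_limit.reset_datetime - now).total_seconds()
    if seconds_left <= 0:
        return  # The window has already reset; no need to wait.
    # Budget what is left after accounting for requests already in flight.
    budget = rate_limit.remaining - requests_in_flight
    if budget <= 0:
        await asyncio.sleep(seconds_left)  # Out of quota: wait for the reset.
    else:
        await asyncio.sleep(seconds_left / budget)  # Constant rate, no bursts.

The idea being that the managed-request path would await this hook before sending each request.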

@tacaswell

About 10 minutes after posting the previous message I re-read the code and realized that _make_request is the right place to put the additional accounting, so the last point should be "put the extra logic in _make_request" 🤦‍♂️

@tacaswell tacaswell linked a pull request Apr 6, 2020 that will close this issue
@brettcannon
Collaborator Author

I do like your requests_in_fight typo. 😉

@brettcannon
Collaborator Author

Punting the solution out to users to decide how they want to manage it is definitely interesting. Maybe if a check_quota() method is defined then it is used at the very start of _make_request()? Or provide a default method that immediately returns and allow for overriding in subclasses (once again called at the start of _make_request())? Since the current rate limit is already stored on the GitHubAPI instance, it shouldn't be necessary to do much else.

Or am I missing something from this idea? It seems too straightforward ...
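
As a sketch, the subclass variant could look roughly like this (check_quota() and the override are illustrative only; they rely just on the existing rate_limit attribute and the backend-provided sleep() coroutine):

import datetime

import gidgethub.aiohttp


class ThrottledGitHubAPI(gidgethub.aiohttp.GitHubAPI):

    async def check_quota(self):
        """Default hook: wait out the reset when no quota is left (hypothetical)."""
        rate_limit = self.rate_limit
        if rate_limit is not None and rate_limit.remaining < 1:
            now = datetime.datetime.now(datetime.timezone.utc)
            wait = (rate_limit.reset_datetime - now).total_seconds()
            if wait > 0:
                await self.sleep(wait)

    async def _make_request(self, *args, **kwargs):
        # Check quota at the very start of every request, as suggested above.
        await self.check_quota()
        return await super()._make_request(*args, **kwargs)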

@oprypin

oprypin commented Mar 7, 2021

I am definitely looking for this feature right now, and not finding it anywhere.

A general comment.

https://github.com/brettcannon/gidgethub/blob/3b8941a0ccdad007db09dd601963ce989d64074f/gidgethub/abc.py#L114

As I understand the code right now, the class stores the rate limit from whatever kind of request you happened to make last (and there are separate rate limits, e.g. for search vs. the core API). So running out of one quota and then calling another kind of method "resets" the stored value, and the exhaustion will avoid detection.


It seems to me that a robust solution would have to react to the event that GitHub responds with "out of quota", rather than doing all the magic to try to avoid it in the first place. And, again, looking at the code, there's absolutely no way for a user of this library to achieve this.

In particular, consider getiter.
https://github.com/brettcannon/gidgethub/blob/3b8941a0ccdad007db09dd601963ce989d64074f/gidgethub/abc.py#L138

Let's say I'm doing a search which has 100 pages. The search quota is 30 per minute. How am I supposed to ever consume the whole output, if my only recourse is to catch the exception and try again from the beginning?

And the context manager shown above is also completely deaf to getiter usage; as far as I can see, it only ensures that 1 unit of quota is remaining before entering, and nothing is checked during iteration.

@Mariatta
Member

I've been encountering rate-limit issues myself, also related to the search API via the getiter coroutine, as described by @oprypin above.

At the moment I have tried overcoming this by not using getiter and just using getitem to do my own iteration. But it is a bit tedious.

Currently getitem just ignores the more value that is returned by _make_request. Perhaps if we start returning it to the caller, that might help?

https://github.com/brettcannon/gidgethub/blob/4774a896f87023bcd3954cdf072656e69ca3e9ea/gidgethub/abc.py#L133-L136

https://github.com/brettcannon/gidgethub/blob/4774a896f87023bcd3954cdf072656e69ca3e9ea/gidgethub/abc.py#L120
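
For reference, the manual getitem-based pagination being described looks roughly like this (a sketch only; the endpoint, page size, and sleep-until-reset handling are illustrative, and it still depends on gh.rate_limit reflecting the search quota):

import datetime
import urllib.parse


async def search_issues(gh, query):
    """Page through /search/issues by hand, pausing when the quota runs out."""
    page = 1
    while True:
        rate_limit = gh.rate_limit
        if rate_limit is not None and rate_limit.remaining < 1:
            now = datetime.datetime.now(datetime.timezone.utc)
            wait = (rate_limit.reset_datetime - now).total_seconds()
            if wait > 0:
                await gh.sleep(wait)  # Backend-specific sleep on GitHubAPI.
        result = await gh.getitem(
            f"/search/issues?q={urllib.parse.quote_plus(query)}"
            f"&per_page=100&page={page}"
        )
        for item in result["items"]:
            yield item
        if page * 100 >= result["total_count"]:
            break
        page += 1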

@brettcannon
Collaborator Author

How would changing the return value be backwards-compatible? I suspect most users don't hit limits so I would prefer to not break the API for this if there's another way to provide details.

Another option is to do something like @oprypin suggested in #153 and have the iterator carry rate limit details that are updated after each iteration. It would require iterating manually or using a custom iterator underneath that implemented whatever backpressure strategy you wanted to use.
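
A sketch of how such an iterator could surface rate limit details between items (purely illustrative and not necessarily what #153 proposes):

import datetime


class RateLimitAwareIterator:
    """Wrap a getiter() call and apply a backpressure strategy between items."""

    def __init__(self, gh, url):
        self._gh = gh
        self._items = gh.getiter(url)
        self.rate_limit = None  # Refreshed after every item.

    def __aiter__(self):
        return self

    async def __anext__(self):
        item = await self._items.__anext__()
        # gh.rate_limit reflects the most recently fetched page at this point.
        self.rate_limit = self._gh.rate_limit
        if self.rate_limit is not None and self.rate_limit.remaining < 1:
            now = datetime.datetime.now(datetime.timezone.utc)
            wait = (self.rate_limit.reset_datetime - now).total_seconds()
            if wait > 0:
                await self._gh.sleep(wait)  # One possible backpressure choice.
        return item

Iterating with async for over RateLimitAwareIterator(gh, url) then behaves like getiter() itself but pauses whenever the most recent response says the quota is exhausted.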
