retry upon transient error during paginated stargazer/fork retrieval #82

Open
jgehrcke opened this issue Oct 6, 2023 · 0 comments
jgehrcke commented Oct 6, 2023

A known limitation since we started using pygithub for fetching data: when iterating through pages via e.g. for count, fork in enumerate(repo.get_forks(), 1), an individual HTTP request is not retried upon transient error, and it is not easy to retry the request for one specific page out of many cleanly from the calling program.
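One partial mitigation worth noting (an assumption, not something tested in this project): PyGithub's Github() constructor accepts a retry argument that is handed to the underlying requests transport, so a urllib3 Retry object can enable transport-level retries. A minimal sketch, which may not cover every failure mode seen below:

```python
# Sketch: configure transport-level retries for PyGithub (untested here).
from urllib3.util.retry import Retry

retry = Retry(
    total=5,            # overall retry budget
    connect=3,          # retries for connection-setup failures
    read=3,             # retries for aborted reads (e.g. RemoteDisconnected)
    backoff_factor=2,   # exponential sleep between attempts
    status_forcelist=(500, 502, 503, 504),  # also retry these status codes
)

# Assumed call site, hypothetical token handling:
# gh = github.Github(login_or_token=token, retry=retry)
```

Whether this catches the "Remote end closed connection without response" case depends on the urllib3 configuration, since read retries apply only to idempotent requests.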

Example of a mundane transient error affecting one of many HTTP requests and taking down the entire Action run:

...
231004-23:07:04.767 INFO:MainThread: 8000 forks fetched
231004-23:07:11.723 INFO:MainThread: 8200 forks fetched
231004-23:07:18.807 INFO:MainThread: 8400 forks fetched
...
Traceback (most recent call last):
  File "//fetch.py", line 596, in <module>
    main()
  File "//fetch.py", line 111, in main
    fetch_and_write_fork_ts(repo, args.fork_ts_outpath)
  File "//fetch.py", line 225, in fetch_and_write_fork_ts
    dfforkcsv = get_forks_over_time(repo)
  File "//fetch.py", line 434, in get_forks_over_time
    for count, fork in enumerate(repo.get_forks(), 1):
  File "/usr/local/lib/python3.10/site-packages/github/PaginatedList.py", line 56, in __iter__
    newElements = self._grow()
  File "/usr/local/lib/python3.10/site-packages/github/PaginatedList.py", line 67, in _grow
    newElements = self._fetchNextPage()
  File "/usr/local/lib/python3.10/site-packages/github/PaginatedList.py", line 199, in _fetchNextPage
    headers, data = self.__requester.requestJsonAndCheck(
  File "/usr/local/lib/python3.10/site-packages/github/Requester.py", line 354, in requestJsonAndCheck
    *self.requestJson(
  File "/usr/local/lib/python3.10/site-packages/github/Requester.py", line 454, in requestJson
    return self.__requestEncode(cnx, verb, url, parameters, headers, input, encode)
  File "/usr/local/lib/python3.10/site-packages/github/Requester.py", line 528, in __requestEncode
    status, responseHeaders, output = self.__requestRaw(
  File "/usr/local/lib/python3.10/site-packages/github/Requester.py", line 555, in __requestRaw
    response = cnx.getresponse()
  File "/usr/local/lib/python3.10/site-packages/github/Requester.py", line 127, in getresponse
    r = verb(
  File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests/adapters.py", line 501, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
...

Retrying this naively at a higher level would mean fetching all forks again from page one. Of course, this being Python, all kinds of workarounds are possible, but they would take more time to build and test.
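One possible shape for such a workaround, sketched here as an untested illustration: a generator that wraps the paginated iterable and retries a failed next() call. This assumes PyGithub's PaginatedList keeps its page cursor across a failed fetch, so the retried call re-requests only the failed page rather than restarting from the beginning.

```python
import time

def iter_with_retry(iterable, attempts=3, backoff=1.0,
                    transient=(ConnectionError,)):
    """Yield items from `iterable`, retrying a failed next() call.

    Hypothetical sketch: assumes the underlying paginated iterator can
    safely have next() retried after a transient error (i.e. the retried
    call re-fetches the same page instead of restarting pagination).
    """
    it = iter(iterable)
    while True:
        for attempt in range(1, attempts + 1):
            try:
                item = next(it)
            except StopIteration:
                return  # iterable exhausted, end the generator
            except transient:
                if attempt == attempts:
                    raise  # retry budget exhausted, propagate the error
                time.sleep(backoff * attempt)  # linear backoff between tries
            else:
                break
        yield item
```

The call site would then change from enumerate(repo.get_forks(), 1) to enumerate(iter_with_retry(repo.get_forks()), 1), with transient widened to include requests.exceptions.ConnectionError.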
