feat(github): long-term datasource caching #15653

Merged

zharinov
Collaborator

@zharinov zharinov commented May 19, 2022

Changes

  • Enable two-phase caching for github-releases and github-datasources:
    • Pre-fetch at most 10000 items, sorted by releaseTimestamp, and clear all of them from the cache 7 days later
    • Every 30 minutes, query for newly created items (if any) and append them to the already pre-fetched items
      • Stop this update once processing reaches cached items created more than 30 days ago, since we consider those stabilized
    • If neither the 30-minute nor the 7-day timeout has been exceeded, return the cached values
  • Additionally, use the data from this cache for changelog generation
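
The two-phase flow in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: every name here (`Item`, `CacheRecord`, `chooseAction`, `refresh`, `FetchPage`) is hypothetical, the page fetcher is synchronous only to keep the sketch short, and items are assumed to arrive newest first.

```typescript
// All names below are hypothetical; this is not Renovate's actual code.
interface Item {
  version: string;
  releaseTimestamp: string; // ISO 8601 date
}

interface CacheRecord {
  items: Record<string, Item>; // keyed by version
  createdAt: number; // epoch ms of the last full pre-fetch
  updatedAt: number; // epoch ms of the last incremental update
}

const MINUTE_MS = 60 * 1000;
const DAY_MS = 24 * 60 * MINUTE_MS;

const RESET_AFTER_MS = 7 * DAY_MS; // drop everything and pre-fetch again
const UPDATE_AFTER_MS = 30 * MINUTE_MS; // look for newly created items
const STABLE_AFTER_MS = 30 * DAY_MS; // older cached items count as stabilized
const MAX_PREFETCH_ITEMS = 10000;

type Action = 'reset' | 'update' | 'hit';

// Assumed fetcher: returns one page of items, newest first,
// and an empty array once there is nothing left.
type FetchPage = (pageIndex: number) => Item[];

function chooseAction(cache: CacheRecord | null, now: number): Action {
  if (cache === null || now - cache.createdAt >= RESET_AFTER_MS) {
    return 'reset';
  }
  if (now - cache.updatedAt >= UPDATE_AFTER_MS) {
    return 'update';
  }
  return 'hit';
}

function refresh(
  cache: CacheRecord | null,
  fetchPage: FetchPage,
  now: number
): CacheRecord {
  const action = chooseAction(cache, now);
  if (action === 'hit' && cache !== null) {
    return cache; // neither timeout exceeded: serve cached values
  }

  // On 'update' we extend the existing cache; on 'reset' we start empty.
  const previous: CacheRecord | null = action === 'update' ? cache : null;
  const items: Record<string, Item> =
    previous !== null ? { ...previous.items } : {};

  let stored = 0;
  let pageIndex = 0;
  let done = false;
  while (!done) {
    const page = fetchPage(pageIndex);
    pageIndex += 1;
    if (page.length === 0) {
      break; // no more items upstream
    }
    for (const item of page) {
      const age = now - Date.parse(item.releaseTimestamp);
      const alreadyCached =
        previous !== null &&
        Object.prototype.hasOwnProperty.call(previous.items, item.version);
      if (alreadyCached && age > STABLE_AFTER_MS) {
        done = true; // reached the stabilized part of the cache
        break;
      }
      items[item.version] = item;
      stored += 1;
      if (previous === null && stored >= MAX_PREFETCH_ITEMS) {
        done = true; // pre-fetch cap reached
        break;
      }
    }
  }

  return {
    items,
    createdAt: previous !== null ? previous.createdAt : now,
    updatedAt: now,
  };
}
```

In this sketch a cold cache pages through everything up to the cap, a call within 30 minutes touches no pages at all, and a later call stops paging as soon as it meets an item that is both already cached and older than 30 days.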

Context

Documentation (please check one with an [x])

  • I have updated the documentation, or
  • No documentation update is required

How I've tested my work (please tick one)

I have verified these changes via:

  • Code inspection only, or
  • Newly added/modified unit tests, or
  • No unit tests but ran on a real repository, or
  • Both unit tests + ran on a real repository

@zharinov zharinov requested review from viceice and rarkins May 19, 2022 15:37
@zharinov
Collaborator Author

Still need tests, but comments are welcome

Member

@viceice viceice left a comment

looks promising, make sure it works for GitHub enterprise

lib/modules/datasource/github-releases/cache-base.ts (outdated)
@zharinov
Collaborator Author

zharinov commented May 19, 2022 via email

Collaborator

@rarkins rarkins left a comment

Please heavily document the logic flow (and make sure variables are named for optimal understandability)

@zharinov zharinov marked this pull request as ready for review May 22, 2022 11:51
@zharinov zharinov requested review from viceice and rarkins May 23, 2022 09:54
@rarkins
Collaborator

rarkins commented May 24, 2022

Does this solution mean we now fetch more than the 1000 tags/releases limit we currently have? If not, will it support it, or does GraphQL have a 1000 limitation?

When I run this PR locally it's using fewer queries than I'd expect, and I want to check that it's a good thing, not a bad sign.

Test repo with node and renovate (both of which have a lot of tags): https://github.com/renovate-tests/node5/pulls

Cold cache result:

DEBUG: http statistics (repository=renovate-tests/node5)
       "urls": {
         "https://api.github.com/graphql (POST,200)": 7,
         "https://api.github.com/repos/nodejs/node (GET,200)": 1,
         "https://api.github.com/repos/nodejs/node/git/blobs/3dd839f196f3bc4aaeda5ac04c3fc32b3293697b (GET,200)": 1,
         "https://api.github.com/repos/nodejs/node/git/trees/master (GET,200)": 1,
         "https://api.github.com/repos/renovate-tests/node5/git/commits (POST,201)": 2,
         "https://api.github.com/repos/renovate-tests/node5/git/refs (POST,201)": 2,
         "https://api.github.com/repos/renovate-tests/node5/git/trees (POST,201)": 2,
         "https://api.github.com/repos/renovate-tests/node5/issues (POST,201)": 1,
         "https://api.github.com/repos/renovate-tests/node5/pulls (GET,200)": 1,
         "https://api.github.com/repos/renovate-tests/node5/pulls (POST,201)": 2,
         "https://api.github.com/repos/renovatebot/renovate (GET,200)": 1,
         "https://api.github.com/repos/renovatebot/renovate/git/trees/main (GET,200)": 1,
         "https://api.github.com/repos/renovatebot/spring-remediations/contents/base.json (GET,200)": 1,
         "https://api.github.com/repos/renovatebot/spring-remediations/contents/default.json (GET,200)": 1,
         "https://api.github.com/repos/whitesource/merge-confidence/contents/beta.json (GET,200)": 1,
         "https://auth.docker.io/token (GET,200)": 1,
         "https://index.docker.io/v2/ (GET,401)": 1,
         "https://index.docker.io/v2/renovate/renovate/blobs/sha256:cef96165cda1afc629d20c6a5810f1a88095b907d3462ba876668cb9c51773cd (GET,200)": 1,
         "https://index.docker.io/v2/renovate/renovate/manifests/latest (GET,200)": 1,
         "https://index.docker.io/v2/renovate/renovate/tags/list (GET,200)": 2,
         "https://index.docker.io/v2/renovate/renovate/tags/list (GET,401)": 1
       },
       "hostStats": {
         "api.github.com": {"requestCount": 25, "requestAvgMs": 676, "queueAvgMs": 0},
         "auth.docker.io": {"requestCount": 1, "requestAvgMs": 438, "queueAvgMs": 0},
         "index.docker.io": {"requestCount": 6, "requestAvgMs": 862, "queueAvgMs": 0}
       },
       "totalRequests": 32

What surprises me most is that there are only 7 GraphQL queries in total. We normally use 2 GraphQL queries per repo anyway, so does that mean an extra 5 queries for the node tags lookup plus release notes?

BTW I just thought of a potential problem this introduces. If someone tries to configure a custom PAT host rule for some third-party repo (i.e. to override the token we have for api.github.com) then it won't work: it will need to use the same token as for the main repo. However, this use case hasn't been working fully for a while due to the way GitHub has been doing pagination (pages 2+ use an unpredictable URL).

@zharinov
Collaborator Author

zharinov commented May 24, 2022

When I run this PR locally it's using fewer queries than I'd expect, and I want to check that it's a good thing, not a bad sign.

I've simplified my resulting code too much, pushing fix right now...

@zharinov zharinov requested a review from viceice May 25, 2022 13:22
@rarkins
Collaborator

rarkins commented May 26, 2022

Tested again with node and renovate as dependencies. The number of queries is now higher, as I had been expecting:

DEBUG: http statistics (repository=renovate-tests/node5)
       "urls": {
         "https://api.github.com/graphql (POST,200)": 132,
         "https://api.github.com/repos/nodejs/node (GET,200)": 1,
         "https://api.github.com/repos/nodejs/node/git/blobs/3dd839f196f3bc4aaeda5ac04c3fc32b3293697b (GET,200)": 1,
         "https://api.github.com/repos/nodejs/node/git/trees/master (GET,200)": 1,
         "https://api.github.com/repos/renovate-tests/node5/branches/main/protection (GET,404)": 1,
         "https://api.github.com/repos/renovate-tests/node5/git/commits (POST,201)": 2,
         "https://api.github.com/repos/renovate-tests/node5/git/refs (POST,201)": 1,
         "https://api.github.com/repos/renovate-tests/node5/git/refs/heads/renovate/renovate-renovate-32.x (PATCH,200)": 1,
         "https://api.github.com/repos/renovate-tests/node5/git/trees (POST,201)": 2,
         "https://api.github.com/repos/renovate-tests/node5/issues/4 (GET,200)": 2,
         "https://api.github.com/repos/renovate-tests/node5/issues/4 (PATCH,200)": 1,
         "https://api.github.com/repos/renovate-tests/node5/pulls (GET,200)": 1,
         "https://api.github.com/repos/renovate-tests/node5/pulls (POST,201)": 1,
         "https://api.github.com/repos/renovate-tests/node5/pulls/3 (PATCH,200)": 1,
         "https://api.github.com/repos/renovatebot/renovate (GET,200)": 1,
         "https://api.github.com/repos/renovatebot/renovate/git/trees/main (GET,200)": 1,
         "https://api.github.com/repos/renovatebot/spring-remediations/contents/base.json (GET,200)": 1,
         "https://api.github.com/repos/renovatebot/spring-remediations/contents/default.json (GET,200)": 1,
         "https://api.github.com/repos/whitesource/merge-confidence/contents/beta.json (GET,200)": 1,
         "https://auth.docker.io/token (GET,200)": 1,
         "https://index.docker.io/v2/ (GET,401)": 1,
         "https://index.docker.io/v2/renovate/renovate/blobs/sha256:104180c2c51bc8f4612f9db5bbfb425d27287d4baee7e1ab06c295f026765e0e (GET,200)": 1,
         "https://index.docker.io/v2/renovate/renovate/manifests/latest (GET,200)": 1,
         "https://index.docker.io/v2/renovate/renovate/tags/list (GET,200)": 2,
         "https://index.docker.io/v2/renovate/renovate/tags/list (GET,401)": 1
       },
       "hostStats": {
         "api.github.com": {"requestCount": 153, "requestAvgMs": 621, "queueAvgMs": 0},
         "auth.docker.io": {"requestCount": 1, "requestAvgMs": 395, "queueAvgMs": 0},
         "index.docker.io": {"requestCount": 6, "requestAvgMs": 838, "queueAvgMs": 0}
       },
       "totalRequests": 160

Then immediately after:

DEBUG: http statistics (repository=renovate-tests/node5)
       "urls": {
         "https://api.github.com/graphql (POST,200)": 2,
         "https://api.github.com/repos/renovate-tests/node5/branches/main/protection (GET,404)": 1,
         "https://api.github.com/repos/renovate-tests/node5/contents/renovate.json (GET,200)": 1,
         "https://api.github.com/repos/renovate-tests/node5/issues/4 (GET,200)": 2,
         "https://api.github.com/repos/renovate-tests/node5/pulls (GET,200)": 1,
         "https://api.github.com/repos/renovatebot/spring-remediations/contents/base.json (GET,200)": 1,
         "https://api.github.com/repos/renovatebot/spring-remediations/contents/default.json (GET,200)": 1,
         "https://api.github.com/repos/whitesource/merge-confidence/contents/beta.json (GET,200)": 1
       },
       "hostStats": {"api.github.com": {"requestCount": 10, "requestAvgMs": 273, "queueAvgMs": 0}},
       "totalRequests": 10

@rarkins
Collaborator

rarkins commented May 26, 2022

After waiting >30 minutes:

DEBUG: http statistics (repository=renovate-tests/node5)
       "urls": {
         "https://api.github.com/graphql (POST,200)": 4,
         "https://api.github.com/repos/renovate-tests/node5/branches/main/protection (GET,404)": 1,
         "https://api.github.com/repos/renovate-tests/node5/contents/renovate.json (GET,200)": 1,
         "https://api.github.com/repos/renovate-tests/node5/issues/4 (GET,200)": 2,
         "https://api.github.com/repos/renovate-tests/node5/pulls (GET,200)": 1,
         "https://api.github.com/repos/renovatebot/spring-remediations/contents/base.json (GET,200)": 1,
         "https://api.github.com/repos/renovatebot/spring-remediations/contents/default.json (GET,200)": 1,
         "https://api.github.com/repos/whitesource/merge-confidence/contents/beta.json (GET,200)": 1,
         "https://auth.docker.io/token (GET,200)": 1,
         "https://index.docker.io/v2/renovate/renovate/tags/list (GET,200)": 2,
         "https://index.docker.io/v2/renovate/renovate/tags/list (GET,401)": 1
       },
       "hostStats": {
         "api.github.com": {"requestCount": 12, "requestAvgMs": 347, "queueAvgMs": 0},
         "auth.docker.io": {"requestCount": 1, "requestAvgMs": 442, "queueAvgMs": 0},
         "index.docker.io": {"requestCount": 3, "requestAvgMs": 1390, "queueAvgMs": 0}
       },
       "totalRequests": 16

The 2 extra GitHub GraphQL requests seen here correspond to this PR's described behavior.

@zharinov
Collaborator Author

What about :shipit:?

@rarkins rarkins changed the title from "feat(github): Implement long-term datasource caching" to "feat(github): long-term datasource caching" Jun 3, 2022
@rarkins rarkins merged commit 2e957ba into renovatebot:main Jun 3, 2022
@rarkins rarkins deleted the feat/github-datasource-long-term-cache branch June 3, 2022 09:27
@renovate-release
Collaborator

🎉 This PR is included in version 32.73.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

rarkins added a commit that referenced this pull request Jun 3, 2022
@atorrescogollo atorrescogollo mentioned this pull request Jun 3, 2022
6 tasks
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 4, 2022
Successfully merging this pull request may close these issues.

Use intelligent caching + pagination for github releases and tags