rate-limit makes the plugin unusable #398
Comments
@cbruno10 maybe you could add more information |
Hello, @electriquo. Sorry for the delayed response. Before we proceed with the fixes, I have a couple of questions, could you please take a look?
I attempted to replicate the error by querying the
Thank you! |
@ParthaI Install the github modules (steampipe-mod-github-compliance, steampipe-mod-github-insights, steampipe-mod-github-sherlock) and navigate to the GitHub Default Branch Protection Report. I have more than 2000 active (non-archived) repositories. |
Thanks, @electriquo, for the detailed information, I will give it a try to reproduce the issues, and make the necessary changes. |
Hello, @electriquo, following our previous discussion, I set up my local environment to try and replicate the error. Despite my attempts, I couldn't reproduce the rate limit error. The compliance/insight mods are functioning correctly across approximately For your information, the plugin initially used the REST API to generate results, which was prone to errors, particularly those related to rate limits. As a remedy, we've implemented GraphQL query support. Additionally, I experimented with the plugin code in my local environment, making around Query Result:
Thanks! |
@ParthaI You were experimenting, which is a best-effort reproduction. Still, that does not mean there is no issue :) As you can see in the description, the rate-limit error is clear. |
Hello @electriquo, Thank you for your patience and for highlighting the issue once more. Indeed, the rate-limit error you mentioned is a significant concern, when navigating to the GitHub Default Branch Protection Report, given the extensive number of active (non-archived) repositories you manage, exceeding 2000. In an effort to meticulously replicate this scenario, I conducted tests involving over 2000 API calls within a single query. Regrettably, these tests did not trigger the same rate-limit error, which suggests the situation might be influenced by specific conditions or a huge number of repositories in the organization. Understanding the importance of accurately diagnosing and resolving this issue, I'd like to delve deeper. The dashboard you referred to leverages the Your cooperation and insights are greatly appreciated! |
No issue here but when I open |
Thank you, @electriquo, for conducting those tests and sharing your findings. Replicating the rate-limit error appears to be challenging with the smaller data set in our environment. According to the documentation, using a Personal Access Token (PAT) allows for In comparison, our environment hosts approximately Here's the GraphQL query to check the rate limit: query {
viewer {
login
}
rateLimit {
limit
remaining
used
resetAt
}
} You can find more details on how rate limits are calculated for GraphQL queries in the GitHub documentation. Upon reviewing the GitHub insights mod, specifically the I'm further investigating the plugin and the insight mod to understand this behavior better and will update you with any progress. Thank you once more for your assistance! |
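For anyone following along, the rate-limit check above can be scripted. This is a minimal Python sketch and not part of the plugin: it sends the same rateLimit query to GitHub's GraphQL endpoint using only the standard library. The helper names (build_request, summarize, fetch_rate_limit) are made up for illustration, and the sketch assumes the PAT is passed in as a plain string.

```python
import json
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Same query as in the comment above.
RATE_LIMIT_QUERY = """
query {
  viewer { login }
  rateLimit { limit remaining used resetAt }
}
"""


def build_request(token: str) -> urllib.request.Request:
    """Build the POST request for the rate-limit query (hypothetical helper)."""
    body = json.dumps({"query": RATE_LIMIT_QUERY}).encode("utf-8")
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=body,
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def summarize(response: dict) -> str:
    """Turn the decoded JSON response into a one-line summary."""
    login = response["data"]["viewer"]["login"]
    rl = response["data"]["rateLimit"]
    return (
        f"{login}: {rl['used']}/{rl['limit']} points used, "
        f"{rl['remaining']} remaining, resets at {rl['resetAt']}"
    )


def fetch_rate_limit(token: str) -> dict:
    """Send the query and return the decoded JSON (performs a network call)."""
    with urllib.request.urlopen(build_request(token), timeout=30) as resp:
        return json.load(resp)
```

Calling print(summarize(fetch_rate_limit(token))) before and after opening a dashboard should show how many points that dashboard consumed, which may help narrow down which queries are expensive.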
To clear up any doubts, I am using a dedicated PAT, meaning that only Steampipe uses this PAT. Thus, the rate limit must come from Steampipe usage alone. Could it be that not only the number of repositories is a factor, but also the amount of metadata (such as a large Git history with many commits)? |
Absolutely, @electriquo, you're correct. However, it's worth noting that within the Steampipe Dashboard, API calls are not initiated until you actively click on any of the hyperlinks. |
My repositories are rich; they are big, with a lot of metadata. |
Hello @electriquo, I hope this message finds you well. I am currently working on this issue, aiming to reduce the number of API calls while querying the table
Please note, these modifications are specifically for the If you're willing to test the scenario, here are the steps:
Your feedback on these changes would be invaluable to me. Thank you very much for your cooperation and assistance in this matter. |
Hi @electriquo, have you had the chance to try it out yet? |
@ParthaI Sorry, will try it soon. |
$ steampipe query
Welcome to Steampipe v0.22.1
For more information, type .help
> .timing on
> select * from github_my_repository
Error: github: non-200 OK status code: 403 Forbidden body: "{\n \"documentation_url\": \"https://docs.github.com/free-pro-team@latest/rest/overview/rate-limits-for-the-rest-api#about-secondary-rate-limits\",\n \"message\": \"You have exceeded a secondary rate limit. Please wait a few minutes before you try again. If you reach out to GitHub Support for help, please include the request ID 279A:5B4F:18264BC:2CAC1C2:65F97C91.\"\n}" (SQLSTATE HV000)
Time: 47.1s. Rows fetched: 250. Hydrate calls: 17,250.
Where is the install command for |
I believe as per our previous conversation it was working fine.
Please follow these steps: Steampipe plugin changes:
Steampipe Github Insights Mod:
|
@ParthaI I haven't forgotten, I just haven't found the time for this :( |
Hey @electriquo , I was able to reproduce the behaviours you were seeing (albeit with a much lower repository count) with I looked into the plugin code, and found our error handling code does retry secondary rate limit errors ( steampipe-plugin-github/github/errors.go Lines 39 to 43 in ec93282
steampipe-plugin-github/github/errors.go Lines 54 to 62 in ec93282
If you're directly running a query, one suggestion is to only select the columns you want to retrieve. If you're running a mod dashboard/benchmark/control, you could also try using rate limiters to slow down Steampipe. For instance, you can try adding this into your # stay well under the 100 hydrate/list/get functions concurrently based on limits in https://docs.github.com/en/graphql/overview/rate-limits-and-node-limits-for-the-graphql-api#secondary-rate-limits
plugin "github" {
limiter "github_global_concurrency" {
max_concurrency = 30
}
} And then see if that helps with running the dashboards or benchmarks you were trying before. I think For instance, with the limiter settings above, I avoided getting the secondary abuse rate limits, but eventually got this unhelpful 502 Bad Gateway error from them with no additional information:
I don't believe a lot of our dashboards run this type of query though. |
Does it mean that you are working on a fix and I should wait for a new release?
Where in the docs did you find about |
We are looking into whether we can improve the error handling for rate limit errors, but we're not actively working on an identified fix and don't have a schedule for when it will be released. I'd suggest still trying to use a In the example I sent, the |
Thanks
In the future, how can one find out which variables can be set in a plugin to handle concurrency and rate limiting?
Maybe it should be documented in steampipe.io/docs/guides/limiter.
If you apply the block in |
@electriquo With that block in my Were you able to give the limiter a try, either with max concurrency of 30 (or a different number)? If so, did you see any more consistency in getting results back? Also, we should probably mention rate limiters and at least link to the doc on steampipe.io, which has some examples, so we'll add that to our backlog to see where that section belongs. |
It is on my list for tomorrow :)
Promise to keep you posted
Awesome, thank you :) |
$ steampipe --version
Steampipe v0.22.2
$ steampipe plugin list --output json | jq -r '.installed[].name'
hub.steampipe.io/plugins/turbot/github@0.39.1
$ powerpipe mod list --output json | jq -r '.[].dependency_path'
github.com/turbot/steampipe-mod-github-insights@v0.4.0
github.com/turbot/steampipe-mod-github-compliance@v0.7.0
github.com/turbot/steampipe-mod-github-sherlock@v0.13.0
$ cat ~/.steampipe/config/github.spc
connection "github" {
plugin = "hub.steampipe.io/plugins/turbot/github@v0.39.1"
}
plugin "github" {
limiter "github_global_concurrency" {
max_concurrency = 30
}
} When I navigate to the GitHub Default Branch Protection Report dashboard, the rate-limit error still pops up. But if I repeat #398 (comment), it does not return a rate-limit error message: $ steampipe query
Welcome to Steampipe v0.22.2
For more information, type .help
> .timing on
> select count(*) from github_my_repository
...
Time: 7.6s. Rows fetched: 1,088. Hydrate calls: 0. And then, when navigating to the GitHub Default Branch Protection Report dashboard, the rate-limit error does not pop up, but apart from the repository list, all other columns are empty :( I looked at the plugin log, and I found that it seems not to use (but does honor) the max concurrency connections. Here are some log lines
Also tried with |
@cbruno10 Hi, any insights? |
@cbruno10 Hello, do you have any new insights? |
Apologies for the radio silence on this issue @electriquo !! I think the reason why you didn't see the limiter tags getting honored could be indicative of a caching issue. Could you please retry the queries by launching a fresh instance of Steampipe and killing any old instances? Command that you could try before running
|
@misraved That's what I always did :)
This seems like firing in all directions rather than basing things on data. |
Thanks for the clarification @electriquo !! We are actively exploring options to enhance the rate-limiting capabilities within our plugin. Currently, there is a limited selection of solutions that offer a straightforward method for managing errors originating from the API. |
@misraved Could you please follow #398 (comment)? |
Hi @electriquo , sorry for losing track of this thread.
|
The rate-limit error should be handled generally, regardless of the component that causes it, especially when there is a clear protocol for rate limiting. Specifically, GitHub has Rate limits for the REST API.
No. $ steampipe query "select distinct name from steampipe_plugin_limiter"
+---------------------------------------------------+
| name |
+---------------------------------------------------+
| aws_servicequotas_list_service_quotas |
| aws_servicequotas_list_aws_default_service_quotas |
| aws_servicequotas_list_tags_for_resource |
+---------------------------------------------------+ Why would you expect to see the limiter appear here when the plugin log clearly states so?
From your words, I understand that dashboards are useless :(
Received the same error across a few of the GitHub dashboards. |
Looking at your limiter configurations, can you please update it to:
Since you're using a specific plugin version, the Afterward, can you please restart Steampipe, If it does, can you please try running the single query you shared above first and see if that still executes correctly? If so, can you please then run the dashboard again? You may still get throttling errors, and if so, can you please try lowering Thanks! |
@cbruno10 Although I am cooperating, I am not the Steampipe QA team :)
This doesn't sound correct to me, just another shot in the dark. After you have taken the trouble to create an environment to:
I'd be happy to continue and assist. |
Describe the bug
The GitHub rate limit makes the plugin unusable
and prone to getting banned by GitHub
Steampipe version (steampipe -v)
Example: v0.21.2
Plugin version (steampipe plugin list)
Example: v0.39.0
Expected behavior
GitHub specifies exactly how to handle these status codes; we should honor that by implementing it.
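As an illustration of what honoring it could look like, here is a minimal Python sketch of the wait-time rules described in GitHub's REST API rate-limit documentation: use the retry-after header if present, otherwise wait until x-ratelimit-reset when x-ratelimit-remaining is 0, otherwise wait at least one minute and back off exponentially. The function name seconds_to_wait and its parameters are hypothetical, and this is not the plugin's actual code (the plugin is written in Go); checking the message text for "rate limit" is an assumption used here to tell throttling apart from an ordinary 403.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(
    status: int,
    headers: Mapping[str, str],
    message: str,
    attempt: int,
    now: Optional[float] = None,
) -> Optional[float]:
    """Return how long to wait before retrying, or None if this response
    does not look like a rate-limit response (hypothetical helper)."""
    if status not in (403, 429):
        return None

    # Header names are case-insensitive, so normalize them first.
    h = {k.lower(): v for k, v in headers.items()}

    # Assumption: a 403 without rate-limit headers or wording is a plain
    # permission error and should not be retried.
    looks_throttled = (
        "retry-after" in h
        or h.get("x-ratelimit-remaining") == "0"
        or "rate limit" in message.lower()
    )
    if not looks_throttled:
        return None

    # 1. retry-after: wait that many seconds.
    if "retry-after" in h:
        return float(h["retry-after"])

    # 2. Primary limit exhausted: wait until the reset time (UTC epoch seconds).
    if h.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in h:
        current = time.time() if now is None else now
        return max(0.0, float(h["x-ratelimit-reset"]) - current)

    # 3. Otherwise wait at least one minute, doubling on each further attempt.
    return 60.0 * (2 ** attempt)
```

A caller would sleep for the returned number of seconds and retry, giving up after a fixed number of attempts; a None result means the error should be surfaced immediately.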
Additional context