Add table github_repository_content #317
SubTree struct {
	Entries []struct {
		Name        githubv4.String
		Path        githubv4.String
		Size        githubv4.Int
		LineCount   githubv4.Int
		Mode        githubv4.Int
		PathRaw     githubv4.String
		IsGenerated githubv4.Boolean
		Type        githubv4.String
		Object      struct {
			Blob struct {
				Oid            githubv4.String
				AbbreviatedOid githubv4.String
				Text           githubv4.String
				IsBinary       githubv4.Boolean
				CommitUrl      githubv4.String
			} `graphql:"... on Blob"`
		}
	}
} `graphql:"... on Tree"`
This limits us to one level of directories from the path entered (or the repo root). Do we want only one level of directories, or do we need to figure out how to parse deeper? @cbruno10, thoughts?
@cbruno10 / @graza-io, I've looked further into parsing GitHub file contents at deeper levels. Please take a moment to review my findings.
- I attempted to fetch all file contents from a repository by recursively executing a GraphQL query. However, I consistently hit a 502 error (Error: non-200 OK status code: 502 Bad Gateway body).
- The analysis was carried out on the turbot/steampipe-plugin-aws repository, which contains a significant number of files.
- Configuring a rate limiter at the plugin level did not help.
- I refined the GraphQL query to retrieve file content down to the 3rd, 4th, and 5th levels. In all scenarios, I faced the same error:
Error: non-200 OK status code: 502 Bad Gateway body: "{\n \"data\": null,\n \"errors\":[\n {\n \"message\":\"Something went wrong while executing your query. This may be the result of a timeout, or it could be a GitHub bug. Please include `04E3:1AE1DB:1814A98:18ECE96:660C236B` when reporting this issue.\"\n }\n ]\n}\n" (SQLSTATE HV000)
- The query may also fail due to insufficient storage if the repository's file content is large.
Based on my observations, attempting to fetch the contents of all files in a repository up to the nᵗʰ level tends to be error-prone. On the other hand, we offer flexibility by allowing users to specify the file path for which they wish to obtain content details. By including the repository_content_path value in the where clause, we can target the retrieval of file contents from a specific directory within a repository.
I greatly value your feedback and suggestions.
Thank you!
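To make the path-scoping idea above concrete, here is a minimal Go sketch of how a repository_content_path value could be turned into the expression argument of GitHub's repository.object(expression: ...) GraphQL field. The helper name objectExpression and the default ref HEAD are assumptions for illustration; the PR may build this value differently.

```go
package main

import (
	"fmt"
	"strings"
)

// objectExpression builds a Git object expression such as "HEAD:docs/tables".
// An empty contentPath refers to the repository root tree ("HEAD:").
// NOTE: hypothetical helper, not taken from the PR.
func objectExpression(ref, contentPath string) string {
	if ref == "" {
		ref = "HEAD" // assumed default: the tip of the default branch
	}
	// Git tree paths carry no leading or trailing slash.
	clean := strings.Trim(contentPath, "/")
	return ref + ":" + clean
}

func main() {
	fmt.Println(objectExpression("", ""))              // repository root
	fmt.Println(objectExpression("", "/docs/tables/")) // a single directory
	fmt.Println(objectExpression("main", "aws/table_aws_s3_bucket.go"))
}
```

Scoping each request to one directory keeps the response small, which is the property the comment above relies on to avoid the 502 timeouts.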
Hello @aminvielledebatAtBedrock,
I apologize for not updating you earlier regarding your PR here.
- The original PR utilized the GitHub REST API to populate column values, and we experienced recursive API calls.
- Based on our previous experiences, the REST API tends to be more error-prone, primarily due to rate limit errors.
- In the current PR, we have shifted from using the REST API to a GraphQL query to enhance efficiency and reliability.
- However, please be aware that GraphQL does not support fetching file content to an arbitrary depth within a repository. Extending the nodes in a GraphQL query to retrieve file content beyond one level may result in throttling errors if the repository has a large amount of file content. Query Reference.
- We've addressed this by constructing a GraphQL query that retrieves file content up to one level deep and uses a recursive approach for deeper levels as needed.
- Additionally, we've handled potential throttling errors that may occur when a repository contains a substantial amount of content by appropriately structuring our GraphQL queries.
We have updated this current PR to ensure all file content under a repository is accessible as intended.
Thanks!
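The "one level per query, recurse for deeper levels" approach described in the comment above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: the entry type, the levelFetcher function type, and the in-memory fakeRepo are hypothetical stand-ins, and a real implementation would issue one GraphQL request per fetch call.

```go
package main

import "fmt"

// entry mirrors a small subset of the tree-entry fields from the diff.
type entry struct {
	Path string
	Type string // "blob" for files, "tree" for directories
}

// levelFetcher returns the direct children of one directory ("" = repo root).
// Hypothetical: a real plugin would run one GraphQL query per call.
type levelFetcher func(dir string) ([]entry, error)

// walk collects every blob path below dir, fetching one level at a time.
func walk(fetch levelFetcher, dir string) ([]string, error) {
	entries, err := fetch(dir)
	if err != nil {
		return nil, fmt.Errorf("fetching %q: %w", dir, err)
	}
	var files []string
	for _, e := range entries {
		switch e.Type {
		case "blob":
			files = append(files, e.Path)
		case "tree":
			// Recurse with a fresh single-level request for the subdirectory.
			sub, err := walk(fetch, e.Path)
			if err != nil {
				return nil, err
			}
			files = append(files, sub...)
		}
	}
	return files, nil
}

// fakeRepo is an in-memory stand-in for the API, keyed by directory path.
var fakeRepo = map[string][]entry{
	"":            {{Path: "README.md", Type: "blob"}, {Path: "docs", Type: "tree"}},
	"docs":        {{Path: "docs/index.md", Type: "blob"}, {Path: "docs/tables", Type: "tree"}},
	"docs/tables": {{Path: "docs/tables/a.md", Type: "blob"}},
}

// fakeFetch looks up one directory level in fakeRepo.
func fakeFetch(dir string) ([]entry, error) {
	entries, ok := fakeRepo[dir]
	if !ok {
		return nil, fmt.Errorf("no such directory: %s", dir)
	}
	return entries, nil
}

func main() {
	files, err := walk(fakeFetch, "")
	if err != nil {
		panic(err)
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```

Each request stays bounded by the size of a single directory, at the cost of one round trip per directory, so very large repositories may still need rate limiting between calls.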
This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.
Hi @ParthaI! Do you have any update on this PR?
…ampipe-plugin-github into add-github-repo-content-table
@aminvielledebatAtBedrock, I appreciate your interest in the PR. Currently, this PR is under review. I'll provide you with an update once I have more information. Your patience is greatly appreciated!
Hi @ParthaL, we are also interested in this PR.
This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.
Hi @graza-io, could you remove the stale label please? We still need this new table :-)
@aminvielledebatAtBedrock, just an update: the PR is able to get the repo content up to one level of directories from the path entered (or repo root). We are figuring out a way to get all the details up to the Thank you for your patience!
… add-github-repo-content-table
… add-github-repo-content-table
@ParthaI please take a look at the minor review comments. Thanks!!
Example query results
Query:
steampipe query "select repository_full_name, type, name, path from github_repository_content where repository_full_name = 'turbot/steampipe-plugin-aws'"
Result:
output.json