Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add table github_repository_content #317

Merged
merged 16 commits into from May 17, 2024
Merged

Conversation

ParthaI
Copy link
Contributor

@ParthaI ParthaI commented Aug 18, 2023

Example query results

Query: steampipe query "select repository_full_name, type, name, path from github_repository_content where repository_full_name = 'turbot/steampipe-plugin-aws'"

Result:

output.json

@ParthaI ParthaI self-assigned this Aug 18, 2023
Comment on lines 98 to 118
SubTree struct {
Entries []struct {
Name githubv4.String
Path githubv4.String
Size githubv4.Int
LineCount githubv4.Int
Mode githubv4.Int
PathRaw githubv4.String
IsGenerated githubv4.Boolean
Type githubv4.String
Object struct {
Blob struct {
Oid githubv4.String
AbbreviatedOid githubv4.String
Text githubv4.String
IsBinary githubv4.Boolean
CommitUrl githubv4.String
} `graphql:"... on Blob"`
}
}
} `graphql:"... on Tree"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This limits us to one level of directories from the path entered (or repo root), do we want only one level of directories or do we need to figure out how to parse deeper? @cbruno10, thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbruno10 / @graza-io, I've delved into a deeper examination of parsing GitHub file contents deeper level. Kindly take a moment to review my findings.

  • I attempted to fetch all file contents from a repository by recursively executing a GraphQL query. However, I consistently encountered a Error: non-200 OK status code: 502 Bad Gateway body error.
  • The analysis was carried out on the turbot/steampipe-plugin-aws repository, which contains a significant number of files.
  • Despite configuring a rate limiter at the plugin level, I achieved no success.
  • I refined the GraphQL query to retrieve file content down to the 3rd, 4th, and 5th levels. Yet, in all scenarios, I faced the same error.
Error: non-200 OK status code: 502 Bad Gateway body: "{\n   \"data\": null,\n   \"errors\":[\n      {\n         \"message\":\"Something went wrong while executing your query. This may be the result of a timeout, or it could be a GitHub bug. Please include `04E3:1AE1DB:1814A98:18ECE96:660C236B` when reporting this issue.\"\n      }\n   ]\n}\n" (SQLSTATE HV000)
  • It may fail due to insufficient storage if the repository has a larger file content.

Based on my observations, attempting to fetch the contents of all files in a repository up to the nᵗʰ level tends to be error-prone. On the other hand, we offer flexibility by allowing users to specify the file path for which they wish to obtain content details. By including the repository_content_path value in the where clause, we can target the retrieval of file contents from a specific directory within a repository.

I greatly value your feedback and suggestions.

Thank you!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ParthaI Is this behaviour the same compared to when we used the v3 APIs? Are we able to get nested files with that API/the previous table implemented proposed in #207?

Copy link
Contributor Author

@ParthaI ParthaI Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbruno10, the modifications in previous PR #207 were limited to retrieving file contents at the root level only. In the latest PR, we've advanced our approach to parse file contents down to one level down the root.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ParthaI ! I don't understand why you are talking about root level only in the #207.
I was able to retrieve sub directory's file content with this PR.

What do I not understand ?

Copy link
Contributor Author

@ParthaI ParthaI May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @aminvielledebatAtBedrock,

I apologize for not updating you earlier regarding your PR here.

  • The original PR utilized the GitHub REST API to populate column values, and we experienced recursive API calls.
  • Based on our previous experiences, the REST API tends to be more error-prone, primarily due to rate limit errors.
  • In the current PR, we have shifted from using the REST API to a GraphQL query to enhance efficiency and reliability.
  • However, please be aware that GraphQL does not support fetching file content to an arbitrary depth within a repository. Extending the nodes in a GraphQL query to retrieve file content beyond one level may result in throttling errors in the case if the repository has a huge set of file content. Query Reference.
  • We've addressed this by constructing a GraphQL query that retrieves file content up to one level deep and uses a recursive approach for deeper levels as needed.
  • Additionally, we've handled potential throttling errors that may occur when a repository contains a substantial amount of content by appropriately structuring our GraphQL queries.

We have updated this current PR to ensure all file content under a repository is accessible as intended.

Thanks!

@github-actions
Copy link

This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale No recent activity has been detected on this issue/PR and it will be closed label Oct 22, 2023
@aminvielledebatAtBedrock
Copy link
Contributor

Hi @ParthaI ! Do you have any update on this PR ?

@github-actions github-actions bot removed the stale No recent activity has been detected on this issue/PR and it will be closed label Nov 6, 2023
@ParthaI
Copy link
Contributor Author

ParthaI commented Nov 7, 2023

@aminvielledebatAtBedrock, I appreciate your interest in the PR. Currently, This PR is under review. I'll provide you with an update once I have more information. Your patience is greatly appreciated!

@gforien
Copy link

gforien commented Dec 18, 2023

Hi @ParthaL, we are also intereset in this PR
Do you have updates on this please ?

Copy link

This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale No recent activity has been detected on this issue/PR and it will be closed label Feb 16, 2024
@aminvielledebatAtBedrock
Copy link
Contributor

Hi @graza-io , could you remove the stale label please ?

We still need this new table :-)

@ParthaI ParthaI removed the stale No recent activity has been detected on this issue/PR and it will be closed label Feb 27, 2024
@ParthaI
Copy link
Contributor Author

ParthaI commented Apr 2, 2024

@aminvielledebatAtBedrock, Just an update, the PR is able to get the repo content up to one level of directories from the path entered (or repo root). We are figuring out a way to get all the details up to the n level of the directory.

Thank you for your patience!

Copy link
Contributor

@misraved misraved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ParthaI please take a look at the minor review comments. Thanks!!

github/table_github_repository_content.go Outdated Show resolved Hide resolved
github/table_github_repository_content.go Show resolved Hide resolved
@misraved misraved merged commit 20986e7 into main May 17, 2024
1 check passed
@misraved misraved deleted the add-github-repo-content-table branch May 17, 2024 06:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants