-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: incomplete datapoints can now resolve the affected repositories #62756
Conversation
The license check is a known issue: https://sourcegraph.slack.com/archives/C04MYFW01NV/p1715937672950199 |
By default, incomplete datapoints are aggregated across all repositories. | ||
Setting this to false will allow resolving the repository. | ||
""" | ||
aggregateRepositories: Boolean = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: now that repositories
is an array, do we still need this parameter? If a client doesn't care about the repository list, they can just exclude that from the list of fields in their query. Excluding this also removes the (documented but still maybe surprising) dependency between the repositories
field and this argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! Thank you for the reminder. I was able to clean it up, and things seem to work as expected. Since I can't find any problematic to the store method, it should be good as long as CI passes.
if repoId.Valid { | ||
mappedRepoIds[i] = int(repoId.Int64) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: the DB schema says repo_id
is nullable, but that's kinda surprising to me. Do you understand why that is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found #45282 which inserts null here and mentions global queries. The repoId and repoName should be available though, based on the types that this incomplete insert runs on. I haven't found any places where the repoId and repoName on RecordSeriesPointArgs are not set. Maybe it's to reduce the number of inserts for global queries?
In the backend documentation it sounds like there should also be repo information no matter if it's global or not.
sourcegraph/doc/dev/background-information/insights/backend.md
Lines 144 to 145 in d4a6b27
Its job is to periodically schedule a recording of 'current' values for Insights by enqueuing a recording using a global query. This only requires a single global query per insight regardless of the number of repositories, | |
and will return results for all the matched repositories. Each matched repository will still be recorded individually. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good find!
Not blocking, just thinking out loud to try to understand this better. What is a global code insight? It kinda makes sense that a global job wouldn't have a repo ID because it's running against everything, but when would we do that? Maybe there's a special case for an insight that only runs against public repositories, so we know that all users can view all the data, and don't need to keep track of which repo the points are for?
This PR updates the documentation to explain how users can use a new GraphQL field introduced with sourcegraph/sourcegraph#62756 to identify repositories that cause incomplete datapoints. For sourcegraph/sourcegraph#62295 ## Pull Request approval Although pull request approval is not enforced for this repository in order to reduce friction, merging without a review will generate a ticket for the docs team to review your changes. So if possible, have your pull request approved before merging.
Closes #62578
For #62295
Previously with our GraphQL api, you couldn't figure out which repositories caused incomplete datapoints. With this change you can now provide an argument to the
incompleteDatapoints
to not aggregate points for repositories, and then resolve the repositories for each datapoint.This PR is needed to help debug incomplete datapoints in Code Insights. When customers create Code Insights for a large number of repositories, it's hard to understand how big the impact of incomplete datapoints is, and which repositories those issues are coming from. If you don't have access to the logs it's basically impossible to isolate problematic repositories.
Queries work as before, when you don't add the
aggregateRepositories=false
parameter or resolve the repository.When you add the
aggregateRepositories=false
parameter and resolve the repository, you get individual datapoints for each repository that had a problem.If you set
aggregateRepositories=true
and attempt to resolve the repository, it will be null.Test plan