feat: incomplete datapoints can now resolve the affected repositories #62756

bahrmichael · 2024-05-17T09:41:41Z

Previously, our GraphQL API gave you no way to find out which repositories caused incomplete datapoints. With this change, you can pass an argument to incompleteDatapoints so that points are not aggregated across repositories, and then resolve the repository for each datapoint.

This PR is needed to help debug incomplete datapoints in Code Insights. When customers create Code Insights for a large number of repositories, it's hard to understand how big the impact of incomplete datapoints is and which repositories those issues come from. If you don't have access to the logs, it's basically impossible to isolate problematic repositories.

Queries work as before when you don't add the aggregateRepositories=false parameter or don't resolve the repository.

When you add the aggregateRepositories=false parameter and resolve the repository, you get individual datapoints for each repository that had a problem.

If you set aggregateRepositories=true and attempt to resolve the repository, it will be null.
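
For illustration, the opt-in behaviour described above could be exercised with a request like the following Go sketch. Only the `aggregateRepositories` argument on `incompleteDatapoints` and the resolvable repository come from this description. The surrounding query path (`insightViews`, `nodes`, `dataSeries`, `status`), the selected fields (`time`, `name`), and the helper names are assumptions, not the confirmed schema. The argument was removed again later in this PR (see the "chore: remove aggregation parameter" commit), so this reflects the initial proposal only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// incompleteDatapointsQuery is illustrative only. The argument name comes from
// the PR description; the query path and the selected fields (time, name) are
// assumptions about the schema.
const incompleteDatapointsQuery = `
query IncompleteDatapoints($id: ID!) {
  insightViews(id: $id) {
    nodes {
      dataSeries {
        status {
          incompleteDatapoints(aggregateRepositories: false) {
            time
            repository {
              name
            }
          }
        }
      }
    }
  }
}`

// graphQLRequest is the conventional JSON body of a GraphQL-over-HTTP POST.
type graphQLRequest struct {
	Query     string            `json:"query"`
	Variables map[string]string `json:"variables"`
}

// buildRequestBody serializes the query together with the insight view ID.
// The function name and signature are hypothetical.
func buildRequestBody(insightViewID string) ([]byte, error) {
	return json.Marshal(graphQLRequest{
		Query:     incompleteDatapointsQuery,
		Variables: map[string]string{"id": insightViewID},
	})
}

func main() {
	body, err := buildRequestBody("hypothetical-insight-view-id")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The resulting body would be sent to the instance's GraphQL endpoint with an access token; that part is omitted because it depends on the deployment.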

Test plan

Existing code paths are covered by CI
I will add more tests if this approach is accepted

bahrmichael · 2024-05-17T11:57:55Z

The license check is a known issue: https://sourcegraph.slack.com/archives/C04MYFW01NV/p1715937672950199

camdencheek · 2024-05-24T13:32:37Z

cmd/frontend/graphqlbackend/insights.graphql

+        By default, incomplete datapoints are aggregated across all repositories.
+        Setting this to false will allow resolving the repository.
+        """
+        aggregateRepositories: Boolean = true


Q: now that repositories is an array, do we still need this parameter? If a client doesn't care about the repository list, they can just exclude that from the list of fields in their query. Excluding this also removes the (documented but still maybe surprising) dependency between the repositories field and this argument

Yes! Thank you for the reminder. I was able to clean it up, and things seem to work as expected. Since I can't find anything problematic in the store method, it should be good as long as CI passes.
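
To make the shape change in this thread concrete (one datapoint per timestamp carrying a list of repositories, instead of one datapoint per timestamp and repository), here is a minimal Go sketch. The `row` and `datapoint` types and the `groupByTime` function are hypothetical and are not the store's actual implementation.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// row is a hypothetical flattened record: one incomplete point for one repository.
type row struct {
	Time   time.Time
	RepoID int
}

// datapoint is a hypothetical aggregated shape: one entry per timestamp,
// carrying every affected repository.
type datapoint struct {
	Time    time.Time
	RepoIDs []int
}

// groupByTime collapses n*m rows (n timestamps, m repositories) into n datapoints.
func groupByTime(rows []row) []datapoint {
	index := make(map[int64]int) // Unix seconds -> position in out
	var out []datapoint
	for _, r := range rows {
		key := r.Time.Unix()
		i, ok := index[key]
		if !ok {
			i = len(out)
			index[key] = i
			out = append(out, datapoint{Time: r.Time})
		}
		out[i].RepoIDs = append(out[i].RepoIDs, r.RepoID)
	}
	// Return datapoints in chronological order.
	sort.Slice(out, func(a, b int) bool { return out[a].Time.Before(out[b].Time) })
	return out
}

// sampleRows returns three rows spread over two timestamps.
func sampleRows() []row {
	t1 := time.Date(2024, 5, 1, 0, 0, 0, 0, time.UTC)
	t2 := t1.AddDate(0, 0, 1)
	return []row{
		{Time: t1, RepoID: 1},
		{Time: t1, RepoID: 2},
		{Time: t2, RepoID: 1},
	}
}

func main() {
	for _, dp := range groupByTime(sampleRows()) {
		fmt.Println(dp.Time.Format(time.RFC3339), dp.RepoIDs)
	}
}
```

With this shape, a client that does not care about repositories can simply leave the list out of its selection, which is the point raised in the question above.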

camdencheek · 2024-05-24T13:38:38Z

internal/insights/store/store.go

+			if repoId.Valid {
+				mappedRepoIds[i] = int(repoId.Int64)
+			}
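
As a side note on the hunk above: the `.Valid` and `.Int64` fields suggest that `repoId` is a `database/sql` `NullInt64`, though that is an assumption based on this excerpt alone. Under that assumption, the sketch below shows what the guard does with NULL values (the slot keeps the zero value `0`) and one possible alternative that drops NULL entries. The function names are hypothetical.

```go
package main

import (
	"database/sql"
	"fmt"
)

// mapRepoIDs mirrors the guard in the hunk: NULL entries are skipped,
// so their slot keeps the zero value 0.
func mapRepoIDs(ids []sql.NullInt64) []int {
	mapped := make([]int, len(ids))
	for i, id := range ids {
		if id.Valid {
			mapped[i] = int(id.Int64)
		}
	}
	return mapped
}

// compactRepoIDs is an alternative that drops NULL entries entirely,
// so no placeholder 0 appears in the result.
func compactRepoIDs(ids []sql.NullInt64) []int {
	mapped := make([]int, 0, len(ids))
	for _, id := range ids {
		if id.Valid {
			mapped = append(mapped, int(id.Int64))
		}
	}
	return mapped
}

// sampleIDs returns two valid IDs with a NULL in between.
func sampleIDs() []sql.NullInt64 {
	return []sql.NullInt64{
		{Int64: 7, Valid: true},
		{}, // NULL repo_id
		{Int64: 42, Valid: true},
	}
}

func main() {
	fmt.Println(mapRepoIDs(sampleIDs()))     // [7 0 42]
	fmt.Println(compactRepoIDs(sampleIDs())) // [7 42]
}
```

Which variant is appropriate depends on whether callers can tell a placeholder `0` apart from a real repository ID.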


Q: the DB schema says repo_id is nullable, but that's kinda surprising to me. Do you understand why that is?

I found #45282 which inserts null here and mentions global queries. The repoId and repoName should be available though, based on the types that this incomplete insert runs on. I haven't found any places where the repoId and repoName on RecordSeriesPointArgs are not set. Maybe it's to reduce the number of inserts for global queries?

In the backend documentation it sounds like there should also be repo information no matter if it's global or not.

sourcegraph/doc/dev/background-information/insights/backend.md, lines 144 to 145 in d4a6b27:

Its job is to periodically schedule a recording of 'current' values for Insights by enqueuing a recording using a global query. This only requires a single global query per insight regardless of the number of repositories, and will return results for all the matched repositories. Each matched repository will still be recorded individually.

Good find!

Not blocking, just thinking out loud to try to understand this better. What is a global code insight? It kinda makes sense that a global job wouldn't have a repo ID because it's running against everything, but when would we do that? Maybe there's a special case for an insight that only runs against public repositories, so we know that all users can view all the data, and don't need to keep track of which repo the points are for?

This PR updates the documentation to explain how users can use a new GraphQL field introduced with sourcegraph/sourcegraph#62756 to identify repositories that cause incomplete datapoints. For sourcegraph/sourcegraph#62295

feat: incomplete datapoints can now resolve the affected repository

abba7d5

cla-bot bot added the cla-signed label May 17, 2024

bahrmichael requested review from camdencheek and mike-r-mclaughlin May 17, 2024 09:42

bahrmichael added 2 commits May 17, 2024 11:45

fix: run bazel configure

3caad37

fix: bazel test failure

ab01c4f

bahrmichael added 2 commits May 21, 2024 16:40

Merge branch 'main' into bahrmichael/62578

25b1986

Merge branch 'main' into bahrmichael/62578

f23fe84

camdencheek reviewed May 22, 2024

cmd/frontend/graphqlbackend/insights.graphql (outdated, resolved)

camdencheek reviewed May 22, 2024

internal/insights/store/store.go (outdated, resolved)

bahrmichael and others added 3 commits May 24, 2024 12:23

feat: switch from n*m to n datapoints

29369c9

chore: add tests

5b7ecd0

Merge branch 'main' into bahrmichael/62578

98f7f87

camdencheek approved these changes May 24, 2024

bahrmichael and others added 5 commits May 24, 2024 16:14

chore: run bazel configure

1c4682f

chore: remove aggregation parameter

beeeebc

Merge branch 'main' into bahrmichael/62578

3f81e6e

fix: update tests

4d950d6

chore: add changelog entry

17ab32d

bahrmichael enabled auto-merge (squash) May 27, 2024 10:27

bahrmichael changed the title ~~feat: incomplete datapoints can now resolve the affected repository~~ feat: incomplete datapoints can now resolve the affected repositories May 27, 2024

bahrmichael merged commit ff9ef6f into main May 27, 2024
9 checks passed

bahrmichael deleted the bahrmichael/62578 branch May 27, 2024 10:32

bahrmichael mentioned this pull request May 28, 2024

feat: explain repository identification for incomplete datapoints sourcegraph/docs#357

Merged
