Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large customers: Code Insights results may be incomplete #62295

Open
bahrmichael opened this issue Apr 30, 2024 · 5 comments
Open

Large customers: Code Insights results may be incomplete #62295

bahrmichael opened this issue Apr 30, 2024 · 5 comments
Assignees
Labels
code-insights Issues related to the Code Insights product team/code-search Issues owned by the code search team

Comments

@bahrmichael
Copy link
Contributor

This ticket originated from a Slack message, reported by @mike-r-mclaughlin.

We’ve been struggling to get Code Insights to work at Sentry. the screenshot below is what we see in the UI. From the screenshot it seems like we hit a timeout. Which timeout is it referring to and can that be configured?

image

It’s not entirely clear which log to check. we checked searcher (no warnings/errors during the time period this was being generated) and gitserver (also no seemingly relevant warning/error messages). we did find some relevant errors/warnings in frontend

{
  "SeverityText": "ERROR",
  "Timestamp": 1712170603579550779,
  "InstrumentationScope": "Resolver.live_preview_resolver",
  "Caller": "query/streaming_query_executor.go:154",
  "Function": "github.com/sourcegraph/sourcegraph/internal/insights/query.(*StreamingRepoQueryExecutor).ExecuteRepoList",
  "Body": "repo search encountered skipped events",
  "Resource": {
    "service.name": "frontend",
    "service.version": "5.2.7",
    "service.instance.id": "da92ab9c4de7"
  },
  "Attributes": {
    "reasons": "[shard-match-limit: Not all results have been returned due to hitting a match limit. Sourcegraph has limits for the number of results returned from a line, document and repository.]",
    "query": "fork:yes archived:yes repo:.* count:21"
  }
}

Logs: https://sourcegraph.slack.com/files/U01H4LDHVMF/F0703Q1EU2X/sourcegraph-frontend-0_--tail_100.txt

@bahrmichael
Copy link
Contributor Author

bahrmichael commented May 8, 2024

We’ve been struggling to get Code Insights to work at Sentry. the screenshot below is what we see in the UI. From the screenshot it seems like we hit a timeout. Which timeout is it referring to and can that be configured?

I assume that the datapoints come from the backfill process.

Unfortunately this is a bit hard to pin point, since there are multiple timeouts. The one that we're seeing is most likely coming from a downstream service, or an error falsely being interpreted as a timeout. When I finished cloning chromium I should be able to reproduce some timeout errors and find the right places in the code that are causing this.

It’s not entirely clear which log to check. we checked searcher (no warnings/errors during the time period this was being generated) and gitserver (also no seemingly relevant warning/error messages). we did find some relevant errors/warnings in frontend

From what I'm seeing locally, the logs should be in the worker with the scope worker.insights-job.background.Handle. Here I can see error messages like "error during backfill execution", "insights backfill progress handler loaded", "backfill set to completed state", "backfill disabled due to exceeding error threshold", and others. You need to set the log level to INFO to see all of them.

A log line like the one below should tell us which downstream this error may be coming from.

[         worker] ERROR .../backfill_state_inprogress_handler.go:206 error during backfill execution ... "handle": {"seriesId": 12, "seriesUniqueId": "2gBFsFt6a1A5DmoRJTbJKLfsCex", "backfillId": 6, "totalDuration": "3m41.207475831s", "repoTotalCount": 17, "errorCount": 8, "percentComplete": 0.8235294117647058, "erroredRepos": 1, "error": "searchHandler: streaming search: errors: [rpc error: code = Unknown desc = hybrid search failed: iterating over changed files in git diff: rpc error: code = Unknown desc = failed to get changed file: encountered unknown file status \"T\" for file \"internal/telemetrygateway/v1/telemetrygateway.proto\"]\nrateLimiter.Wait: context canceled\nrateLimiter.Wait: context canceled\nrateLimiter.Wait: context canceled\nrateLimiter.Wait: context canceled\nrateLimiter.Wait: context canceled\nrateLimiter.Wait: context canceled\nrateLimiter.Wait: context canceled\nrateLimiter.Wait: context canceled\nrateLimiter.Wait: context canceled\nrateLimiter.Wait: context canceled"}}

Based on the line below from your logs I guess that the search limits are stopping us here. They are not configurable in the currently released version.

{"SeverityText":"ERROR","Timestamp":1712170603579550779,"InstrumentationScope":"Resolver.live_preview_resolver","Caller":"query/streaming_query_executor.go:154","Function":"github.com/sourcegraph/sourcegraph/internal/insights/query.(*StreamingRepoQueryExecutor).ExecuteRepoList","Body":"repo search encountered skipped events","Resource":{"service.name":"frontend","service.version":"5.2.7","service.instance.id":"da92ab9c4de7"},"Attributes":{"reasons":"[shard-match-limit: Not all results have been returned due to hitting a match limit. Sourcegraph has limits for the number of results returned from a line, document and repository.]","query":"fork:yes archived:yes repo:.* count:21"}}

I'm curious about the count:21, and if the query is the full query we're using in the code insight. I've seen some places in the code where it checks against the given limit, and aborts if there are more entries.

What still sounds odd to me is that shard-match-limit would be translated to a timeout.

@bahrmichael bahrmichael changed the title Sentry: Code Insights results may be incomplete Large customers: Code Insights results may be incomplete May 8, 2024
@camdencheek
Copy link
Member

hybrid search failed: iterating over changed files in git diff: rpc error: code = Unknown desc = failed to get changed file: encountered unknown file status "T" for file "internal/telemetrygateway/v1/telemetrygateway.proto"

This seems like a bug. It looks like we only recognize file statuses A, M, and D, but according to man git-diff-tree, there are various other valid statuses.

       Possible status letters are:
       •   A: addition of a file
       •   C: copy of a file into a new one
       •   D: deletion of a file
       •   M: modification of the contents or mode of a file
       •   R: renaming of a file
       •   T: change in the type of the file (regular file, symbolic link or
           submodule)
       •   U: file is unmerged (you must complete the merge before it can be
           committed)
       •   X: "unknown" change type (most probably a bug, please report it)

@bahrmichael
Copy link
Contributor Author

FYI @camdencheek I copied the previous text to a local file as I iterate on the problem and get closer to the root cause. Just in case you're wondering where it just went.

@camdencheek
Copy link
Member

Thanks! I went ahead and poked the Source team here about the diff iterator bug

@bahrmichael
Copy link
Contributor Author

To help with excluding repos that cause problems I added this issue: #62578

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code-insights Issues related to the Code Insights product team/code-search Issues owned by the code search team
Projects
None yet
Development

No branches or pull requests

2 participants