Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cache): Add Cache-Control: no-cache support for Loki instant queries. #12896

Merged
merged 11 commits into from May 19, 2024

Conversation

kavirajk
Copy link
Collaborator

@kavirajk kavirajk commented May 6, 2024

What this PR does / why we need it:
Goal is, if header Cache-Control: no-cache is set, we guarantee that response user got for that query is never from the results cache.

Two changes at high-level

  1. Loki: Queryfrontend interprets Cache-Control header and bypass the results if the value is no-cache.
  2. LogCLI: Add extra flag --nocache to support setting the right header when sending queries to Loki.

Changes in Action:

  1. Normal case using results cache.
-bash-5.2$ ./cmd/logcli/logcli instant-query 'sum(rate({job="varlogs"}[3h]))' -q
[
  {
    "metric": {},
    "value": [
      1715602199.729,
      "0.035555555555555556"
    ]
  }

Hitting the results cache from the following log lines in Loki

cache_result_req=17 cache_result_hit=16
  1. Using HTTP header to bypass results caching
-bash-5.2$ ./cmd/logcli/logcli instant-query 'sum(rate({job="varlogs"}[3h]))' --nocache -q
[
  {
    "metric": {},
    "value": [
      1715602387.564,
      "0.035555555555555556"
    ]
  }

Not hitting the results cache from the following logs in Loki

cache_result_req=0 cache_result_hit=0

Which issue(s) this PR fixes:
Fixes NA

Special notes for your reviewer:

  1. This PR adds support only for instant queries now. Follow up PRs will support for all types of queries.

  2. Instant queries uses downstream engine via split_by_range middleware. Which converts LokiInstantRequest to Params and convert back from Params to LokiInstantRequest. So added support to persist cachingOptions in params as well.

  3. Since this header is only valuable for Query-frontend, we don't persist this header when passing the subqueries to the querier (via HTTP -> GRPC -> HTTP roundtrip).

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@kavirajk kavirajk force-pushed the kavirajk/aggregation-instant-query branch from 40eb3ed to 1b77ecf Compare May 6, 2024 12:10
Goal is, if that header is set, we gurantee that response user got for that query is never from results cache.

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
@kavirajk kavirajk force-pushed the kavirajk/aggregation-instant-query branch from 8ad5cd3 to 12a591a Compare May 13, 2024 06:43
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
@pull-request-size pull-request-size bot added size/M and removed size/S labels May 13, 2024
@kavirajk kavirajk changed the title feat(cache): Add support for Cache-Control: no-cache support for Loki queries. feat(cache): Add Cache-Control: no-cache support for Loki instant queries. May 13, 2024
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
@pull-request-size pull-request-size bot added size/L and removed size/M labels May 14, 2024
@kavirajk kavirajk marked this pull request as ready for review May 14, 2024 07:06
@kavirajk kavirajk requested a review from a team as a code owner May 14, 2024 07:06
Copy link
Contributor

@ashwanthgoli ashwanthgoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. just a couple of nits

rc.put(ctx, key, []Extent{mkExtent(int64(modelNow)-(600*1e3), int64(modelNow))})

// Asserts (when `shouldCacheReq` is non-nil and set to return false (should not cache), resultcache should get result from upstream handler (mockHandler))
// 1. With `shouldCacheReq` non-nil and set `noCacheReq`, should get result from `next` handler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we change this shouldCacheReq to shouldLookupCache or bypassCache?
the current name is a bit confusing, it reads "whether this request should be cached"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I can change it in my tests. But the original type still called ShouldCacheReq (IMO, still confusing, because the same request's results can still be cached after getting it from the upstream).

Should ideally be called something like ShouldLookupCache. I will fix the naming on my test now. Renaming the original type I can do it in separate PR may be.


// fill cache
key := ConstSplitter(day).GenerateCacheKey(context.Background(), "1", req)
rc.put(ctx, key, []Extent{mkExtent(int64(modelNow)-(600*1e3), int64(modelNow))})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be flaky at times, because the extent might end up in the previous day when this test runs on a day boundary

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could use constant time range similar to other tests to keep this predictable? mkExtent(50, 120)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extent should be irrelevent here I think because, the test is make sure we are calling upstream next handler instead of looking up in the cache in the first place (independent of cache hit or cache miss).

But good idea to make it constant anyway. Will update it.

@kavirajk kavirajk merged commit 88e545f into main May 19, 2024
58 checks passed
@kavirajk kavirajk deleted the kavirajk/aggregation-instant-query branch May 19, 2024 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants