Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

native histograms: experimental cardinality api #8008

Merged
merged 16 commits into from May 16, 2024

Conversation

krajorama
Copy link
Contributor

@krajorama krajorama commented Apr 30, 2024

What this PR does

Query-frontend, querier: new experimental /cardinality/active_native_histogram_metrics API to get active native histogram metric names with statistics about active native histogram buckets.

Which issue(s) this PR fixes or relates to

Fixes #7981

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@krajorama krajorama force-pushed the krajo/native-histograms-cardinality-api branch from 8539e11 to e19a6b2 Compare April 30, 2024 07:26
@krajorama krajorama force-pushed the krajo/native-histograms-cardinality-api-querier branch from 220f25c to 73ef049 Compare May 3, 2024 08:24
Base automatically changed from krajo/native-histograms-cardinality-api to main May 3, 2024 09:52
@krajorama krajorama force-pushed the krajo/native-histograms-cardinality-api-querier branch from 73ef049 to 000c750 Compare May 3, 2024 12:45
@krajorama
Copy link
Contributor Author

krajorama commented May 3, 2024

Regeression benchmark for querier

goos: linux
goarch: amd64
pkg: github.com/grafana/mimir/pkg/querier
cpu: AMD Ryzen 7 4700U with Radeon Graphics         
                                │   old.txt   │            new.txt            │
                                │   sec/op    │   sec/op     vs base          │
ActiveSeriesHandler_ServeHTTP-8   258.7µ ± 3%   253.8µ ± 4%  ~ (p=0.075 n=10)

                                │   old.txt    │            new.txt             │
                                │     B/op     │     B/op      vs base          │
ActiveSeriesHandler_ServeHTTP-8   146.4Ki ± 0%   146.3Ki ± 0%  ~ (p=0.353 n=10)

                                │  old.txt   │           new.txt            │
                                │ allocs/op  │ allocs/op   vs base          │
ActiveSeriesHandler_ServeHTTP-8   115.0 ± 0%   115.0 ± 0%  ~ (p=1.000 n=10)

@krajorama krajorama force-pushed the krajo/native-histograms-cardinality-api-querier branch from 000c750 to b0a0626 Compare May 3, 2024 17:26
The new endpoint gives a list of metric names of active native
histogram series with associated statistics about active bucket counts.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@krajorama krajorama force-pushed the krajo/native-histograms-cardinality-api-querier branch from b0a0626 to 815d55d Compare May 3, 2024 17:29
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@krajorama krajorama changed the title Krajo/native histograms cardinality api querier native histograms: experimental cardinality api May 6, 2024
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@krajorama krajorama marked this pull request as ready for review May 6, 2024 13:48
@krajorama krajorama requested a review from a team as a code owner May 6, 2024 13:48
Copy link
Contributor

@flxbk flxbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a few minor questions.

Comment on lines 2136 to 2140
// For native histograms we are going to return metrics level information so we don't need to limit the response size.
if !nativeHistograms {
if limit := d.limits.ActiveSeriesResultsMaxSizeBytes(tenantID); limit > 0 {
maxResponseSize = limit
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should we apply the limit selectively only? It's there to bound in-use memory per call and querier, if it is exceeded the request should be retried with a higher shard count. Maybe I'm missing something, but I don't see why the shape of the response data makes any difference?

Copy link
Contributor Author

@krajorama krajorama May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not sure about this one. My problem is the name of the limit , it's called querier.active-series-results-max-size-bytes , but we'd be limiting the intermediate result size. Something more relevant could be querier.max-fetched-series-per-query, but that's for queries which have very different semantics than active series requests.

I'll restore it, add a test and update the help text to try and make it clear how it's used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, CI running now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a user perspective, I'm not sure if the new help text is clearer. What exactly is your issue with
Maximum size of an active series request result shard in bytes?

but we'd be limiting the intermediate result size

that's exactly what we want though to avoid high in-use space in queriers if the shard count is too low

Copy link
Contributor Author

@krajorama krajorama May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a user perspective, I'm not sure if the new help text is clearer. What exactly is your issue with
Maximum size of an active series request result shard in bytes?

I think I just missed the word "shard" in it. I'll update to just add active native histogram metrics to the existing sentence.

pkg/distributor/distributor.go Show resolved Hide resolved
pkg/distributor/distributor.go Show resolved Hide resolved
pkg/distributor/distributor_test.go Outdated Show resolved Hide resolved
Comment on lines 140 to 141
// Cannot start streaming until we merged all results.
err := g.Wait()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we build the full response in memory then we don't need streaming at all, or am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but on the other hand stream writing JSON is more efficient as well as far as I know? So it made sense to adopt what worked already.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the efficiency. In general I think the non-streaming version is more robust because if there's any error during accumulation of the response the client will get a non-200 HTTP status code whereas with streaming the client will always get a 200 and if the response can't be built without error the stream will abort. Also, there would be less code complexity without the streaming. But since we've already got that complexity elsewhere maybe it's ok to duplicate it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the error handling is more complicated in my case. That convinced me, I'll change to non streaming.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworked it. I also switched from supporting "x-snappy-framed" to supporting "snappy" as encoding, since framing is for streaming and we don't need it here.

krajorama and others added 4 commits May 8, 2024 11:54
Co-authored-by: Felix Beuke <felix.j.beuke@gmail.com>
From review comment.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@krajorama krajorama requested a review from jdbaldry as a code owner May 10, 2024 06:31
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Copy link
Contributor

@flxbk flxbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, lgtm

@krajorama krajorama merged commit 30c5fbf into main May 16, 2024
29 checks passed
@krajorama krajorama deleted the krajo/native-histograms-cardinality-api-querier branch May 16, 2024 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add an API to fetch information about native histogram active buckets
2 participants