Propose promextra.CounterVec / promextra.GaugeVec to reduce allocations #6193

Closed
colega wants to merge 2 commits

Conversation

@colega (Contributor) commented Oct 2, 2023

What this PR does

Each time we call WithLabelValues(...), we allocate a slice just to hold its arguments, and that slice is immediately thrown away.

These allocations don't show up in profiles as a single hot spot, because the allocation happens at the caller, not inside WithLabelValues, so they're spread across the entire codebase.

I propose replacing these calls in our hot paths with a wrapper that reuses the slice through a pool. As an example, I replaced the usages in pkg/ingester/metrics.go; other usages would be migrated in follow-up PRs. Most of our call sites only need to pass one or two labels, so that's all the wrapper covers.

I didn't embed prometheus.CounterVec on purpose: this way it's easier to refactor without accidentally leaving calls unchanged. I'm also leaving that field exported on purpose, as some usages may still need access to other methods.

I didn't want to spend too much time on naming and I'm open to discussion there; promvec.Counter could be another alternative.
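
To make the idea concrete, here is a rough sketch of the single-label fast path. This is simplified, and the names and details are illustrative; they may differ from the actual code in this PR.

package promextra

import (
    "sync"

    "github.com/prometheus/client_golang/prometheus"
)

// Pool of one-element label-value slices shared across calls. Storing a
// *[]string avoids boxing a slice header on every Put.
var singleLabelPool = sync.Pool{
    New: func() any {
        s := make([]string, 1)
        return &s
    },
}

// CounterVec wraps prometheus.CounterVec without embedding it, so call sites
// have to be migrated explicitly. The field stays exported because some
// usages still need the other *prometheus.CounterVec methods.
type CounterVec struct {
    CounterVec *prometheus.CounterVec
}

// WithLabelValue is the single-label fast path: it borrows a slice from the
// pool for the lookup instead of letting the variadic call allocate one.
func (v CounterVec) WithLabelValue(lv string) prometheus.Counter {
    lvs := singleLabelPool.Get().(*[]string)
    (*lvs)[0] = lv
    c := v.CounterVec.WithLabelValues(*lvs...)
    (*lvs)[0] = "" // don't retain the label value through the pool
    singleLabelPool.Put(lvs)
    return c
}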

Which issue(s) this PR fixes or relates to

None, just passing by.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Each time we call WithLabelValues(...) we have to allocate a slice for
that single argument that is immediately thrown away.

I propose to replace that in our hot paths by a wrapper that would
reuse that slice through a pool.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
@colega requested a review from a team as a code owner on October 2, 2023
@colega changed the title from "Propose promextra.CounterVec / GaugeVec" to "Propose promextra.CounterVec / promextra.GaugeVec to reduce allocations" on Oct 2, 2023
@pracucci (Collaborator) left a comment

I'm not against this change, but I think the canonical way to solve this problem is getting the WithLabelValues() reference once and then reusing it over time: have you considered just grabbing the per-tenant metrics when creating userTSDB and storing the references there?

I don't know how much memory would increase, but let's say we have 500 tenants per ingester, 50 metrics per tenant, and each metric is 1KB (which I suspect is an overestimation): that's +25MB per ingester, which probably wouldn't even be noticeable.
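
A rough sketch of what I mean, with made-up type and field names (not Mimir's actual ones):

package ingester

import "github.com/prometheus/client_golang/prometheus"

// ingesterMetrics holds the shared metric vectors (made-up subset).
type ingesterMetrics struct {
    appendedSamplesTotal *prometheus.CounterVec // labelled by user
}

// userTSDBMetrics keeps the per-tenant children resolved once.
type userTSDBMetrics struct {
    appendedSamplesTotal prometheus.Counter
}

// newUserTSDBMetrics does the WithLabelValues lookup a single time, when the
// userTSDB is created, so Push can use the stored reference directly.
func newUserTSDBMetrics(m *ingesterMetrics, userID string) userTSDBMetrics {
    return userTSDBMetrics{
        appendedSamplesTotal: m.appendedSamplesTotal.WithLabelValues(userID),
    }
}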

@colega (Contributor, Author) commented Oct 2, 2023

Right, I guess I chose a bad example for this (I took a random file). You're right, we should do that for userTSDB metrics where possible, since it would also mean doing the lookup only once.

However, there are many places where this can't be applied.

@colega closed this Oct 2, 2023
@colega reopened this Oct 2, 2023
@pstibrany (Member)

Does it really help, or are you fighting a battle the compiler already optimizes away? Where are the benchmarks?

@pstibrany (Member)

I'm not against this change

I am, unless it's provably and considerably faster. I don't think we need to make our codebase more complicated for minuscule improvements.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
@colega (Contributor, Author) commented Oct 3, 2023

My opinion is that this change doesn't make things more complicated: we're just wrapping the vector with something more performant that is used in a similar way.

This can't be optimized by the compiler (at least not today), because it has no way to know where that slice ends up, so the variadic argument escapes to the heap.
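
To illustrate where the hidden allocation happens (the metric below is made up, just to show the call shape):

package main

import "github.com/prometheus/client_golang/prometheus"

var requestsTotal = prometheus.NewCounterVec(
    prometheus.CounterOpts{Name: "requests_total"},
    []string{"user"},
)

func recordRequest(userID string) {
    // The variadic call is equivalent to WithLabelValues([]string{userID}...):
    // a one-element []string is built for the arguments, handed to
    // client_golang (so escape analysis can't keep it on the stack), used for
    // the lookup, and then thrown away.
    requestsTotal.WithLabelValues(userID).Inc()
}

func main() { recordRequest("user-1") }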

Anyway, I ran the benchmark without much hope (with the huge amount of work that Ingester.Push does, it's hard to see the impact of one or two fewer allocations), and I was surprised by the results. I even had to run it 50 times to make sure the result is consistent:

goos: linux
goarch: amd64
pkg: github.com/grafana/mimir/pkg/ingester
cpu: 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz
                │ /tmp/BenchmarkIngesterPush.old50 │  /tmp/BenchmarkIngesterPush.new50  │
                │              sec/op              │   sec/op     vs base               │
IngesterPush-16                        9.055µ ± 1%   8.982µ ± 1%  -0.81% (p=0.025 n=50)

                │ /tmp/BenchmarkIngesterPush.old50 │ /tmp/BenchmarkIngesterPush.new50 │
                │               B/op               │       B/op         vs base       │
IngesterPush-16                       1.064Ki ± 0%        1.033Ki ± 0%  -2.94% (n=50)

                │ /tmp/BenchmarkIngesterPush.old50 │ /tmp/BenchmarkIngesterPush.new50 │
                │            allocs/op             │     allocs/op      vs base       │
IngesterPush-16                         26.00 ± 0%          24.00 ± 0%  -7.69% (n=50)

I also ran Benchmark_Ingester_PushOnError and got similar results:

goos: linux
goarch: amd64
pkg: github.com/grafana/mimir/pkg/ingester
cpu: 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz
                │ /tmp/Benchmark_Ingester_PushOnError.old │ /tmp/Benchmark_Ingester_PushOnError.new │
                │                 sec/op                  │        sec/op          vs base          │
IngesterPush-16                               9.046µ ± 2%             8.969µ ± 2%  ~ (p=0.579 n=10)

                │ /tmp/Benchmark_Ingester_PushOnError.old │ /tmp/Benchmark_Ingester_PushOnError.new │
                │                  B/op                   │       B/op        vs base               │
IngesterPush-16                              1.064Ki ± 0%       1.033Ki ± 0%  -2.94% (p=0.000 n=10)

                │ /tmp/Benchmark_Ingester_PushOnError.old │ /tmp/Benchmark_Ingester_PushOnError.new │
                │                allocs/op                │    allocs/op      vs base               │
IngesterPush-16                                26.00 ± 0%         24.00 ± 0%  -7.69% (p=0.000 n=10)

@colega (Contributor, Author) commented Oct 3, 2023

I think the canonical way to solve this problem is getting the WithLabelValues() reference once

I took a quick look at doing this:

  • It's not an easy change because of the userTSDB lifecycle. It can be done, but it has to be done with extreme caution to avoid keeping references to deleted metrics.
  • Some WithLabelValues calls would still remain: active tracker names, discarded samples groups, etc.

@colega (Contributor, Author) commented Oct 4, 2023

@pstibrany pointed out that this should be optimized by the compiler, and @bboreham pointed to how it can be fixed, so this is superseded by the fix in prometheus/client_golang#1360

@colega closed this Oct 4, 2023
@colega deleted the propose-promextra-vecs branch on October 4, 2023 at 16:29
@pstibrany (Member)

@bboreham pointed to how it can be fixed, so this is superseded by the fix in prometheus/client_golang#1360

That's quite an interesting fix, very nice work!

@colega (Contributor, Author) commented Oct 5, 2023

I tried benchmarking that fix vendored into Mimir and 🚀 👍

goos: linux
goarch: amd64
pkg: github.com/grafana/mimir/pkg/ingester
cpu: 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz
                │   old.txt   │              new.txt               │
                │   sec/op    │   sec/op     vs base               │
IngesterPush-16   7.508µ ± 2%   7.394µ ± 1%  -1.53% (p=0.000 n=10)

                │  old.txt   │              new.txt               │
                │    B/op    │    B/op     vs base                │
IngesterPush-16   960.5 ± 0%   831.0 ± 0%  -13.48% (p=0.000 n=10)

                │  old.txt   │              new.txt               │
                │ allocs/op  │ allocs/op   vs base                │
IngesterPush-16   18.00 ± 0%   10.00 ± 0%  -44.44% (p=0.000 n=10)
