Propose promextra.CounterVec / promextra.GaugeVec to reduce allocations #6193

Closed
colega wants to merge 2 commits

Conversation

@colega (Contributor) commented Oct 2, 2023

What this PR does

Each time we call WithLabelValues(...), we allocate a slice just to hold its arguments, and that slice is immediately thrown away.

These allocations don't show up in profiles as a single hot spot, because the allocation happens at the caller, not inside WithLabelValues, so they're spread across the entire codebase.

I propose replacing these calls in our hot paths with a wrapper that reuses the slice through a pool. As an example, I replaced the usages in pkg/ingester/metrics.go; other usages would be migrated in follow-up PRs. Most of our call sites only need to pass one or two labels, so that's all the wrapper covers.

I didn't embed prometheus.CounterVec on purpose: this way it's easier to refactor without accidentally leaving calls unchanged. I'm also leaving that field exported on purpose, as some usages may still need access to other methods.

I didn't want to spend too much time on naming and I'm open to discussion there; promvec.Counter could be another alternative.
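
To make the idea concrete, here is a rough sketch of the single-label fast path. This is simplified, and the names and details are illustrative; they may differ from the actual code in this PR.

package promextra

import (
    "sync"

    "github.com/prometheus/client_golang/prometheus"
)

// Pool of one-element label-value slices shared across calls. Storing a
// *[]string avoids boxing a slice header on every Put.
var singleLabelPool = sync.Pool{
    New: func() any {
        s := make([]string, 1)
        return &s
    },
}

// CounterVec wraps prometheus.CounterVec without embedding it, so call sites
// have to be migrated explicitly. The field stays exported because some
// usages still need the other *prometheus.CounterVec methods.
type CounterVec struct {
    CounterVec *prometheus.CounterVec
}

// WithLabelValue is the single-label fast path: it borrows a slice from the
// pool for the lookup instead of letting the variadic call allocate one.
func (v CounterVec) WithLabelValue(lv string) prometheus.Counter {
    lvs := singleLabelPool.Get().(*[]string)
    (*lvs)[0] = lv
    c := v.CounterVec.WithLabelValues(*lvs...)
    (*lvs)[0] = "" // don't retain the label value through the pool
    singleLabelPool.Put(lvs)
    return c
}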

Which issue(s) this PR fixes or relates to

None, just passing by.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Each time we call WithLabelValues(...) we have to allocate a slice for
that single argument that is immediately thrown away.

I propose to replace that in our hot paths by a wrapper that would
reuse that slice through a pool.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
@colega requested a review from a team as a code owner on October 2, 2023
@colega changed the title from "Propose promextra.CounterVec / GaugeVec" to "Propose promextra.CounterVec / promextra.GaugeVec to reduce allocations" on Oct 2, 2023
@pracucci (Collaborator) left a comment

I'm not against this change, but I think the canonical way to solve this problem is getting the WithLabelValues() reference once and then reusing it over time: have you considered just grabbing the per-tenant metrics when creating userTSDB and storing the references there?

I don't know how much memory would increase, but let's say we have 500 tenants per ingester, 50 metrics per tenant, and each metric is 1KB (which I suspect is an overestimation): that's +25MB per ingester, which probably wouldn't even be noticeable.
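
A rough sketch of what I mean, with made-up type and field names (not Mimir's actual ones):

package ingester

import "github.com/prometheus/client_golang/prometheus"

// ingesterMetrics holds the shared metric vectors (made-up subset).
type ingesterMetrics struct {
    appendedSamplesTotal *prometheus.CounterVec // labelled by user
}

// userTSDBMetrics keeps the per-tenant children resolved once.
type userTSDBMetrics struct {
    appendedSamplesTotal prometheus.Counter
}

// newUserTSDBMetrics does the WithLabelValues lookup a single time, when the
// userTSDB is created, so Push can use the stored reference directly.
func newUserTSDBMetrics(m *ingesterMetrics, userID string) userTSDBMetrics {
    return userTSDBMetrics{
        appendedSamplesTotal: m.appendedSamplesTotal.WithLabelValues(userID),
    }
}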

@colega (Contributor, Author) commented Oct 2, 2023

Right, I guess I chose a bad example for this (I took a random file). You're right, we should do that for userTSDB metrics where possible, since it would also mean doing the lookup only once.

However, there are many places where this can't be applied.

@colega closed this Oct 2, 2023
@colega reopened this Oct 2, 2023
@pstibrany (Member)

Does it really help, or are you fighting a battle the compiler already optimizes away? Where are the benchmarks?

@pstibrany (Member)

I'm not against this change

I am, unless it's provably and considerably faster. I don't think we need to make our codebase more complicated for minuscule improvements.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
@colega (Contributor, Author) commented Oct 3, 2023

My opinion is that this change doesn't make things more complicated: we're just wrapping the vector with something more performant that is used in a similar way.

This can't be optimized by the compiler (at least not today), because it has no way to know where that slice ends up, so the variadic argument escapes to the heap.
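
To illustrate where the hidden allocation happens (the metric below is made up, just to show the call shape):

package main

import "github.com/prometheus/client_golang/prometheus"

var requestsTotal = prometheus.NewCounterVec(
    prometheus.CounterOpts{Name: "requests_total"},
    []string{"user"},
)

func recordRequest(userID string) {
    // The variadic call is equivalent to WithLabelValues([]string{userID}...):
    // a one-element []string is built for the arguments, handed to
    // client_golang (so escape analysis can't keep it on the stack), used for
    // the lookup, and then thrown away.
    requestsTotal.WithLabelValues(userID).Inc()
}

func main() { recordRequest("user-1") }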

Anyway, I ran the benchmark without much hope (with the huge amount of work that Ingester.Push does, it's hard to see the impact of one or two fewer allocations), and I was surprised by the results. I even had to run it 50 times to make sure the result is consistent:

goos: linux
goarch: amd64
pkg: github.com/grafana/mimir/pkg/ingester
cpu: 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz
                │ /tmp/BenchmarkIngesterPush.old50 │  /tmp/BenchmarkIngesterPush.new50  │
                │              sec/op              │   sec/op     vs base               │
IngesterPush-16                        9.055µ ± 1%   8.982µ ± 1%  -0.81% (p=0.025 n=50)

                │ /tmp/BenchmarkIngesterPush.old50 │ /tmp/BenchmarkIngesterPush.new50 │
                │               B/op               │       B/op         vs base       │
IngesterPush-16                       1.064Ki ± 0%        1.033Ki ± 0%  -2.94% (n=50)

                │ /tmp/BenchmarkIngesterPush.old50 │ /tmp/BenchmarkIngesterPush.new50 │
                │            allocs/op             │     allocs/op      vs base       │
IngesterPush-16                         26.00 ± 0%          24.00 ± 0%  -7.69% (n=50)

I also ran Benchmark_Ingester_PushOnError and got similar results:

goos: linux
goarch: amd64
pkg: github.com/grafana/mimir/pkg/ingester
cpu: 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz
                │ /tmp/Benchmark_Ingester_PushOnError.old │ /tmp/Benchmark_Ingester_PushOnError.new │
                │                 sec/op                  │        sec/op          vs base          │
IngesterPush-16                               9.046µ ± 2%             8.969µ ± 2%  ~ (p=0.579 n=10)

                │ /tmp/Benchmark_Ingester_PushOnError.old │ /tmp/Benchmark_Ingester_PushOnError.new │
                │                  B/op                   │       B/op        vs base               │
IngesterPush-16                              1.064Ki ± 0%       1.033Ki ± 0%  -2.94% (p=0.000 n=10)

                │ /tmp/Benchmark_Ingester_PushOnError.old │ /tmp/Benchmark_Ingester_PushOnError.new │
                │                allocs/op                │    allocs/op      vs base               │
IngesterPush-16                                26.00 ± 0%         24.00 ± 0%  -7.69% (p=0.000 n=10)

@colega (Contributor, Author) commented Oct 3, 2023

I think the canonical way to solve this problem is getting the WithLabelValues() reference once

I took a quick look at doing this:

  • It's not an easy change because of the userTSDB lifecycle. It can be done, but it has to be done with extreme caution to avoid keeping references to deleted metrics.
  • Some WithLabelValues calls would still remain: active tracker names, discarded samples groups, etc.

@colega (Contributor, Author) commented Oct 4, 2023

@pstibrany pointed out that this should be optimized by the compiler, and @bboreham pointed to how it can be fixed, so this is superseded by the fix in prometheus/client_golang#1360

@colega closed this Oct 4, 2023
@colega deleted the propose-promextra-vecs branch on October 4, 2023 at 16:29
@pstibrany (Member)

@bboreham pointed to how it can be fixed, so this is superseded by the fix in prometheus/client_golang#1360

That's quite an interesting fix, very nice work!

@colega (Contributor, Author) commented Oct 5, 2023

I tried benchmarking that fix vendored into Mimir and 🚀 👍

goos: linux
goarch: amd64
pkg: github.com/grafana/mimir/pkg/ingester
cpu: 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz
                │   old.txt   │              new.txt               │
                │   sec/op    │   sec/op     vs base               │
IngesterPush-16   7.508µ ± 2%   7.394µ ± 1%  -1.53% (p=0.000 n=10)

                │  old.txt   │              new.txt               │
                │    B/op    │    B/op     vs base                │
IngesterPush-16   960.5 ± 0%   831.0 ± 0%  -13.48% (p=0.000 n=10)

                │  old.txt   │              new.txt               │
                │ allocs/op  │ allocs/op   vs base                │
IngesterPush-16   18.00 ± 0%   10.00 ± 0%  -44.44% (p=0.000 n=10)
