
Optimize PrometheusCollector scrape efficiency. #2974

Closed
wants to merge 24 commits

Conversation


@bwplotka commented Oct 22, 2021

Fixes kubernetes/kubernetes#104459

This might end up as a reusable framework for CRI and others, but for now I am attempting to optimize cadvisor in a bespoke fashion.

CPU profile: https://share.polarsignals.com/e46845c/
Memory profile: https://share.polarsignals.com/9cb2d48/

Related to prometheus/client_golang#917

Signed-off-by: Bartlomiej Plotka bwplotka@gmail.com

@google-cla bot added the cla: yes label Oct 22, 2021
@k8s-ci-robot
Collaborator

Hi @bwplotka. Thanks for your PR.

I'm waiting for a google member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Creatone
Collaborator

/ok-to-test

@bwplotka
Author

Results so far:

```
./benchstat -delta-test=none ../_dev/client_golang/bench/out_base1.txt ../_dev/client_golang/bench/out_new3.txt
name                                         old time/op    new time/op    delta
PrometheusCollector_Collect/AlwaysUpdate-12    17.0ms ± 0%     5.2ms ± 0%  -69.32%

name                                         old alloc/op   new alloc/op   delta
PrometheusCollector_Collect/AlwaysUpdate-12    8.31MB ± 0%    0.95MB ± 0%  -88.58%

name                                         old allocs/op  new allocs/op  delta
PrometheusCollector_Collect/AlwaysUpdate-12      208k ± 0%       25k ± 0%  -87.85%

./benchstat -delta-test=none ../_dev/client_golang/bench/out_base1.txt ../_dev/client_golang/bench/out_new3.txt
name                                   old time/op    new time/op    delta
PrometheusCollector_Collect/Cached-12    17.0ms ± 0%     1.2ms ± 0%  -93.14%

name                                   old alloc/op   new alloc/op   delta
PrometheusCollector_Collect/Cached-12    8.31MB ± 0%    0.00MB ± 0%  -99.94%

name                                   old allocs/op  new allocs/op  delta
PrometheusCollector_Collect/Cached-12      208k ± 0%        0k ± 0%  -99.96%
```

NOTE: This is still missing some validity checks and single-flight on the HTTP handler for the full experience, but it already shows how fast/efficient we can get this.

That said, I already have an idea for a better abstraction that can be similarly efficient, if not more so, while being more readable.
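
For intuition, here is a minimal sketch (not the PR's actual code) of the caching idea the `Cached` benchmark above exercises: regenerate the decoded metric families only when a TTL has expired, and otherwise return the previously built slice. The names `cachedCollector`, `ttl`, and `generate` are illustrative.

```go
package cachedemo

import (
	"sync"
	"time"

	dto "github.com/prometheus/client_model/go"
)

// cachedCollector rebuilds the metric families at most once per ttl and
// otherwise returns the cached slice, which is what makes the cached path
// nearly allocation-free in the numbers above.
type cachedCollector struct {
	mtx      sync.Mutex
	ttl      time.Duration
	lastRun  time.Time
	cached   []*dto.MetricFamily
	generate func() []*dto.MetricFamily // the expensive conversion from container stats
}

// Collect matches the raw-collector shape discussed later in this PR:
// it returns decoded metric families instead of streaming prometheus.Metric values.
func (c *cachedCollector) Collect() []*dto.MetricFamily {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	if c.cached != nil && time.Since(c.lastRun) < c.ttl {
		return c.cached // cache hit: no regeneration, essentially zero allocations
	}
	c.cached = c.generate()
	c.lastRun = time.Now()
	return c.cached
}
```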

@bwplotka
Author

This is for 20 containers, 5k metrics.


@smarterclayton
Contributor

Very nice. There are definitely a lot of very large collectors out there that transform from one representation to another, where series changes are a small fraction of the total set on each scrape. It's encouraging to see the discussion moving towards a pattern for that (apiserver in kube, haproxy/other-proxy -> source format).

```
for _, cont := range containers {
	values := make([]string, 0, len(rawLabels))
	labels := make([]string, 0, len(rawLabels))
	values := values[:0]
```
Author

Thanks to the fact that I gave a talk about this (https://twitter.com/bwplotka/status/1452971992324390918?t=Ufm9UR3YGf-ud8JimO-tZw&s=19), Ivan from the audience spotted that this should reuse the existing variable instead of creating another one! ((:
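
For readers skimming this thread, here is a hedged sketch of the fix being described, with illustrative names and types rather than the PR's exact code: hoist the `make` calls out of the loop and reset the slices with `[:0]`, so every container reuses the same backing arrays instead of allocating new ones.

```go
package labelreuse

// container stands in for cadvisor's container info; only the label map matters here.
type container struct {
	labels map[string]string
}

func buildLabelPairs(containers []container, rawLabels map[string]struct{}) {
	// Allocate once, outside the loop; the capacity is reused on every iteration.
	values := make([]string, 0, len(rawLabels))
	labels := make([]string, 0, len(rawLabels))

	for _, cont := range containers {
		// Reset length but keep capacity: no per-container allocation.
		values = values[:0]
		labels = labels[:0]

		for l := range rawLabels {
			labels = append(labels, l)
			values = append(values, cont.labels[l])
		}
		// ... build the metric for this container from labels and values ...
		_, _ = labels, values
	}
}
```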

@derekwaynecarr
Collaborator

@bwplotka this is very promising to see the delta!

```
// * returns cached results if until time expires.
//
// It implements prometheus.rawCollector.
func (c *PrometheusCollector) Collect() []*dto.MetricFamily {
```
Collaborator

This causes a compilation failure against Kubernetes:

```
+++ [1028 13:00:49] Building go targets for linux/amd64:
    ./pkg/kubelet
# k8s.io/kubernetes/pkg/kubelet/server
pkg/kubelet/server/server.go:383:50: cannot use "k8s.io/kubernetes/vendor/github.com/google/cadvisor/metrics".NewPrometheusCollector(prometheusHostAdapter{...}, containerPrometheusLabelsFunc(s.host), includedMetrics, "k8s.io/kubernetes/vendor/k8s.io/utils/clock".RealClock{}) (type *"k8s.io/kubernetes/vendor/github.com/google/cadvisor/metrics".PrometheusCollector) as type prometheus.Collector in argument to r.RawMustRegister:
	*"k8s.io/kubernetes/vendor/github.com/google/cadvisor/metrics".PrometheusCollector does not implement prometheus.Collector (wrong type for Collect method)
		have Collect() []*io_prometheus_client.MetricFamily
		want Collect(chan<- prometheus.Metric)
```

Author

Right. This is because we need to use this collector with the new promhttp transactional handler, similar to how I implemented it in cadvisor. Looks like this has to be changed in the kubelet too.
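
To make the mismatch from the compile error above concrete, here is a rough sketch of the two shapes involved. The interface names are illustrative (`rawCollector` mirrors the unexported interface mentioned in the code comment earlier); the actual wiring into an HTTP handler goes through the transactional handler path shown in the kubelet diff further down this thread.

```go
package shapes

import (
	"github.com/prometheus/client_golang/prometheus"
	dto "github.com/prometheus/client_model/go"
)

// What RawMustRegister (and any prometheus.Registerer) expects; this is
// exactly the standard prometheus.Collector interface, which streams
// already-encoded metrics over a channel.
type standardCollector interface {
	Describe(ch chan<- *prometheus.Desc)
	Collect(ch chan<- prometheus.Metric)
}

// What this PR's PrometheusCollector exposes instead: decoded metric families
// returned directly, so they can be cached and reused between scrapes.
type rawCollector interface {
	Collect() []*dto.MetricFamily
}
```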

Collaborator

I did a bunch of digging, and the fix won't be quite as trivial or straightforward as it was in this PR. This is because in k8s the interaction with the Prometheus client libraries goes through the component-base metrics libraries, and it's not clear to me what the best way forward is. I pinged @logicalhan to get some feedback.

xref:

Annoyingly, while it would be nice to stop using k8s' deprecated RawMustRegister, it looks like that is still used in a couple of other places, so we can't just get rid of it and define something better for this use case:

https://cs.k8s.io/?q=RawMustRegister&i=nope&files=&excludeFiles=&repos=kubernetes/kubernetes

@dgrisonnet

@bwplotka I wonder if we could still make some performance improvements here by using a notification-based approach for updating the cache. As far as I understand, your current approach is a session-based one that regenerates all the metrics during each collection session, but more efficiently than before thanks to the cache. However, we can expect that most of the container stats will not change between two collections, so we could save a bit of CPU by only updating the cache whenever a change in a container occurs. I am not knowledgeable about the cadvisor codebase, so I don't know whether that would be feasible, but as far as I can tell, container watchers already exist in the codebase:

cadvisor/manager/manager.go, lines 267 to 268 (at 0549d48):

```
containerWatchers []watcher.ContainerWatcher
eventsChannel chan watcher.ContainerEvent
```

This is the approach that kube-state-metrics currently takes, as it watches events on Kubernetes objects and updates its metric cache only when an object is created/updated/deleted.

So my suggestion would be to have two different kinds of cache collectors in client_golang. The first would be the session-based one that you already proposed in prometheus/client_golang#929, which is very suitable for projects that don't have the possibility to watch for events. If node_exporter isn't using inotify watches on sysfs/procfs, it might be a potential user of this kind of collector. The other type would be even more performant, but it would require the existence of a notification system; it would update the cache directly whenever an event requiring a metrics update is detected. This approach would have direct access to the cache and would lock it during collection, so it wouldn't have to keep two different stores, as the session-based approach currently does.
Mixing both approaches could perhaps also be an option?

Let me know what your thoughts are on that; these are just some ideas I am throwing in.
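
For illustration, a rough sketch of what the notification-based variant could look like. The `event` type and the `build` callback are hypothetical stand-ins (not cadvisor's real watcher API): a background goroutine keeps the cache current as container events arrive, and Collect only assembles the cached entries under a lock.

```go
package eventcache

import (
	"sync"

	dto "github.com/prometheus/client_model/go"
)

// event is a stand-in for something like watcher.ContainerEvent.
type event struct {
	containerName string
	removed       bool
}

type eventDrivenCache struct {
	mtx   sync.RWMutex
	byKey map[string][]*dto.MetricFamily // cached families per container
}

// run consumes container events and rebuilds only the affected cache entries,
// instead of regenerating every metric on every scrape.
func (c *eventDrivenCache) run(events <-chan event, build func(name string) []*dto.MetricFamily) {
	for ev := range events {
		c.mtx.Lock()
		if ev.removed {
			delete(c.byKey, ev.containerName)
		} else {
			c.byKey[ev.containerName] = build(ev.containerName)
		}
		c.mtx.Unlock()
	}
}

// Collect serves whatever is currently cached; the only per-scrape work is
// concatenating the cached slices.
func (c *eventDrivenCache) Collect() []*dto.MetricFamily {
	c.mtx.RLock()
	defer c.mtx.RUnlock()
	out := make([]*dto.MetricFamily, 0, len(c.byKey))
	for _, fams := range c.byKey {
		out = append(out, fams...)
	}
	return out
}
```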

metrics/prometheus.go (outdated review thread, resolved)
Comment on lines 1808 to 1820
```
errorsGauge := 0
if err := c.collectVersionInfo(session); err != nil {
	errorsGauge = 1
	klog.Warningf("Couldn't get version info: %s", err)
}
if err := c.collectContainersInfo(session); err != nil {
	errorsGauge = 1
	klog.Warningf("Couldn't get containers: %s", err)
}

// Describe describes all the metrics ever exported by cadvisor. It
// implements prometheus.PrometheusCollector.
func (c *PrometheusCollector) Describe(ch chan<- *prometheus.Desc) {
	c.errors.Describe(ch)
	for _, cm := range c.containerMetrics {
		ch <- cm.desc([]string{})

session.MustAddMetric(
	"container_scrape_error", "1 if there was an error while getting container metrics, 0 otherwise",
	nil, nil, prometheus.GaugeValue, float64(errorsGauge), nil,
```
Collaborator

Spent some time today trying to get this working properly in k8s.

I am finding that I keep running into errors... If I don't set IdType: cadvisorv2.TypeName in the cadvisor options, I get

```
ehashman@fedora:~/src/k8s$ curl http://127.0.0.1:8001/api/v1/nodes/127.0.0.1/proxy/metrics/cadvisor
# HELP cadvisor_version_info A metric with a constant '1' value labeled by kernel version, OS version, docker version, cadvisor version & cadvisor revision.
# TYPE cadvisor_version_info gauge
cadvisor_version_info{cadvisorRevision="",cadvisorVersion="",dockerVersion="",kernelVersion="5.10.0-9-amd64",osVersion="Debian GNU/Linux 11 (bullseye)"} 1
# HELP container_scrape_error 1 if there was an error while getting container metrics, 0 otherwise
# TYPE container_scrape_error gauge
container_scrape_error 1
```

and error in the logs

```
W1124 16:45:01.711801 1279370 prometheus.go:1815] Couldn't get containers: invalid request type ""
```

If I do set it, I get

```
ehashman@fedora:~/src/k8s$ curl http://127.0.0.1:8001/api/v1/nodes/127.0.0.1/proxy/metrics/cadvisor
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "error trying to reach service: EOF",
  "reason": "ServiceUnavailable",
  "code": 503
}
```

Not sure what I'm doing wrong.

k8s WIP PR is kubernetes/kubernetes#106334

And the patch I'm applying on top of the PR to get to that second error case is

```diff
diff --git a/pkg/kubelet/server/server.go b/pkg/kubelet/server/server.go
index ce3fc942e49..be9cb5e2603 100644
--- a/pkg/kubelet/server/server.go
+++ b/pkg/kubelet/server/server.go
@@ -378,7 +378,16 @@ func (s *Server) InstallDefaultHandlers() {
                includedMetrics[cadvisormetrics.AcceleratorUsageMetrics] = struct{}{}
        }
 
-       br.MustRegisterRaw(metrics.NewPrometheusCollector(prometheusHostAdapter{s.host}, containerPrometheusLabelsFunc(s.host), includedMetrics, clock.RealClock{}))
+       cadvisorOpts := cadvisorv2.RequestOptions{
+               IdType:    cadvisorv2.TypeName,
+               Count:     1,
+               Recursive: true,
+       }
+
+       containersCollector := metrics.NewPrometheusCollector(prometheusHostAdapter{s.host}, containerPrometheusLabelsFunc(s.host), includedMetrics, clock.RealClock{})
+       containersCollector.SetOpts(cadvisorOpts)
+       br.MustRegisterRaw(containersCollector)
+
        br.MustRegister(metrics.NewPrometheusMachineCollector(prometheusHostAdapter{s.host}, includedMetrics))
        s.restfulCont.Handle(cadvisorMetricsPath,
                promhttp.HandlerForTransactional(br, promhttp.HandlerOpts{ErrorHandling: promhttp.ContinueOnError}),
```

@bwplotka
Author

Updated for the newest client_golang changes (prometheus/client_golang#929)

This should be ready for review. There are some observations for further improvements, but otherwise all races should be fixed and we should see a significant improvement in allocations and CPU usage.

TODO still:

  • Fix a few failing tests.
  • Benchmark.
  • Run with the kubelet and measure / check functionality.

cc @ehashman @dgrisonnet

FYI: I have to take two weeks off for post-knee-surgery recovery, but anyone is welcome to continue this work!

@bwplotka marked this pull request as ready for review January 27, 2022 11:34
@bwplotka
Author

Of course, as @dgrisonnet said, moving to a pure watcher-based approach would be extremely beneficial, but it needs huge changes, so I'm keeping that off the table for now.

@ehashman
Collaborator

I have updated my k/k PR https://github.com/kubernetes/kubernetes/pull/107960/files#diff-c3e7f724d1b0dcee40df80716ed57d90d2649710150fa92bf9822bdad35e0429 which integrates this change + the required changes to rebase on top of runc 1.1.0.

I still am not getting any metrics... any time I try to hit the kubelet endpoint, I get an empty response and a corresponding logline of:

```
I0210 15:37:17.477351  121888 httplog.go:131] "HTTP" verb="GET" URI="/metrics/cadvisor" latency="91.027µs" userAgent="curl/7.74.0" audit-ID="" srcIP="127.0.0.1:51322" resp=0
```

@bwplotka
Author

Ack @ehashman taking a look today/tomorrow (:

```
/tmp/GoLand/___BenchmarkPrometheusCollector_Collect_in_github_com_google_cadvisor_metrics.test -test.v -test.paniconexit0 -test.bench ^\QBenchmarkPrometheusCollector_Collect\E$ -test.run ^$ -test.benchtime 10s
goos: linux
goarch: amd64
pkg: github.com/google/cadvisor/metrics
cpu: Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz
BenchmarkPrometheusCollector_Collect
BenchmarkPrometheusCollector_Collect-12    	     748	  14929127 ns/op	 8253796 B/op	  207706 allocs/op
PASS

Process finished with the exit code 0
```

@bwplotka
Author

This PR is ready to go, IMO.

Everything is benchmarked. With the kubelet we see significantly fewer allocations and lower CPU usage (kubernetes/kubernetes#108194).

@bwplotka
Author

cc @ehashman @dgrisonnet

@bwplotka
Author

On the kubelet side, I see only one discrepancy between this PR and without it:

[screenshot]

Worth double-checking before merging; otherwise, no blocker.

@bwplotka
Author

TODO: Clean up the commits.

dto "github.com/prometheus/client_model/go"
)

//
Author

uncomment


Done in a587512.

@tossmilestone

@bwplotka hi, is this PR still in progress? We have encountered this problem as well.

@bwplotka
Author

bwplotka commented May 1, 2024

Wow, sorry for the lag. This work was paused and went stale. I have changed jobs since then, so I'm closing this for now. Efficiency improvements are still very possible here, but ideally we would redesign cadvisor's metric tracking, and that's a bigger piece of work.

Something to discuss with the cadvisor maintainers. cc @cwangVT @SergeyKanzhelev

@bwplotka closed this May 1, 2024
Development

Successfully merging this pull request may close these issues.

Steady state kubelet CPU usage is high due to excessive allocation in prometheus scraping of cadvisor