Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Histogram unbounded memory even with run_upkeep #467

Open
gauravkumar37 opened this issue Mar 26, 2024 · 8 comments
Open

Histogram unbounded memory even with run_upkeep #467

gauravkumar37 opened this issue Mar 26, 2024 · 8 comments
Labels
C-exporter Component: exporters such as Prometheus, TCP, etc. E-simple Effort: simple. T-ergonomics Type: ergonomics. T-request Type: request.

Comments

@gauravkumar37
Copy link

Thanks for a wonderful library. After the recent merge of #460 I tested the upkeep and idle timeout configs but still see unbounded memory usage if /metrics endpoint is not being called.
Is this still expected even after upkeep?

Testing methodology:

  • Using axum with axum-prometheus
  • Using /metrics
  • Using wrk to benchmark an API with a simple health endpoint
  • Memory keep on increasing unbounded
  • Code has idle_timeout of 60s and upkeep of 5s
  • If I now open /metrics endpoint, memory eventually reduces and the more I open it periodically, the more stable the memory is

So, is my observed behavior still expected even after the introduction of upkeep?

@tobz
Copy link
Member

tobz commented Mar 27, 2024

Hmm, the latest (0.6.1) version of axum-prometheus is still on 0.13.x of metrics-exporter-prometheus. Are you using a development version or patching metrics-exporter-prometheus to 0.14.0?

@gauravkumar37
Copy link
Author

Correct, hence I patched the version of metrics-exporter-prometheus in axum-prometheus to 0.14.0 to test, also able to verify that adding upkeep_timeout compiles correctly, also verified the 0.14.0 in Cargo.lock.
Patch location: https://github.com/gauravkumar37/axum-prometheus/tree/patch-1

@tobz
Copy link
Member

tobz commented Mar 27, 2024

Looking at how axum-prometheus uses metrics-exporter-prometheus, this makes sense now: PrometheusBuilder doesn't create the upkeep task unless PrometheusBuilder::build or PrometheusBuilder::install are called, whereas right now, it calls PrometheusBuilder::install_recorder.

Not really sure there's a ton we can do here, or to be frank, that I'm willing to do here. The number of ways to build/install the recorder is already too high for comfort because it's trying to be all things to all people. The simplest thing would be if axum-prometheus switched to using PrometheusBuilder::build and discarded the exporter future if it's not in push gateway mode. It would still have to install the recorder, but it would allow it the chance to grab a recorder handle, and it would allow the upkeep task to be created.

@tobz tobz added C-exporter Component: exporters such as Prometheus, TCP, etc. E-simple Effort: simple. T-ergonomics Type: ergonomics. T-request Type: request. labels Mar 27, 2024
@gauravkumar37
Copy link
Author

This is what I am using inside axum framework:

    PrometheusMetricLayerBuilder::new()
        .with_metrics_from_fn(|| {
            PrometheusBuilder::new()
                .idle_timeout(MetricKindMask::HISTOGRAM, Some(Duration::from_secs(300)))
                .upkeep_timeout(Duration::from_secs(60))
                .set_buckets(&[0.1, 0.2, 0.3])
                .unwrap()
                .install_recorder()
                .unwrap()
        })
        .build_pair()

This does create the PrometheusBuilder and installs it. Is there something more to do here?

@tobz
Copy link
Member

tobz commented Mar 27, 2024

Right, you're using install_recorder, when you need to use build.

@gauravkumar37
Copy link
Author

gauravkumar37 commented Mar 29, 2024

Ok, I got it to work correctly but I must say, that was an unexpected nuance there :-)

So, something like this works and there is no more unbounded memory growth:

    PrometheusMetricLayerBuilder::new()
        .with_metrics_from_fn(|| {
            let (recorder, _) = PrometheusBuilder::new()
                // if metrics are not updated within this time, they will be removed
                .idle_timeout(MetricKindMask::HISTOGRAM, Some(Duration::from_secs(5 * 60)))
                // prevents unbounded memory growth by draining histogram data periodically
                .upkeep_timeout(Duration::from_secs(5))
                .set_buckets(buckets_in_secs.as_slice())
                .expect_or_log(&format!("Failed to set buckets: '{buckets_in_secs:?}'"))
                // instead of install_recorder, use build because the former doesn't start the upkeep tasks
                .build()
                .expect_or_log("Failed to build metrics recorder");
            let handle = recorder.handle();
            metrics::set_global_recorder(recorder).expect_or_log("Failed to set global recorder");
            handle
        })
        .build_pair()

++ @Ptrskay3 for changes in axum-prometheus

@Ptrskay3
Copy link

Thanks for the ping. I'm open to changing axum_prometheus, if that makes it safer for most of the people. Not sure when I'll find the time to do it myself however. Created Ptrskay3/axum-prometheus#51.

@tobz
Copy link
Member

tobz commented Mar 31, 2024

@Ptrskay3 Thanks for opening that. 👍🏻

Like I said upthread, I'm also not a fan of how thorny the crate (this crate, not yours) has become in terms of the API... but it does seem like taking the approach shown by @gauravkumar37 would be the simplest one.

I'll also make a note for myself to potentially contribute that as a PR to axum_prometheus.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-exporter Component: exporters such as Prometheus, TCP, etc. E-simple Effort: simple. T-ergonomics Type: ergonomics. T-request Type: request.
Projects
None yet
Development

No branches or pull requests

3 participants