Allow to unregister/stop/destroy instruments #2232

mateuszrzeszutek · 2021-12-20T09:57:29Z

What are you trying to achieve?

I'm currently working on the micrometer->OTel bridge instrumentation in the javaagent. Micrometer offers the possibility to remove a meter from the MeterRegistry and stop emitting whatever metrics it used to collect. For example, suppose you use a database connection pool that's instrumented with metrics - when you close/destroy the whole pool you probably want to stop collecting any metrics associated to it (because it doesn't exist anymore). This is useful for both asynchronous instruments (since once they're registered there's no way to stop them) and synchronous instruments (you can just stop using them, but the metrics SDK will still send in the last recorded value).

Additional context.

Micrometer bridge PR: open-telemetry/opentelemetry-java-instrumentation#4919

CC @jsuereth

The text was updated successfully, but these errors were encountered:

tigrannajaryan · 2021-12-20T17:05:19Z

(Following https://github.com/open-telemetry/opentelemetry-specification/blob/main/CONTRIBUTING.md#issue-triaging process).

@jsuereth I am reassigning this to you, I believe you know the context of this better.

bogdandrutu · 2022-01-03T22:29:49Z

@mateuszrzeszutek micrometer is an API + Some implementation(s). I am not sure why that should translate into calls to our API instead of offering an implementation of micrometer that implements the MetricProducer, see https://github.com/open-telemetry/opentelemetry-java/blob/44dccd6c33c7c4cb58e1cc63f0db02b3bfec1804/opencensus-shim/src/main/java/io/opentelemetry/opencensusshim/metrics/OpenCensusMetricProducer.java#L24

/cc @jsuereth

anuraaga · 2022-01-04T02:06:20Z

@bogdandrutu Micrometer was just one example but do you think apps should not have a way to stop reporting metrics? Dropwizard metrics in Java also has registry.remove and it seems idiomatic for a library that allows registering a stateful object to always allow unregistering it. While the whole MeterProvider could be closed, metric definitions are generally completely separate from the MeterProvider lifecycle and that probably doesn't solve that case.

mateuszrzeszutek · 2022-01-04T08:34:10Z

There are advantages to calling OTel API in micrometer bridge instrumentation:

Micrometer instruments like Timer and DistributionSummary can be translated to native OTel histograms; micrometer does not have a "histogram" meter by itself, it basically exports each bucket as a separate gauge.
Due to the API/SDK split in the javaagent using a MetricProducer inside an instrumentation would be very painful; the SDK classes are not directly accessible in instrumentation code, they're hidden away inside the agent classloader -- we would have to bridge MetricProducer and all related classes (and do lots of copying) to make that work at all.

Also, micrometer aside, the unregister/remove functionality is still very much needed. The database connection pool example from my first post here is still valid.

anuraaga · 2022-01-04T08:44:13Z

Just another note about Micrometer, I expect it to be used for a very long time by Spring and likely Spring users too as a result. For example, for tracing they will use Micrometer Tracing, not the OpenTelemetry tracing API. I think the reality of the Java ecosystem means we should consider Micrometer as a first class API for Java metrics, and this seems OK to me in practice. As such, I'd like Micrometer usage to benefit from features like exemplars / baggage which means going through our API (or well our SDK implementations directly maybe, but not MetricProducer which is too late to add cross-cutting concern features to the micrometer metrics).

bogdandrutu · 2022-01-05T19:03:44Z

@mateuszrzeszutek

Micrometer instruments like Timer and DistributionSummary can be translated to native OTel histograms; micrometer does not have a "histogram" meter by itself, it basically exports each bucket as a separate gauge.

It is just a representation difference, it is still a Histogram, I don't see the point.

Due to the API/SDK split in the javaagent using a MetricProducer inside an instrumentation would be very painful; the SDK classes are not directly accessible in instrumentation code, they're hidden away inside the agent classloader -- we would have to bridge MetricProducer and all related classes (and do lots of copying) to make that work at all.

The micrometer instrumentation is not really an instrumentation, is actually an SDK/Producer whatever you call it, since micrometer is an API + Impl to produce metrics. Not sure if any metrics library should be seen as "instrumentation", but rather as producers of telemetry.

Just another note about Micrometer, I expect it to be used for a very long time by Spring and likely Spring users too as a result. For example, for tracing they will use Micrometer Tracing, not the OpenTelemetry tracing API. I think the reality of the Java ecosystem means we should consider Micrometer as a first class API for Java metrics, and this seems OK to me in practice.

I am not saying to not have an integration (not instrumentation) with micrometer.

As such, I'd like Micrometer usage to benefit from features like exemplars / baggage which means going through our API (or well our SDK implementations directly maybe, but not MetricProducer which is too late to add cross-cutting concern features to the micrometer metrics).

It is very hard to do that since their API is not well designed for that (or was not designed with that in mind).

FYI: We don't have enough tracing APIs let's create another one :)))

mateuszrzeszutek · 2022-01-10T16:31:05Z

Micrometer instruments like Timer and DistributionSummary can be translated to native OTel histograms; micrometer does not have a "histogram" meter by itself, it basically exports each bucket as a separate gauge.

It is just a representation difference, it is still a Histogram, I don't see the point.

Well that's true, conceptually it's the same. It results in different MetricData being exported though.

As such, I'd like Micrometer usage to benefit from features like exemplars / baggage which means going through our API (or well our SDK implementations directly maybe, but not MetricProducer which is too late to add cross-cutting concern features to the micrometer metrics).

It is very hard to do that since their API is not well designed for that (or was not designed with that in mind).

Doesn't that sort of work by default? If you're using a Counter and call counter.add(1) it'll capture Context.current() even if you don't pass the context; so if you use a bridged micrometer synchronous instrument it should get the exemplar from the current context even though micrometer API does not explicitly support this.

jmacd · 2022-01-25T00:57:03Z

I would like to re-phrase this feature request: Can we allow deleting or unregistering Callbacks? For discussion in the Tuesday Jan 25 8AM PT Spec SIG.

jmacd · 2022-01-25T01:08:55Z

There is a connected topic discussed in https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/supplementary-guidelines.md#asynchronous-example-attribute-removal-in-a-view

The question is how, if at all, should an SDK know to stop reporting a metric?

anuraaga · 2022-01-28T03:41:17Z

Just wanted to add an example that it's fairly easy to introduce memory leaks into apps without the ability to unregister callbacks. For example if there is a connection pool that is being decomissioned, possibly due to a database resharding, then it should be able to be garbage collected. But with any natural registration (no complication of weak references) of an async instrumentation, without the ability to remove the callback from OpenTelemetry this pool can never be collected.

jsuereth · 2022-01-29T00:56:53Z

To back up @anuraaga's comment, while it's not as popular, old-school Java EE servers (like tomcat) or anything that uses hot / dynamic loading of plugins need the ability to clean up their memory and usage so Java can evict their bytecode and RAM.

For @bogdandrutu I want to directly address the notion that Micrometer should use MetricProducer as is done for OpenCensus.

The only reason we use MetricProducer bridge for OpenCensus is because the APi is so divergent from OpenTelemetry that we are unable to do a direct API bridge. If we could have done direct API, we'd have preferred that one. However because OpenCensus metric model is close enough to OTel's, we're not loosing too much with the MetricProducer bridge.

I think @mateuszrzeszutek raises some great points around Metric bridges, and I want to call out a few of them:

We should prefer adapting existing metric instrumentation rather than trying to replace it. Micrometer is a decent API. We're not in competition, and we can be complementary. We have no reason to force users to re-instrument if they're happy.
When we interface with existing Metric APIs we should enhance. There needs to be a reason users want OTEL, and in my opinion we (will) bring a few things to the table:

Resource-based telemetry correlation
Out of the box exemplars
Exponential Histograms (or just histograms for those still doing gauges of percentiles / summary)

An API <-> API bridge provides the easiest path forward for instrumentation authors to accomplish that goal.

bogdandrutu · 2022-02-03T00:15:56Z

@jsuereth Nice to hear all these great things, but I want to also ensure that we are not adding unnecessary complexity to our APIs to support this, hence the "very divergent" part that you mention. The proposal says if this API has a bad thing we should support it because we prefer this type of integration.

We should prefer adapting existing metric instrumentation rather than trying to replace it. Micrometer is a decent API. We're not in competition, and we can be complementary. We have no reason to force users to re-instrument if they're happy.

A "MetricProducer bridge" will give that to the micrometer users.

When we interface with existing Metric APIs we should enhance. There needs to be a reason users want OTEL, and in my opinion we (will) bring a few things to the table:

Resource-based telemetry correlation

A "MetricProducer bridge" will give that to the micrometer users. Resource can still be attached to the bridge instance.

Out of the box exemplars
Exponential Histograms (or just histograms for those still doing gauges of percentiles / summary)

Not 100% convinced that we cannot offer this, but I think we can with a bit of work.

@jsuereth it is very important to distinguish between a "instrumentation user friendly API" which is design to be used in the application instrumentation vs an "sink" API that you are doing when wrapping this API. I don't want us to become a "sink" API, if that is the goal then let's have a sink API design and not try to change the current API designed for users to instrument application to fit that goal.

Another argument is that micrometer is not an API :) sorry but it has way too much in that artifact to be considered an API.

bogdandrutu · 2022-02-03T00:20:57Z

@jsuereth also I do agree that they are good reasons to support unregister/close/etc for an instrument, but I don't think the reason should be that another API has it.

bogdandrutu · 2022-02-16T16:30:38Z

@jsuereth @mateuszrzeszutek is #2317 resolving this? I don't have that feeling...

jmacd · 2022-02-16T18:10:46Z

To clarify my intention, I believe #2317 narrowly solves this problem stating the SDK SHOULD give applications a way to create callbacks that support being unregistered, which allows a user to stop certain asynchronous measurements from reporting without shutting down an entire MeterProvider.

The Prometheus ecosystem uses a Delete() verb to signal an intention to erase a metric from an exporter. None of the verbs Stop(), Close() or Destroy() signal the same intention to me, and in the presence of duplicate instrument registration support (i.e., #2317) I would expect stop/close/destroy to effect the single instance they were called on. I would not expect one instance's stop/close/destroy to effect other instances, and I would not expect any of these to delete the streams reported from a specific instrument instance. I would not want instrument "instances" to matter to the SDK semantically speaking.

As the description states, users can simply stop using synchronous instruments if they want to stop reporting measurements; this doesn't seem like a problem we need to help the user with, but if we did, I would suggest that stop/close/destroy simply invalidates the instrument instance so that it can no longer be used to capture measurements.

I suggest we file a separate issue or issues to discuss the following points:

Need for Delete() verb?

The Delete() verb for instruments, and whether it has a meaningful definition in an API that supports duplicate instrument registration. In Prometheus, Delete() is used with specific instruments and labels, i.e. it doesn't delete an entire Metric, it deletes a single stream. I think of this "deletion" as an exporter preference, in any case. Prometheus expects that exporters remember every current/cumulative value they've ever exported, except those for which Delete() was used.

Need for exporter memory preferences?

In a push-based exporter, whether using cumulative or delta temporality, there's a question of whether to push a value that has not changed. Since we expect cumulative data to be pushed into a Prometheus ecosystem, it seems wise to default to pushing all values, even unchanged ones.

In an exporter with a preference for deltas, it's typical to avoid reporting any delta where the delta (sum) or count (histogram) are zero. For synchronous instruments, this is something the SDK can facilitate by detecting stale streams and, after a while, forgetting them. SHOULD the SDK be required to support forgetting streams? For asynchronous instruments, if the user simply stops reporting values they will stop reporting, unless we require the SDK to detect staleness. Should the Prometheus exporter be responsible for this on its own, or should the SDK facilitate an exporter preference for "synchronous instrument memory"? This topic is connected with #2132 because if the SDK is required to detect staleness, then no additional memory is required to perform cumulative-to-delta translation.

@mateuszrzeszutek please signal whether simply the option to unregister callbacks is enough, coupled with memory options for exporters? Or are you looking for something like Prometheus' Delete()?

mateuszrzeszutek · 2022-03-11T17:14:17Z

the option to unregister callbacks is enough, coupled with memory options for exporters? Or are you looking for something like Prometheus' Delete()?

I believe that should be enough. Removing all callbacks of an async instrument effectively functions like Delete(), and if the sync instruments' metric streams were to be forgotten after the SDK had determined them to be stale it'd fulfill my requirements for this story. (And, as you pointed out, it's pretty unclear how Delete() should work for synchronous instruments anyway).

Shall I close this issue? We can open another one just for the exporter memory feature.

marcalff · 2022-05-23T07:26:16Z

Greetings,

When code can be loaded and unloaded in a dynamic plugin (for example, dlopen() in C/C++), the application needs a way to unregister asynchronous instruments on unload, to avoid having the SDK call stale callbacks.

Without this, asynchronous metrics seem impossible to use with dynamic load / unload.

Regards.

jack-berg · 2022-05-23T15:31:03Z

Each asynchronous instrument already requires support for an unregister() method. See asynchronous counter, for example.

marcalff · 2022-05-23T15:51:47Z

Thanks @jack-berg for the clarifications.

The spec wording is (emphasis mine):

For callback functions registered after an asynchronous instrument is created, the API is required to support a mechanism for unregistration.

In my understanding:

an API exposing two operations, one to create an asynchronous instrument, and one to register a callback, is required to provide unregistration for the callback, ok.
what if the API creates the asynchronous instrument with a callback in a single operation ? There is no requirement to unregister the callback here.

Currently the opentelemetry-cpp API does the later.

Regards

marcalff · 2023-06-08T16:15:34Z

@open-telemetry/specs-approvers

Some update on this issue, about metrics.

opentelemetry-cpp now provide a way to remove a callback for an async instrument.

While this resolves the immediate concern, crashes that happened previously when a measurement is done invoking a defunct callback function (located in a shared library that was unloaded), there are remaining issues.

First, the Meter still has definition for instruments, sync and async, which remain while not measured.
Second, the meter provider still has references to meters that remain while not producing any measurements at all.

The problem here is that exporters still export data for these defunct meters / metrics, as tested with the OTLP HTTP exporter.

(Edit 2023-06-12: To analyse in detail)

It seems the opentelemetry specification lacks the following:

in the API/SDK, an operation to remove a synchronous / asynchronous instrument from a Meter
in the API/SDK, an operation to remove a Meter from a MeterProvider

A spec is needed for languages (like opentelemetry-cpp) to implement it.

This is a major concern, as there is no way in practice to use Metrics for OpenTelemetry in an environment where libraries are loaded and unloaded dynamically, when these libraries provide their own instrumentation.

alxbl · 2023-10-11T20:03:43Z

I'd like to add to this issue in favor of having a cleanup mechanism in the spec:

We perform monitoring of IoT devices which can be added/removed from our systems at any time by operators and would also benefit from being able to stop reporting metrics when those IoT devices are being decommissioned.

Currently, the .NET implementation of asynchronous counter/gauges supports removing metric points from a specific metric, but this causes the last value emitted for that metric point to be re-exported ad-infinitum (or just become a stale entry in the metric point list when using delta export mode) by the OTEL SDK implementation.

Another thing that would be interesting to specify is stale markers when a time series is known to be "finished".

e.g. Prometheus Remote Write also defines stale markers that can tell Prometheus that a time series is known to be stale and immediately marks it as stale so that it no longer shows up. I think it would be nice to have a specified way to do something similar via OTLP.

I would be willing to spend some time working on this if there is interest.

Thanks for the hard work on OTEL, by the way!

EDIT: I had the wrong link to stale markers in Prometheus

At the moment, it is not possible to destroy instruments that send metrics. Therefore, when a user removes a domain, there might be still the metrics about it emitted. A temporary workaround is to restart the pulpcore-api process to reload meters. Ref: open-telemetry/opentelemetry-specification#2232 closes pulp#4603

At the moment, it is not possible to destroy instruments that send metrics. Therefore, when a user removes a domain, there might be still the metrics about it emitted. A temporary workaround is to restart the pulpcore-api process to reload meters. Ref: open-telemetry/opentelemetry-specification#2232 closes #4603

mateuszrzeszutek added the spec:metrics Related to the specification/metrics directory label Dec 20, 2021

github-actions bot assigned tigrannajaryan Dec 20, 2021

This was referenced Dec 20, 2021

Allow to unregister/stop/destroy metric instruments open-telemetry/opentelemetry-java#4013

Closed

Micrometer bridge instrumentation open-telemetry/opentelemetry-java-instrumentation#4919

Merged

tigrannajaryan assigned jsuereth and unassigned tigrannajaryan Dec 20, 2021

bogdandrutu mentioned this issue Jan 3, 2022

Configuring histograms in the API (hints?) #2229

Closed

jmacd added this to the Metrics API/SDK Stable Release milestone Jan 25, 2022

jmacd added this to To do in Spec - Metrics API/SDK Stable Release Jan 25, 2022

jack-berg mentioned this issue Jan 28, 2022

Meter and instrument identity #2226

Closed

jack-berg mentioned this issue Feb 1, 2022

Make multi-instrument idioms for async callbacks possible #2281

Closed

jack-berg mentioned this issue Feb 3, 2022

Allow multiple async callbacks, allow callbacks to be removed open-telemetry/opentelemetry-java#4143

Merged

jmacd mentioned this issue Feb 4, 2022

Instrument identity, registration conflicts, multi-callback registration, unregister support #2317

Merged

aabmass mentioned this issue Feb 9, 2022

Support multi-instrument observable callbacks and ability to unregister callbacks open-telemetry/opentelemetry-python#2454

Open

jmacd mentioned this issue Feb 16, 2022

Clarify preferred aggregation temporality; give Views a selection strategy #2314

Closed

This was referenced Mar 1, 2022

Clarification on MetricReader registered on MeterProvider #2072

Closed

Move metrics SDK spec to Mixed #2304

Merged

reyang removed this from To do in Spec - Metrics API/SDK Stable Release Mar 1, 2022

reyang modified the milestones: Metrics API/SDK Stable Release, Metrics Future Release Mar 1, 2022

srikanthccv mentioned this issue Apr 6, 2022

opentelemetry-instrumentation-system-metrics restore package open-telemetry/opentelemetry-python-contrib#1012

Merged

6 tasks

marcalff mentioned this issue Jun 8, 2023

[API] Remove Meters open-telemetry/opentelemetry-cpp#2184

Closed

jack-berg mentioned this issue Dec 13, 2023

OpenTelemetry does not support resetting in-memory Metric to clean the memory usage #3788

Closed

pellared mentioned this issue Dec 14, 2023

OpenTelemetry does not support resetting in-memory Metric to clean the memory usage open-telemetry/opentelemetry-go#4752

Open

MrAlias mentioned this issue Apr 4, 2024

Add ability to reset metrics with Otel #3985

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow to unregister/stop/destroy instruments #2232

Allow to unregister/stop/destroy instruments #2232

mateuszrzeszutek commented Dec 20, 2021

tigrannajaryan commented Dec 20, 2021

bogdandrutu commented Jan 3, 2022

anuraaga commented Jan 4, 2022

mateuszrzeszutek commented Jan 4, 2022

anuraaga commented Jan 4, 2022 •

edited

bogdandrutu commented Jan 5, 2022

mateuszrzeszutek commented Jan 10, 2022

jmacd commented Jan 25, 2022

jmacd commented Jan 25, 2022

anuraaga commented Jan 28, 2022

jsuereth commented Jan 29, 2022

bogdandrutu commented Feb 3, 2022

bogdandrutu commented Feb 3, 2022

bogdandrutu commented Feb 16, 2022

jmacd commented Feb 16, 2022

mateuszrzeszutek commented Mar 11, 2022

marcalff commented May 23, 2022 •

edited

jack-berg commented May 23, 2022

marcalff commented May 23, 2022

marcalff commented Jun 8, 2023 •

edited

alxbl commented Oct 11, 2023 •

edited

Allow to unregister/stop/destroy instruments #2232

Allow to unregister/stop/destroy instruments #2232

Comments

mateuszrzeszutek commented Dec 20, 2021

tigrannajaryan commented Dec 20, 2021

bogdandrutu commented Jan 3, 2022

anuraaga commented Jan 4, 2022

mateuszrzeszutek commented Jan 4, 2022

anuraaga commented Jan 4, 2022 • edited

bogdandrutu commented Jan 5, 2022

mateuszrzeszutek commented Jan 10, 2022

jmacd commented Jan 25, 2022

jmacd commented Jan 25, 2022

anuraaga commented Jan 28, 2022

jsuereth commented Jan 29, 2022

bogdandrutu commented Feb 3, 2022

bogdandrutu commented Feb 3, 2022

bogdandrutu commented Feb 16, 2022

jmacd commented Feb 16, 2022

Need for Delete() verb?

Need for exporter memory preferences?

mateuszrzeszutek commented Mar 11, 2022

marcalff commented May 23, 2022 • edited

jack-berg commented May 23, 2022

marcalff commented May 23, 2022

marcalff commented Jun 8, 2023 • edited

alxbl commented Oct 11, 2023 • edited

anuraaga commented Jan 4, 2022 •

edited

marcalff commented May 23, 2022 •

edited

marcalff commented Jun 8, 2023 •

edited

alxbl commented Oct 11, 2023 •

edited