
Support for synchronous Gauge and an UpDownCounter that can be Set() #2318

Closed
jmacd opened this issue Feb 4, 2022 · 34 comments · Fixed by #3540
Labels
spec:metrics Related to the specification/metrics directory

Comments

@jmacd
Contributor

jmacd commented Feb 4, 2022

What are you trying to achieve?

There are long-standing requests to support a settable Gauge, as this is an API both Prometheus and Statsd support.

In the OpenTelemetry data model, the same request arises for an UpDownCounter that can be set. Whereas an asynchronous UpDownCounter can be set from a callback, users are asking for an API to set a Gauge or cumulative UpDownCounter from synchronous code (and, for that matter, a cumulative Counter).

The OpenTelemetry data model says exactly what kind of point should be produced for these events (Gauges and cumulative Sums), so we know what the API should be and what the data should look like, but still there's been an objection. Let's focus on Gauge, since it has a longer history.

You can argue that by making the OTel Gauge instrument only support asynchronous use from a callback, we are improving the efficiency of a monitoring solution by preventing the user from making extraneous observations.

The counter-argument is that you're making the user's life more difficult, particularly in situations where they are holding a value and do not want to defer it until a callback. See, e.g., #2054.

One of the objections to having a synchronous Gauge is that depending on the collection interval, there may or may not be gaps in the series. However, the data model discusses gap handling, and the resulting series with gaps is well-defined. The result, some will argue, is working as intended.

The big distinction in terms of implementation between the synchronous and asynchronous instruments inside the SDK is whether an aggregator will see many updates per interval or is expected to see one update per interval. Our specification is not very strong on what should happen when multiple observations are made for a single instrument and attributes during a collection interval. The result is currently unspecified.

Proposal

This proposal would be easy to pull off in OTel-Go because we are leaning toward using similar APIs for synchronous and asynchronous instruments, i.e., in OTel-Go an asynchronous instrument is just like a synchronous instrument that you only call from a registered callback.

To add the requested APIs:

  1. Allow registering "background asynchronous" Counter, UpDownCounter, and Gauge instruments. These have the same API as the asynchronous instruments, i.e., Observe(value) for setting cumulative and current values, but would be used outside of registered callbacks.
  2. Enforce that at most 1 value is observed per collection per aggregator. We have to prevent the natural merge function from happening if more than one observation is received between collections, otherwise this does not provide the requested functionality. This would resolve the unspecified behavior of the current specification for duplicate observations in favor of a strict last-value-wins policy.

The result would be as expected. If the user does not update a synchronous gauge during a collection interval, the result is missing data (i.e., a gap). If the user updates once or more between collections, the last value will be observed.

Overall, this seems like a small extension to the current specification, does not create any new instruments or APIs or data model, and gives users what they've been asking for.
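For illustration, here is a minimal sketch of what the proposed usage could look like from the caller's side, in Python-flavored code. The create_background_* constructors and the observe() calls are hypothetical names standing in for the proposal; they are not part of any existing OpenTelemetry API.

# Hypothetical sketch of the proposed "background asynchronous" instruments.
# None of the create_background_* methods below exist in the current API.
from opentelemetry import metrics

meter = metrics.get_meter("proposal-sketch")

queue_depth = meter.create_background_gauge("queue.depth")            # hypothetical
items_total = meter.create_background_up_down_counter("items.total")  # hypothetical

def handle_work(current_depth, running_total):
    # Called from ordinary synchronous code, outside any registered callback.
    # If several observations land in one collection interval, last value wins.
    queue_depth.observe(current_depth)
    items_total.observe(running_total)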

@emilgelman

Hey @jmacd,

I appreciate your effort on this matter. I was wondering if there was any progress made regarding this issue?

I'll try to provide some additional context. We are using the OpenTelemetry Go SDK to instrument our services.
In one of our services, we would like to measure the duration of a specific operation. We have the value during the execution of a specific synchronous function - so we can't use an async instrument - and we do not wish to use traces to achieve the same behavior because our backend has different billing and functionality for traces.

Our current workaround is to use a custom in-memory map (to achieve last-value aggregation) and an asynchronous gauge to report those map values, flushing it every collection interval. Having to implement a custom data structure just to support a simple metric type (which exists in practically every other metrics stack) is both cumbersome and unintuitive.
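For readers unfamiliar with this workaround, here is a minimal sketch of the pattern (shown in Python for brevity; the comment above is about the Go SDK): a dictionary keeps the last value per attribute set, and an observable gauge callback reports whatever is in the map at each collection.

from opentelemetry import metrics
from opentelemetry.metrics import Observation

meter = metrics.get_meter("last-value-workaround")

# Last-value store: attribute tuples -> most recent measurement.
_last_values = {}

def _report(options):
    # Called by the SDK at every collection interval.
    return [Observation(value, dict(attrs)) for attrs, value in _last_values.items()]

meter.create_observable_gauge("operation.duration", callbacks=[_report])

def record_duration(seconds, **attrs):
    # Called from synchronous application code when the value is known.
    _last_values[tuple(sorted(attrs.items()))] = seconds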

I think that the argument for not implementing this because it improves monitoring efficiency is somewhat inconsistent with the whole approach of OpenTelemetry. Is there anything else that can be done to make this happen?

@Aaronontheweb

Related since I asked about this in the dotnet implementation repo open-telemetry/opentelemetry-dotnet#2362 (comment)

Our use case is that we have millions of actors whose mailbox depth we would like to periodically report on - but only when the actor is actually processing messages. Having an observable gauge for each of those would be non-performant for our use cases - right now we're trying to use histograms and it's a bit of a hack.

@jmacd
Contributor Author

jmacd commented Apr 6, 2022

From CNCF Slack we have a use-case described:

if the CRD is present then it is denoted by 1 as the metric value and if it's not (if it's deleted) then it is denoted by 0 ...so everytime I want to set the metric value to 0 (and observe it) whenever I delete the CRD.

In this case, the application as built does not keep persistent state; it just observes changes to the current value of a cumulative sum or gauge quantity.

@bogdandrutu
Member

@Aaronontheweb why is the current "UpDownCounter" not good? Curious what exactly the metric is that you want to report about the actors.

@bogdandrutu
Member

if the CRD is present then it is denoted by 1 as the metric value and if it's not (if it's deleted) then it is denoted by 0 ...so everytime I want to set the metric value to 0 (and observe it) whenever I delete the CRD.

@jmacd now that I see this example better, can that be an UpDownCounter where +1 on start and -1 on end? I know it is a bit of an abuse of the API, but I want to understand if this is good as a short-term solution until we get more proof that we need a sync Gauge.
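A minimal sketch of that suggestion, in Python for illustration (the metric name is made up): presence is encoded as a sum that is incremented on creation and decremented on deletion.

from opentelemetry import metrics

meter = metrics.get_meter("crd-presence-sketch")
# The exported value is the running sum: 1 while the CRD exists, 0 after deletion.
crd_present = meter.create_up_down_counter("crd.present")

def on_crd_created(crd_name):
    crd_present.add(1, {"crd": crd_name})

def on_crd_deleted(crd_name):
    crd_present.add(-1, {"crd": crd_name})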

@Aaronontheweb

@Aaronontheweb why is the current "UpDownCounter" not good? Curious what exactly the metric is that you want to report about the actors.

Because I'm measuring the point-in-time queue depth at any given moment, rather than its rate of change. I can measure the rate at which actors process messages with a SumCounter and per-message processing latency with a histogram, but I really need a gauge to measure the backlog of the queue. Rather than create an ObservableGauge for hundreds of thousands / millions of small queues, it'd be better to share a single Gauge as a bounded metric that is differentiated by its attributes (which are low cardinality) - same as we do for the counter and histogram types.

@bogdandrutu
Member

@Aaronontheweb "UpDownCounter" is not for "rate". The "UpDownCounter" is as a name says a counter (sum) that supports positive and negative values, so your backlog will do "+1" when element is added and "-1" when is removed. And you get the right value.

@Aaronontheweb

I can't track when a value is added to it - I can only check it when the actor is running.

@tomasfreund

I have a similar issue - when processing messages from e.g. Azure EventHub, I have the offset and enqueued time of all messages in the batch and I get the last offset and last enqueued time from EventHub every time I fetch a new batch.
Ideally, I'd use this to calculate consumer lag (both in messages and seconds) and record the value in a gauge.
I can't use an UpDownCounter because I don't want to add to the value; I need to set the value, which comes from an external system.
I also don't want to use an asynchronous gauge because that would require me to write additional code just to handle fetching the value on demand and I would also need to do extra I/O every time metrics are collected.

There are many other cases where you might need a synchronous gauge. An example might be an IoT device where some module periodically reports values to the main control unit, and the control unit would like to record these values the moment it gets them. At this point there is no way to record gauge metrics this way. Sure, you could store them in local variables and then use an async gauge to collect the values at metrics collection time, but it seems very convoluted.
Of course an async gauge is useful for many things; it's just that I don't understand why it would be a good idea to only have an async version.

@Aaronontheweb

Any updates on this as a consideration?

@rlinehan

Would really, really like to see this supported.

Our use case: we run a set of tests against our service via an AWS Lambda every minute. We want to emit a gauge indicating whether the tests succeeded or not (basically a binary "1" or "0"). The code is much more complicated with an async gauge than a synchronous gauge since we just want to issue it once and right at the end of the test run, and because we're in a Lambda and need to make sure it emits while the Lambda is running.

@jmacd
Contributor Author

jmacd commented Jun 28, 2022

This is a top priority (IMO) and I will begin working on the (very straightforward) proposal in the coming few weeks.

@jmacd
Contributor Author

jmacd commented Jul 8, 2022

^^^ Here is the prototype I am working with.

@ty-elastic

Hi, I'd love to see support for synchronous Gauges as well! The lack of sync gauges can make it difficult to interface with existing libraries that push metrics to your app at intervals that they define. In this example (https://github.com/ty-elastic/videojs_observability), metrics are pushed to me, which I then copy into a global map. From there, I have to coax otel into turning around and reading/reporting those metrics (some as gauges) asynchronously. At best, the timing is wrong (the data is timestamped at the time of async reading rather than the time when it is pushed to me). A better approach would allow one to write gauge values synchronously with a proper timestamp; the last value is then reported when the exporter gathers metrics.

@jack-berg
Member

A better approach would allow one to write gauge values synchronously with a proper timestamp

I've been assuming that the timestamps reported for synchronous gauges would be the time of collection, not the time the instrument was most recently "set". A deviation from this would be inconsistent with other instruments / aggregations.

@ty-elastic

ty-elastic commented Jul 19, 2022

I've been assuming that the timestamps reported for synchronous gauges would be the time of collection, not the time the instrument was most recently "set". A deviation from this would be inconsistent with other instruments / aggregations.

Interesting... that is a reasonable assumption for sync counters, but for sync gauges it seems like ideally you would assert the timestamp and value together so that, from an observability perspective, you can say "oh - this gauge was at this water mark at this specific time" rather than "this gauge was at this water mark sometime within this collection interval". For most apps it probably doesn't matter. For "real-time" apps (e.g., a video compression engine), that difference could be confusing.

@zbindenren

zbindenren commented Dec 9, 2022

Any news on this? I think a lot of people need this.

@gaeljw

gaeljw commented Dec 13, 2022

Just a quick comment to describe a use-case of mine: we have an application that runs a scheduled task to count inconsistencies in our data. This computation might be long enough that we don't want to do it when the instrument is observed, but only on a schedule that the application itself defines and is responsible for.

We used to use Prometheus gauges.

With OpenTelemetry, this is not possible with gauges. Technically it would be possible to use a counter, but from a "functional" point of view we are not computing delta values; we are computing a "total value" at a scheduled rate.

Edit: I guess that the thing that makes more sense in my case is to store the value on my end somehow (in memory) and read it async via the async gauge.

@nickb937

I have a few applications that do data processing, and I would like to expose some internal variables as metrics around the internal health of the system. The metrics aren't time-series as such; they are samples of the value of a variable at a point in time that I want to emit to the monitoring system for observation. For example, several batches of CSV files come in each day and I may want to observe the number of lines read from those files.

The Gauge datatype looks like the correct type for publishing internal metrics because each sample is distinct from the previous sample.

However, there's no way to publish a Gauge value directly (synchronously), like this:

file_lines = open("/tmp/datafile.csv", "r").readlines()
metric_lines = meter.create_gauge("line-count")
metric_lines.set_value(len(file_lines))

It's unclear how I would go about emitting the value of a variable through an ObservableGauge without first publishing the variable into a global variable for sampling with the callback, which isn't a realistic solution:

from opentelemetry.metrics import Observation, get_meter

meter = get_meter(__name__)

LINE_COUNT = 0

def observable_gauge_func(options):
    global LINE_COUNT
    yield Observation(LINE_COUNT, {})

file_lines = open("/tmp/datafile.csv", "r").readlines()
meter.create_observable_gauge("line-count", [observable_gauge_func])
LINE_COUNT = len(file_lines)

@Woodrow1026

We also need this functionality very much.

In our scenario, there are a large number of test cases to be executed, but for performance reasons they are only run once a day (configurable). We use a histogram to record whether each case is successful (binary 0/1).

But now there are some test cases whose return values we want to track. A histogram is no longer appropriate for those, because individual values are lost in the buckets. Asynchronous gauges are also not suitable for us because we execute the test cases through scheduling, and using an UpDownCounter for this feels strange.

@benpoliquin

+1 on the need for this synchronous gauge type.

@ascrookes

ascrookes commented Mar 1, 2023

I could also benefit from a synchronous gauge type. Like others in this thread, I want to use it to measure queue depth and unique users in the queue across time. I'm using the Python library.

Do let me know if there is another metrics instrument that could be used for this but I didn't see one that would be a better fit.

In the meantime, I have a simple Python class that wraps the asynchronous gauge so I can set point-in-time values without using the callback logic in my application code. It only reports on the interval you have set on the metrics exporter. That means if I set a value and then set another value before the initial value has been exported, the first value will get dropped. There is probably a way to fix that, but for my use case that trade-off is fine.

from typing import Iterable

from opentelemetry import metrics

# Alias to make it easier to access the Observations class.
# This takes in a single value (float or int) and an optional dictionary
# of attributes.
# https://opentelemetry-python.readthedocs.io/en/latest/api/metrics.html#opentelemetry.metrics.Observation
MetricsGaugeObservation = metrics.Observation


class MetricsGauge:
    def __init__(self, name: str, meter: metrics.Meter):
        self._name = name
        self._meter = meter
        self._internal_gauge = None
        self._observations = []

    def set_observations(self, observations: Iterable[MetricsGaugeObservation]):
        self._observations = observations
        # The async gauge starts reporting as soon as it is created. Create
        # the async gauge once a real value has been set to avoid creating noise
        # prior to values getting set.
        if self._internal_gauge is None:
            self._internal_gauge = self._meter.create_observable_gauge(
                name=self._name, callbacks=[self._gauge_callback]
            )

    def _gauge_callback(self, _options):
        return self._observations


# In application code

meter = metrics.get_meter(__name__)
gauge = MetricsGauge("gauge.name", meter)
# When the gauge value changes
new_value, new_attributes = func_to_get_gauge_value_and_attributes()

# This method takes an iterable so that you can set multiple values and attributes
# for the same timestamp.
gauge.set_observations([MetricsGaugeObservation(new_value, new_attributes)])

@GlassonGlassoff

@jmacd Did you ever get any traction on the proposal you mentioned in July?

The lack of sync gauges is causing a massive headache for us in a service where state isn't saved, so there isn't a great way to handle callbacks.

@lsytj0413

any update for this proposal?

@triblondon

Here's my polyfill for JavaScript:

// Types from @opentelemetry/api
import { Meter, ObservableGauge } from '@opentelemetry/api';

class OTelGauge {
  private currentValue?: number;
  private gauge: ObservableGauge;
  constructor(meter: Meter, name: string, initValue?: number) {
    this.currentValue = initValue;
    this.gauge = meter.createObservableGauge(name);
    // Only report once a value has actually been set.
    this.gauge.addCallback((result) => {
      if (this.currentValue !== undefined) {
        result.observe(this.currentValue);
      }
    });
  }
  observeValue(newValue: number) {
    this.currentValue = newValue;
  }
}

@tommaybe

tommaybe commented May 5, 2023

For queue tracking purposes this one is also really essential for us.

@MatisseHack

Having a Set method on UpDownCounters would be really useful for my team! We maintain an event driven service and would like to track various metrics related to each event. At the moment we've resorted to exposing a LastEvent property on some of our classes that can be accessed by ObservableUpDownCounters, but this approach isn't nearly as elegant as using a Set method directly when handling the event.

@jaelrod

jaelrod commented May 25, 2023

A synchronous version of the gauge would be helpful for our use case as well, as we need to emit metrics with negative values set by absolute value rather than by incrementing, and currently the only support for negative values is the asynchronous gauge/up-down-counter or the synchronous up-down-counter.

@agunglotto

I've ported an application from the Prometheus client to OpenCensus. Both have a synchronous gauge implementation, so the switch wasn't that hard. Now OpenCensus is no longer supported starting from July and recommends using OpenTelemetry. And here there is only an asynchronous gauge.
It's really hard to understand why the developers decided "randomly" on the sync/async model in OpenTelemetry. And then to see this request open for one year...

@jack-berg
Member

FYI, I've built a prototype java implementation of this here and am planning on opening a spec PR with a change proposal.

@bogdandrutu
Member

bogdandrutu commented Jun 13, 2023

@jmacd very nicely written argument for how to implement Gauge, but I feel that there are not enough examples where the async/observer pattern does not work. The linked issue regarding the async UpDownCounter is very weak; I'm unable to understand the argument for why that needs to expose state...

jack-berg added a commit that referenced this issue Aug 16, 2023
…ence on metric point persistence (#3540)

Resolves #2318.

Extends the metrics API to include a new synchronous gauge instrument,
whose value can be set as measurements occur. This is useful when the
measurements are made accessible via subscription, such as JFR events.

There are interesting questions around the behavior of
`start_time_unix_nano` and whether or not sync gauges continue to export
series without new measurements. This PR has gone through some iteration
(available to see via commit history), but as it currently stands, I've
made the following changes:
- Add new language stating that metric reader's configurable aggregation
temporality as a function of instrument type influences the persistence
of metric points across successive collections, as described in this
#3540 (comment).
This influence occurs regardless of whether the resulting metric point
has an aggregation temporality field. In other words, sync gauge
persistence behavior is influenced by aggregation temporality selection.
- Add new language stating that metric reader's configurable aggregation
temporality as a function of instrument type influences the starting
timestamp of metric points. When an instrument is cumulative, its
starting timestamp will always be the same. When an instrument is delta,
its starting timestamp advances with successive collections. This
influence occurs regardless of whether the resulting metric point has an
aggregation temporality field. In other words, sync gauge starting
timestamp is influenced by aggregation temporality selection.

---------

Co-authored-by: Reiley Yang <reyang@microsoft.com>
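
For context, a minimal sketch of how the synchronous gauge introduced by #3540 is used, borrowing the consumer-lag example from earlier in the thread. Python is shown; the exact constructor and method names depend on the SDK and version.

from opentelemetry import metrics

meter = metrics.get_meter("sync-gauge-sketch")
# Synchronous gauge: the last value set per attribute set is exported at collection.
lag_gauge = meter.create_gauge("consumer.lag")

def on_batch(last_offset, committed_offset, partition):
    lag_gauge.set(last_offset - committed_offset, {"partition": partition})
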
@agunglotto

There are commits for Python, C++ and .Net - do you plan to provide it for other languages / SDKs like Golang etc.?

@freemansoft

freemansoft commented Feb 21, 2024

My bad. Saw that this was picked up in #3540

@amedveshchek

amedveshchek commented Feb 23, 2024

I found that the synchronous Gauge metric is already implemented for Golang, but in low-level code, which is still available for use -- they simply haven't implemented the convenience methods around it.

Here's the official code of telemetrygen tool, which submits the Gauge metrics by default: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/cmd/telemetrygen/internal/metrics/worker.go#L52-L64

So I used this code as a workaround for now. I will replace it with the real synchronous Gauge once it's implemented in the official lib for Golang.
