Add synchronous gauge instrument, clarify temporality selection influence on metric point persistence #3540

jack-berg · 2023-06-05T21:52:58Z

Resolves #2318.

Extends the metrics API to include a new synchronous gauge instrument, whose value can be set as measurements occur. This is useful when the measurements are made accessible via subscription, such as JFR events.

There are interesting questions around the behavior of start_time_unix_nano and whether or not sync gauges continue to export series without new measurements. This PR has gone through some iteration (available to see via commit history), but as it currently stands, I've made the following changes:

Add new language stating that metric reader's configurable aggregation temporality as a function of instrument type influences the persistence of metric points across successive collections, as described in this Add synchronous gauge instrument, clarify temporality selection influence on metric point persistence #3540 (comment). This influence occurs regardless of whether the resulting metric point has an aggregation temporality field. In other words, sync gauge persistence behavior is influenced by aggregation temporality selection.
Add new language stating that metric reader's configurable aggregation temporality as a function of instrument type influences the starting timestamp of metric points. When an instrument is cumulative, its starting timestamp will always be the same. When an instrument is delta, its starting timestamp advances with successive collections. This influence occurs regardless of whether the resulting metric point has an aggregation temporality field. In other words, sync gauge starting timestamp is influenced by aggregation temporality selection.

specification/metrics/api.md

MrAlias

What is the plan for updating the SDK specification? It would probably not be a good idea to release this change without an SDK specification. Are we planning to block the release until we have one?

specification/metrics/api.md

jmacd · 2023-06-12T20:26:31Z

letting gauge export semantics be influenced by a temporality selection seems incorrect.

Mentioned here,
https://github.com/lightstep/otel-launcher-go/blob/main/lightstep/sdk/metric/README.md#synchronous-gauge-instrument

The Lightstep Metrics SDK has support for an off-spec synchronous gauge achieved by using another synchronous instrument (e.g., histogram) with Gauge as the aggregation (via a "hint"). It defines two behaviors matching the "classical" behavior of a typical Statsd (delta) or Prometheus (cumulative) synchronous gauge, where the gauge value has to be set to be reported vs a gauge value that remains set indefinitely.

This raises the question of whether a Meter can ever mix temporality for synchronous instruments, and how that would be configured. Presently, we have no way to individually customize temporality of a View clause, but gauges seem to be useful both ways--possibly a hint is enough.

jack-berg · 2023-06-15T21:16:06Z

What is the plan for updating the SDK specification? It would probably not be a good idea to release this change without an SDK specification. Are we planning to block the release until we have one?

I didn't see anything else in the SDK that needs to be updated to accommodate this API addition. Do you have anything in mind?

…y-specification into sync-gauge

jack-berg · 2023-06-15T21:40:22Z

The Lightstep Metrics SDK has support for an off-spec synchronous gauge achieved by using another synchronous instrument (e.g., histogram) with Gauge as the aggregation (via a "hint"). It defines two behaviors matching the "classical" behavior of a typical Statsd (delta) or Prometheus (cumulative) synchronous gauge, where the gauge value has to be set to be reported vs a gauge value that remains set indefinitely.

This behavior is most aligned with existing systems, but current language in the metrics data model and proto description suggest otherwise.

The metrics data model says:

A timestamp when the value was sampled (time_unix_nano)

I've pushed a commit to adjust this to match the timestamp language for the other point types, which simply says the time window (start, end] reflects the interval for which the gauge was calculated. This suggests that if temporality is delta, the start should be the last report time and only series with new measurements are exported. If temporality is cumulative, start should be start of application or time of first measurement, and series should be exported even if no new measurements are received.

The proto gauge description says:

A Gauge does not support different aggregation temporalities. Given the aggregation is unknown, points cannot be combined using the same aggregation, regardless of aggregation temporalities. Therefore, AggregationTemporality is not included. Consequently, this also means “StartTimeUnixNano” is ignored for all data points.

I've opened a draft PR which would change the language around temporality selection. The general aim is to make it more correct to let aggregation temporality influence gauges and the StartTimeUnixNano field.

I think the changes needed to the data model and proto needed to follow this interpretation should be allowed (i.e. they're NOT breaking).

github-actions · 2023-06-23T03:21:20Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

specification/metrics/api.md

specification/metrics/data-model.md

quentinmit · 2023-06-23T22:38:49Z

The Lightstep Metrics SDK has support for an off-spec synchronous gauge achieved by using another synchronous instrument (e.g., histogram) with Gauge as the aggregation (via a "hint"). It defines two behaviors matching the "classical" behavior of a typical Statsd (delta) or Prometheus (cumulative) synchronous gauge, where the gauge value has to be set to be reported vs a gauge value that remains set indefinitely.

This behavior is most aligned with existing systems, but current language in the metrics data model and proto description suggest otherwise.

I'm not sure what @jmacd is getting at with "delta" and "cumulative" here. It's not meaningful to calculate the delta between two gauge points, which I think is why the original language talked talked about not supporting different aggregation temporalities. It sounds like maybe you're using "delta" to refer to "only sends a point when the value has changed" and "cumulative" to refer to "send a point every interval regardless of whether it has changed". AIUI both statsd (spec) and Prometheus will continue reporting the last value of a gauge until a new value is set or the gauge is explicitly cleared.

It's possible someone might want to not report gauge points where the value hasn't changed, but that's not the normal behavior of statsd or Prometheus and it is difficult to consume such a time series downstream. If you want to provide this option (and I don't think it's really worth it) it should be a separate setting from whether counters are reported as delta or cumulative points.

The metrics data model says:

A timestamp when the value was sampled (time_unix_nano)

I've pushed a commit to adjust this to match the timestamp language for the other point types, which simply says the time window (start, end] reflects the interval for which the gauge was calculated. This suggests that if temporality is delta, the start should be the last report time and only series with new measurements are exported. If temporality is cumulative, start should be start of application or time of first measurement, and series should be exported even if no new measurements are received.

What does "gauge was calculated" mean here? In general gauges are sampled, not calculated, so it doesn't make sense to talk about an interval for which they were calculated. Only a small subset of gauges have a window over which they are calculated (e.g. a gauge representing the mean/max of a value over a time window). Even if you include "last" as a possible calculation, that is only meaningful in an SDK context where you know you are aggregating the samples.

The proto gauge description says:

A Gauge does not support different aggregation temporalities. Given the aggregation is unknown, points cannot be combined using the same aggregation, regardless of aggregation temporalities. Therefore, AggregationTemporality is not included. Consequently, this also means “StartTimeUnixNano” is ignored for all data points.

I've opened a draft PR which would change the language around temporality selection. The general aim is to make it more correct to let aggregation temporality influence gauges and the StartTimeUnixNano field.

I think the changes needed to the data model and proto needed to follow this interpretation should be allowed (i.e. they're NOT breaking).

I disagree; there are plenty of gauge points in the wild that (meaningfully) do not have start times, and this data model spec change would make those points invalid.

specification/metrics/api.md

specification/metrics/data-model.md

jack-berg · 2023-06-27T19:06:25Z

Even if you include "last" as a possible calculation, that is only meaningful in an SDK context where you know you are aggregating the samples.

Taking the last is what is being referred to by "calculation". The SDK behavior is big factor of the data model. I don't think this language negatively impacts the interpretation of the data model in other contexts.

I disagree; there are plenty of gauge points in the wild that (meaningfully) do not have start times, and this data model spec change would make those points invalid.

Not quite. As mentioned here StartTimeUnixNano is optional for all point types. This PR just aims to carve out the ability for the aggregation temporality selection to influence the value StartTimeUnixNano (either time of last collection or SDK start time) of gauges produced by synchronous gauge instruments.

reyang · 2023-08-10T18:38:33Z

Seeking new reviews from @reyang and @jsuereth since I made change since the original approval. See this comment for a summary.

@jsuereth would you take a look at the updated PR?

…y-specification into sync-gauge

jack-berg · 2023-08-15T22:03:46Z

Will merge this tomorrow morning if there are no additional comments.

Related to #3893. The table has entries for all the instrument types but I forgot to add an entry for synchronous gauge in #3540. Fixing.

Resolve #5225 The specification has [added a synchronous gauge instrument](open-telemetry/opentelemetry-specification#3540). That instrument has now been [stabilized](open-telemetry/opentelemetry-specification#4019), and that stabilization is included in the [next release](open-telemetry/opentelemetry-specification#4034). This adds the new synchronous gauge instrument to the metric API and all implementation we publish. This change will be a breaking change for any SDK developer. The `embedded` package is updated to ensure our compatibility guarantees are meet. --------- Co-authored-by: David Ashpole <dashpole@google.com>

Add synchronous gauge instrument

e0d6a9b

jack-berg requested review from a team as code owners June 5, 2023 21:52

github-actions bot assigned jmacd Jun 5, 2023

reyang reviewed Jun 5, 2023

View reviewed changes