Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add synchronous gauge instrument, clarify temporality selection influence on metric point persistence #3540

Merged
merged 16 commits into from
Aug 16, 2023

Conversation

jack-berg
Copy link
Member

@jack-berg jack-berg commented Jun 5, 2023

Resolves #2318.

Extends the metrics API to include a new synchronous gauge instrument, whose value can be set as measurements occur. This is useful when the measurements are made accessible via subscription, such as JFR events.

There are interesting questions around the behavior of start_time_unix_nano and whether or not sync gauges continue to export series without new measurements. This PR has gone through some iteration (available to see via commit history), but as it currently stands, I've made the following changes:

  • Add new language stating that metric reader's configurable aggregation temporality as a function of instrument type influences the persistence of metric points across successive collections, as described in this Add synchronous gauge instrument, clarify temporality selection influence on metric point persistence #3540 (comment). This influence occurs regardless of whether the resulting metric point has an aggregation temporality field. In other words, sync gauge persistence behavior is influenced by aggregation temporality selection.
  • Add new language stating that metric reader's configurable aggregation temporality as a function of instrument type influences the starting timestamp of metric points. When an instrument is cumulative, its starting timestamp will always be the same. When an instrument is delta, its starting timestamp advances with successive collections. This influence occurs regardless of whether the resulting metric point has an aggregation temporality field. In other words, sync gauge starting timestamp is influenced by aggregation temporality selection.

@jack-berg jack-berg requested review from a team as code owners June 5, 2023 21:52
specification/metrics/api.md Outdated Show resolved Hide resolved
specification/metrics/api.md Outdated Show resolved Hide resolved
specification/metrics/api.md Outdated Show resolved Hide resolved
specification/metrics/api.md Outdated Show resolved Hide resolved
specification/metrics/api.md Outdated Show resolved Hide resolved
Copy link
Contributor

@MrAlias MrAlias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the plan for updating the SDK specification? It would probably not be a good idea to release this change without an SDK specification. Are we planning to block the release until we have one?

specification/metrics/api.md Outdated Show resolved Hide resolved
specification/metrics/api.md Outdated Show resolved Hide resolved
specification/metrics/api.md Outdated Show resolved Hide resolved
specification/metrics/api.md Outdated Show resolved Hide resolved
@jmacd
Copy link
Contributor

jmacd commented Jun 12, 2023

letting gauge export semantics be influenced by a temporality selection seems incorrect.

Mentioned here,
https://github.com/lightstep/otel-launcher-go/blob/main/lightstep/sdk/metric/README.md#synchronous-gauge-instrument

The Lightstep Metrics SDK has support for an off-spec synchronous gauge achieved by using another synchronous instrument (e.g., histogram) with Gauge as the aggregation (via a "hint"). It defines two behaviors matching the "classical" behavior of a typical Statsd (delta) or Prometheus (cumulative) synchronous gauge, where the gauge value has to be set to be reported vs a gauge value that remains set indefinitely.

This raises the question of whether a Meter can ever mix temporality for synchronous instruments, and how that would be configured. Presently, we have no way to individually customize temporality of a View clause, but gauges seem to be useful both ways--possibly a hint is enough.

@jack-berg
Copy link
Member Author

What is the plan for updating the SDK specification? It would probably not be a good idea to release this change without an SDK specification. Are we planning to block the release until we have one?

I didn't see anything else in the SDK that needs to be updated to accommodate this API addition. Do you have anything in mind?

@jack-berg
Copy link
Member Author

The Lightstep Metrics SDK has support for an off-spec synchronous gauge achieved by using another synchronous instrument (e.g., histogram) with Gauge as the aggregation (via a "hint"). It defines two behaviors matching the "classical" behavior of a typical Statsd (delta) or Prometheus (cumulative) synchronous gauge, where the gauge value has to be set to be reported vs a gauge value that remains set indefinitely.

This behavior is most aligned with existing systems, but current language in the metrics data model and proto description suggest otherwise.

The metrics data model says:

A timestamp when the value was sampled (time_unix_nano)

I've pushed a commit to adjust this to match the timestamp language for the other point types, which simply says the time window (start, end] reflects the interval for which the gauge was calculated. This suggests that if temporality is delta, the start should be the last report time and only series with new measurements are exported. If temporality is cumulative, start should be start of application or time of first measurement, and series should be exported even if no new measurements are received.

The proto gauge description says:

A Gauge does not support different aggregation temporalities. Given the aggregation is unknown, points cannot be combined using the same aggregation, regardless of aggregation temporalities. Therefore, AggregationTemporality is not included. Consequently, this also means “StartTimeUnixNano” is ignored for all data points.

I've opened a draft PR which would change the language around temporality selection. The general aim is to make it more correct to let aggregation temporality influence gauges and the StartTimeUnixNano field.

I think the changes needed to the data model and proto needed to follow this interpretation should be allowed (i.e. they're NOT breaking).

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jun 23, 2023
specification/metrics/api.md Outdated Show resolved Hide resolved
specification/metrics/data-model.md Outdated Show resolved Hide resolved
@quentinmit
Copy link

The Lightstep Metrics SDK has support for an off-spec synchronous gauge achieved by using another synchronous instrument (e.g., histogram) with Gauge as the aggregation (via a "hint"). It defines two behaviors matching the "classical" behavior of a typical Statsd (delta) or Prometheus (cumulative) synchronous gauge, where the gauge value has to be set to be reported vs a gauge value that remains set indefinitely.

This behavior is most aligned with existing systems, but current language in the metrics data model and proto description suggest otherwise.

I'm not sure what @jmacd is getting at with "delta" and "cumulative" here. It's not meaningful to calculate the delta between two gauge points, which I think is why the original language talked talked about not supporting different aggregation temporalities. It sounds like maybe you're using "delta" to refer to "only sends a point when the value has changed" and "cumulative" to refer to "send a point every interval regardless of whether it has changed". AIUI both statsd (spec) and Prometheus will continue reporting the last value of a gauge until a new value is set or the gauge is explicitly cleared.

It's possible someone might want to not report gauge points where the value hasn't changed, but that's not the normal behavior of statsd or Prometheus and it is difficult to consume such a time series downstream. If you want to provide this option (and I don't think it's really worth it) it should be a separate setting from whether counters are reported as delta or cumulative points.

The metrics data model says:

A timestamp when the value was sampled (time_unix_nano)

I've pushed a commit to adjust this to match the timestamp language for the other point types, which simply says the time window (start, end] reflects the interval for which the gauge was calculated. This suggests that if temporality is delta, the start should be the last report time and only series with new measurements are exported. If temporality is cumulative, start should be start of application or time of first measurement, and series should be exported even if no new measurements are received.

What does "gauge was calculated" mean here? In general gauges are sampled, not calculated, so it doesn't make sense to talk about an interval for which they were calculated. Only a small subset of gauges have a window over which they are calculated (e.g. a gauge representing the mean/max of a value over a time window). Even if you include "last" as a possible calculation, that is only meaningful in an SDK context where you know you are aggregating the samples.

The proto gauge description says:

A Gauge does not support different aggregation temporalities. Given the aggregation is unknown, points cannot be combined using the same aggregation, regardless of aggregation temporalities. Therefore, AggregationTemporality is not included. Consequently, this also means “StartTimeUnixNano” is ignored for all data points.

I've opened a draft PR which would change the language around temporality selection. The general aim is to make it more correct to let aggregation temporality influence gauges and the StartTimeUnixNano field.

I think the changes needed to the data model and proto needed to follow this interpretation should be allowed (i.e. they're NOT breaking).

I disagree; there are plenty of gauge points in the wild that (meaningfully) do not have start times, and this data model spec change would make those points invalid.

@github-actions github-actions bot removed the Stale label Jun 24, 2023
specification/metrics/api.md Outdated Show resolved Hide resolved
specification/metrics/api.md Outdated Show resolved Hide resolved
specification/metrics/data-model.md Outdated Show resolved Hide resolved
@jack-berg
Copy link
Member Author

Even if you include "last" as a possible calculation, that is only meaningful in an SDK context where you know you are aggregating the samples.

Taking the last is what is being referred to by "calculation". The SDK behavior is big factor of the data model. I don't think this language negatively impacts the interpretation of the data model in other contexts.

I disagree; there are plenty of gauge points in the wild that (meaningfully) do not have start times, and this data model spec change would make those points invalid.

Not quite. As mentioned here StartTimeUnixNano is optional for all point types. This PR just aims to carve out the ability for the aggregation temporality selection to influence the value StartTimeUnixNano (either time of last collection or SDK start time) of gauges produced by synchronous gauge instruments.

@reyang
Copy link
Member

reyang commented Aug 10, 2023

Seeking new reviews from @reyang and @jsuereth since I made change since the original approval. See this comment for a summary.

@jsuereth would you take a look at the updated PR?

@jack-berg
Copy link
Member Author

Will merge this tomorrow morning if there are no additional comments.

@jack-berg jack-berg merged commit d4b241f into open-telemetry:main Aug 16, 2023
7 checks passed
jack-berg added a commit that referenced this pull request Apr 5, 2024
Related to #3893. 

The table has entries for all the instrument types but I forgot to add
an entry for synchronous gauge in #3540. Fixing.
MrAlias added a commit to open-telemetry/opentelemetry-go that referenced this pull request May 16, 2024
Resolve #5225 

The specification has [added a synchronous gauge
instrument](open-telemetry/opentelemetry-specification#3540).
That instrument has now been
[stabilized](open-telemetry/opentelemetry-specification#4019),
and that stabilization is included in the [next
release](open-telemetry/opentelemetry-specification#4034).

This adds the new synchronous gauge instrument to the metric API and all
implementation we publish.

This change will be a breaking change for any SDK developer. The
`embedded` package is updated to ensure our compatibility guarantees are
meet.

---------

Co-authored-by: David Ashpole <dashpole@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for synchronous Gauge and an UpDownCounter that can be Set()
9 participants