Specify optional Exponential Histogram Aggregation, add example code in the data model #2252

jmacd · 2022-01-07T20:30:33Z

Part of #1935.

This protocol was released in OTLP v0.11.

Changes

In open-telemetry/opentelemetry-collector#4642 I introduced temporary support for printing the exponential histogram data point. I used equations added to the data model as examples here.

This document is meant to justify merging the reference implementation shown in this (unfortunately LARGE) PR: open-telemetry/opentelemetry-go#2393

I've proposed to split it into two parts: open-telemetry/opentelemetry-go#2501. The implementation is described in further detail in its README.

The specification changes in this PR describe the critical aspects of this reference implementation in terms of the new Aggregation's two configuration parameters, MaxSize and RangeLimits (optional), and two requirements for its behavior.

As further motivation, there is a PR to add receiver/statsdreceiver support using this reference implementation, open-telemetry/opentelemetry-collector-contrib#6666, which is waiting for everything above to be merged.

jmacd · 2022-01-07T20:33:02Z

@MrAlias FYI, this is meant to assist with eventually merging the reference implementation in OTel-Go.

jmacd · 2022-01-07T22:54:55Z

The equations included in this PR are tested in the corresponding OTel-Go PR: open-telemetry/opentelemetry-go#2502

specification/metrics/sdk.md

jmacd · 2022-01-12T16:58:35Z

@beorn7 I would like your feedback on this proposal. To frame the options, imagine an OpenTelemetry SDK standing in as a Prometheus client. You have full control over the histogram behavior: you can choose a fixed scale with variable size, or you can choose range limits with fixed size and fixed scale. I believe that in a Prometheus setting these decisions should be made up front.

As a first attempt, I outlined an exponential histogram aggregator with one mandatory setting (size) and one optional setting (range limits). The user wouldn't ever set scale directly under this proposal. What do you think?

cc/ @brian-brazil

brianbrazil-seczetta · 2022-01-12T17:01:59Z

@brian-brazil?

jmacd · 2022-01-12T17:05:11Z

Yes, thank you @brianbrazil-seczetta. @brian-brazil one day we hope to add you to the OTel organization 😁

beorn7 · 2022-01-12T18:59:47Z

@jmacd Thanks for pinging me. I have trouble finding time to look at this in detail (since I'm working heads-down on the Prometheus histograms, hopefully getting them to a state where less is in flux and I can give better answers to OTel questions ;). I'm not quite sure what your request is here. In Prometheus, the instrumented binary decides what to expose, and then independent scrapers can do with it what they want (including reducing the resolution AKA increasing the bucket width if they want to store at lower resolution).

https://github.com/prometheus/client_golang/blob/70253f4dd027a7128cdd681c22448d65ca30eed7/prometheus/histogram.go#L384-L407 is how to set the scale.

https://github.com/prometheus/client_golang/blob/70253f4dd027a7128cdd681c22448d65ca30eed7/prometheus/histogram.go#L421-L440 sets the strategy how to limit bucket numbers, which, at a first glance, looks similar to what's proposed here.

Does this help? I'm sorry if I missed the point while just skimming this PR. Feel free to ask more specific questions, and I'll try my best in the time given to me.

jmacd · 2022-01-12T20:08:29Z

Thank you @beorn7, very helpful feedback.

Compatibility note: In OTel's protocol and this document Scale is equal to -SparseBucketsFactor in your struct.

I am trying to avoid letting the user set scale directly, because it is difficult to reason about. I was proposing that users either (a) do not set scale, or (b) configure max-size and min/max range limits, which imply a fixed scale. I see that your configuration is more flexible.
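To illustrate how a maximum size plus min/max range limits can imply a fixed scale, here is a minimal Go sketch. It is not the reference implementation: the names `bucketIndex` and `scaleForRange`, the search bounds, and the lower-inclusive bucket boundaries are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"math"
)

// bucketIndex returns floor(log2(value) * 2^scale), the index of the bucket
// holding value when bucket i spans [base^i, base^(i+1)) with
// base = 2^(2^-scale). Lower-inclusive boundaries are an assumption here.
func bucketIndex(value float64, scale int) int {
	return int(math.Floor(math.Ldexp(math.Log2(value), scale)))
}

// scaleForRange returns the largest scale in [minScale, maxScale] at which
// every value in [lo, hi] fits into at most maxSize buckets. The second
// result is false when no scale in that interval fits.
func scaleForRange(lo, hi float64, maxSize int) (int, bool) {
	const minScale, maxScale = -10, 20 // hypothetical search bounds
	for scale := maxScale; scale >= minScale; scale-- {
		if bucketIndex(hi, scale)-bucketIndex(lo, scale)+1 <= maxSize {
			return scale, true
		}
	}
	return 0, false
}

func main() {
	scale, ok := scaleForRange(1, 1000, 160)
	fmt.Println(scale, ok)
}
```

Searching downward from a high scale keeps the example simple and picks the finest resolution that still fits; a closed-form estimate is possible but has to account for bucket boundary alignment.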

For OTel to emulate the behavior implied by your sparse-histogram settings, I will have to make an adjustment in this proposal, in probably two parts:

1. The range limits described thus far have been all-or-none. You're describing an independent minimum limit and no maximum limit. A user could not emulate what you have using WithRangeLimits(minimum, math.MaxFloat64), because we want a way to set the minimum while keeping the scale adjustable (i.e., without setting a maximum).
2. The option to reset a histogram and use a new start timestamp is already built into the OTel data model, but a histogram setting that forces a reset when scale falls below a threshold could be useful.

By the way, I see you have SparseBucketsFactor float64. OTel limited scale to integers so that all histogram conversions are non-lossy, but I admit that's not a strong requirement. The equations in this PR could be updated to compute non-integer scales by replacing scaleFactor = math.Ldexp(math.Log2E, scale) with math.Log2E * math.Exp2(scale) and so on. The mapping functions for non-integer scale behave the same as those documented here, except that a non-integer scale admits lossy conversions.
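The substitution described above can be sketched in Go as follows. The function names are hypothetical, and rounding down with `math.Floor` (lower-inclusive buckets) is an assumption for the example; only the two `scaleFactor` expressions come from the comment itself.

```go
package main

import (
	"fmt"
	"math"
)

// mapToIndexInt uses the integer-scale form of the scale factor.
func mapToIndexInt(value float64, scale int) int {
	scaleFactor := math.Ldexp(math.Log2E, scale)
	return int(math.Floor(math.Log(value) * scaleFactor))
}

// mapToIndexFloat uses the generalized form, which also accepts
// fractional scales.
func mapToIndexFloat(value, scale float64) int {
	scaleFactor := math.Log2E * math.Exp2(scale)
	return int(math.Floor(math.Log(value) * scaleFactor))
}

func main() {
	// For an integer scale the two forms should agree.
	fmt.Println(mapToIndexInt(3, 3), mapToIndexFloat(3, 3))
	// A fractional scale is only expressible with the generalized form.
	fmt.Println(mapToIndexFloat(3, 2.5))
}
```

With a fractional scale, bucket boundaries no longer line up with those of any integer scale, which is why conversions between such histograms become lossy.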

OTel reviewers, if you think having the ability to directly set the scale matters, please say so.

I'll update this PR with point (1) above (i.e. make limits independent) and we can address (2) when the time comes. Thank you!

beorn7 · 2022-01-14T15:21:28Z

> Compatibility note: In OTel's protocol and this document Scale is equal to -SparseBucketsFactor in your struct.

Actually not. It's more like the scaleFactor in this PR. The user provides the desired growth factor between one bucket boundary and the next, and then the code picks a scale that will provide the largest growth factor that is equal to or smaller than the provided SparseBucketsFactor. For example, SparseBucketsFactor = 1.1 results in a Scale of 3 (2^(2^-3) ≈ 1.0905).
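As an illustration of the selection rule just described, the following Go sketch picks the smallest integer scale whose growth factor 2^(2^-scale) does not exceed a requested factor. The function names are hypothetical, and the limits a real client places on the resulting scale are omitted.

```go
package main

import (
	"fmt"
	"math"
)

// scaleForGrowthFactor returns the smallest integer scale whose base
// 2^(2^-scale) is less than or equal to factor. factor must be > 1.
func scaleForGrowthFactor(factor float64) int {
	return int(math.Ceil(-math.Log2(math.Log2(factor))))
}

// growthFactor returns the base 2^(2^-scale) for an integer scale.
func growthFactor(scale int) float64 {
	return math.Exp2(math.Ldexp(1, -scale))
}

func main() {
	scale := scaleForGrowthFactor(1.1)
	fmt.Printf("scale=%d base=%.4f\n", scale, growthFactor(scale))
}
```

The smallest qualifying scale gives the largest base, because the base shrinks toward 1 as the scale increases.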

> I am trying to avoid letting the user set scale directly, because it is difficult to reason about.

Yeah, and that's why we do what's described above. The growth factor gives you a good intuition of the precision the histogram provides.

> the range limits described thus far have been all-or-none. You're describing an independent minimum limit and no maximum limit. A user could not emulate what you have using WithRangeLimits(minimum, math.MaxFloat64) because we want a way to set the minimum while keeping adjustable scale (i.e., without setting a maximum).

I guess with the same line of argument that leads to a zero bucket of finite width, one might require an "overflow bucket" and an "underflow bucket" for observations exceeding a configurable max/min value. That would be nicely symmetric, but I guess, the demand hasn't come up in practice because it is relatively easy to "accidentally" create observations very close to zero (due to floating point arithmetic precision issues, or if the observations are coming from actual physical measurements), while I would assume the cases where you accidentally create extremely large observations are much less common.

So far, we have gone for not doing "overflow/underflow buckets", but if someone has relevant need, please let me know.

> By the way, I see you have SparseBucketsFactor float64. OTel limited scale to integers so that all histogram conversions are non-lossy, but I admit that's not a strong requirement. The equations in this PR could be updated to compute non-integer scales by replacing scaleFactor = math.Ldexp(math.Log2E, scale) with math.Log2E * math.Exp2(scale) and so on; the mapping functions for non-integer scale perform the same as those documented here, only that the use of non-integer scale admits lossy conversions.

I think this is all a misunderstanding, see above. In OTel terms, the Prometheus histograms (in the current PoC state) always have an integer scale between -8 and 4, and histograms can always be merged precisely to a histogram with the least common resolution.
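Merging to the least common resolution can be sketched in Go as below. This is an illustrative sketch only: the map-based bucket representation and the names `downscale` and `merge` are assumptions, not part of either project's API.

```go
package main

import "fmt"

// downscale re-indexes buckets from scale `from` to a lower or equal scale
// `to`. Each step down in scale merges adjacent bucket pairs, so the new
// index is the old one shifted right by (from - to) bits. Go's >> on signed
// integers is arithmetic, so negative indexes round toward negative infinity.
func downscale(buckets map[int]uint64, from, to int) map[int]uint64 {
	shift := uint(from - to) // callers must ensure from >= to
	out := make(map[int]uint64, len(buckets))
	for index, count := range buckets {
		out[index>>shift] += count
	}
	return out
}

// merge combines two histograms at the lower of their two scales.
func merge(a map[int]uint64, scaleA int, b map[int]uint64, scaleB int) (map[int]uint64, int) {
	scale := scaleA
	if scaleB < scale {
		scale = scaleB
	}
	out := downscale(a, scaleA, scale)
	for index, count := range downscale(b, scaleB, scale) {
		out[index] += count
	}
	return out, scale
}

func main() {
	a := map[int]uint64{-1: 4, 0: 1, 1: 2, 2: 3} // scale 1
	b := map[int]uint64{0: 5, 1: 1}              // scale 0
	merged, scale := merge(a, 1, b, 0)
	fmt.Println(scale, merged)
}
```

Because integer scales differ by whole powers of two in resolution, every bucket at the higher scale falls entirely inside one bucket at the lower scale, so no counts need to be split.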

jmacd · 2022-01-25T07:24:58Z

By the way, thank you @beorn7 for clarifying: I understand why scaleFactor is a float now, and see no disagreements. :-)

specification/metrics/sdk.md

aabmass · 2022-04-20T18:14:56Z

> Specify optional Exponential Histogram Aggregation, add example code in the data model

Is this intended to be optional for SDK authors to implement? I don't see anything in the spec saying so.

jmacd · 2022-04-22T16:27:55Z

@reyang The remaining question in this PR gets to a bigger question about handling NaN and Inf values in the Metrics API. Should Exponential Histograms treat NaN and Inf values any differently than Counter instruments? If no, what do we expect? If yes, what do we expect?

reyang · 2022-04-27T02:23:54Z

> @reyang The remaining question in this PR gets to a bigger question about handling NaN and Inf values in the Metrics API. Should Exponential Histograms treat NaN and Inf values any differently than Counter instruments? If no, what do we expect? If yes, what do we expect?

I think none of these should block this PR.

And I think the API spec doesn't care about NaN/Inf.
The SDK spec has a clear position on NaN/Inf https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#numerical-limits-handling.

Here goes something I did for .NET (and it's still experimental)
https://github.com/open-telemetry/opentelemetry-dotnet/blob/117b70c039a9b012887e3aa29717e2d4cee6274e/test/OpenTelemetry.Exporter.Prometheus.Tests/PrometheusSerializerTests.cs#L359-L362

reyang · 2022-04-29T00:59:21Z

@aabmass @oertl what's your take on #2252 (comment)?

I'm trying to understand your position - do you think we need to discuss more here (and we should not merge the PR before you feel comfortable to sign off), or we are good to merge this PR and have separate conversation about the NaN/Inf/limits? Thanks!

jmacd · 2022-04-29T15:34:25Z

@reyang I added two commits.

d15ea33 is meant to address a question from Slack about Prometheus' exponential histogram interoperability.

1a473d0 is meant to answer the question you're asking. I believe we have sufficiently general statements about treating Inf and NaN values; however, for histogram aggregation I think we want all-or-none behavior, meaning the Sum, Count, Min, Max, and Buckets should be consistent. Since we can't have consistent results with Inf and NaN values because they do not map into valid buckets, I think histogram implementations MUST disregard these. See what you think.
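The all-or-none behavior proposed here can be sketched in Go as follows. This is a minimal illustration under stated assumptions: the `summary` type and its `record` method are hypothetical, and bucket updates are left out so that only the consistency rule is shown.

```go
package main

import (
	"fmt"
	"math"
)

// summary holds the scalar fields of a histogram data point.
// Bucket counts are omitted from this sketch.
type summary struct {
	count    uint64
	sum      float64
	min, max float64
}

// record updates every field together, or none of them. NaN and Inf are
// disregarded because they do not map into a valid bucket. It reports
// whether the value was recorded.
func (s *summary) record(value float64) bool {
	if math.IsNaN(value) || math.IsInf(value, 0) {
		return false
	}
	if s.count == 0 || value < s.min {
		s.min = value
	}
	if s.count == 0 || value > s.max {
		s.max = value
	}
	s.count++
	s.sum += value
	return true
}

func main() {
	var s summary
	for _, v := range []float64{1, math.NaN(), math.Inf(1), 3} {
		s.record(v)
	}
	fmt.Println(s.count, s.sum, s.min, s.max)
}
```

Rejecting the value before touching any field is what keeps Sum, Count, Min, and Max in agreement with the bucket counts.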

wicha · 2022-05-05T22:33:00Z

Hey Team OpenTelemetry. CTO/Cofounder of a startup that values your work and uses OpenTelemetry here. Is this going to be merged soon? We need this PR merged for some resolution changes we have internally at our company. I'd appreciate if you could merge 🙏

reyang · 2022-05-10T17:01:16Z

@beorn7 @oertl FYI - we plan to merge this PR by end of the week (May 13th, 2022) unless you still see blocking issues.

@wicha FYI - we'll merge the PR by end of Friday, May 13th, 2022 - unless there is blocking comment(s).

jmacd · 2022-05-10T17:25:02Z

@gouthamve see #2252 (comment)

wicha · 2022-05-12T01:20:22Z

> @beorn7 @oertl FYI - we plan to merge this PR by end of the week (May 13th, 2022) unless you still see blocking issues.
>
> @wicha FYI - we'll merge the PR by end of Friday, May 13th, 2022 - unless there is blocking comment(s).

🙏 thank you!

jmacd added 3 commits January 7, 2022 12:17

Add more sample code for exponential histogram boundaries; mention subnormal values

8d31b22

Specify an optional exponential histogram Aggregation with two configuration settings

3e01b83

changelog entry

aefaed5

jmacd requested review from a team as code owners January 7, 2022 20:30

github-actions bot assigned tigrannajaryan Jan 7, 2022

update PR number

698c2e8

Lint

27af2a7

This was referenced Jan 7, 2022

Exponential histogram aggregator & export support open-telemetry/opentelemetry-go#2393

Closed

Exponential Histogram mapping functions for public use open-telemetry/opentelemetry-go#2502

Merged

jmacd added 2 commits January 7, 2022 15:11

clarify & note

f8ae27d

markdownlint

639d565

jsuereth reviewed Jan 10, 2022

jmacd mentioned this pull request Jan 14, 2022

statsdreceiver: fix start timestamp / temporality for counters open-telemetry/opentelemetry-collector-contrib#5714

Merged

revisions from code review

8ed6d94

jsuereth reviewed Jan 29, 2022

jsuereth approved these changes Jan 31, 2022

MrAlias approved these changes Feb 1, 2022

aabmass mentioned this pull request Feb 1, 2022

Add ExponentialHistogram data point to metrics SDK open-telemetry/opentelemetry-python#2421

Closed

2 tasks

remove range limits; separate normative text a bit more

590d483

jmacd and others added 6 commits April 20, 2022 22:31

Merge branch 'main' of github.com:open-telemetry/opentelemetry-specification into jmacd/sdkexpohisto

36e0947

Apply suggestions from code review

34f6166

Co-authored-by: Aaron Abbott <aaronabbott@google.com>

lint

559a6cb

Merge branch 'jmacd/sdkexpohisto' of github.com:jmacd/opentelemetry-specification into jmacd/sdkexpohisto

7f46b7c

Update specification/metrics/sdk.md

a89aba9

Co-authored-by: Aaron Abbott <aaronabbott@google.com>

yzhuge suggestion

be7d875

reyang and others added 4 commits April 28, 2022 18:00

Merge branch 'main' into jmacd/sdkexpohisto

4f62a4c

note on inclusivity

d15ea33

require consistency; do not histogram non-normal values

1a473d0

Merge branch 'jmacd/sdkexpohisto' of github.com:jmacd/opentelemetry-specification into jmacd/sdkexpohisto

a26fc6e

Merge branch 'main' into jmacd/sdkexpohisto

e0cf769

jmacd added 4 commits May 5, 2022 15:53

Merge branch 'main' of github.com:open-telemetry/opentelemetry-specification into jmacd/sdkexpohisto

bdc6593

Merge branch 'jmacd/sdkexpohisto' of github.com:jmacd/opentelemetry-specification into jmacd/sdkexpohisto

0b32697

Merge branch 'main' of github.com:open-telemetry/opentelemetry-specification into jmacd/sdkexpohisto

982e015

Changelog.

6d2edef

oertl approved these changes May 13, 2022

Merge branch 'main' into jmacd/sdkexpohisto

4b14fed

reyang merged commit 3788987 into open-telemetry:main May 13, 2022

oertl mentioned this pull request Jun 14, 2022

REQUEST: New membership for @oertl open-telemetry/community#1078

Closed

6 tasks

beeme1mr pushed a commit to beeme1mr/opentelemetry-specification that referenced this pull request Aug 31, 2022

Specify optional Exponential Histogram Aggregation, add example code in the data model (open-telemetry#2252)

b07a1a1

beeme1mr pushed a commit to beeme1mr/opentelemetry-specification that referenced this pull request Aug 31, 2022

Specify optional Exponential Histogram Aggregation, add example code in the data model (open-telemetry#2252)

43521e5
