OTel Exponential Histogram implementation #3022

1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -13,6 +13,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
- Support Go 1.19.
Include compatibility testing and document support. (#3077)
- Upgrade go.opentelemetry.io/proto/otlp from v0.18.0 to v0.19.0 (#3107)
- OpenTelemetry Exponential histogram data structure for public use. (#3022)

### Changed

198 changes: 193 additions & 5 deletions sdk/metric/aggregator/exponential/README.md
@@ -2,15 +2,48 @@

## Design

This document is a placeholder for future Aggregator, once seen in [PR
2393](https://github.com/open-telemetry/opentelemetry-go/pull/2393).
This is a fixed-size data structure for aggregating the OpenTelemetry
base-2 exponential histogram introduced in [OTEP
149](https://github.com/open-telemetry/oteps/blob/main/text/0149-exponential-histogram.md)
and [described in the metrics data
model](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/data-model.md#exponentialhistogram).
The exponential histogram data point is characterized by a `scale`
factor that determines resolution. Positive scales correspond with
more resolution, and negative scales correspond with less resolution.

Only the mapping functions have been made available at this time. The
equations tested here are specified in the [data model for Exponential
Histogram data points](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/data-model.md#exponentialhistogram).
Given a maximum size, in terms of the number of buckets, the
implementation determines the best scale possible given the set of
measurements received. The size of the histogram is configured using
the `WithMaxSize()` option, which defaults to 160.

The implementation here maintains the best resolution possible. Since
the scale parameter is shared by the positive and negative ranges, the
best value of the scale parameter is determined by the range with the
greater difference between minimum and maximum bucket index:

```golang
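func bucketsNeeded(minValue, maxValue float64, scale int32) int32 {
	return bucketIndex(maxValue, scale) - bucketIndex(minValue, scale) + 1
}

// bucketIndex is a simplified form of the logarithm mapping. The
// product is floored and converted so the function compiles as written.
func bucketIndex(value float64, scale int32) int32 {
	return int32(math.Floor(math.Log(value) * math.Ldexp(math.Log2E, int(scale))))
}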
```

The best scale is uniquely determined when `maxSize/2 <
bucketsNeeded(minValue, maxValue, scale) <= maxSize`. This
implementation maintains the best scale by rescaling as needed to stay
within the maximum size.
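
As a rough illustration of the condition above, the sketch below searches downward from the maximum scale until the range fits. The helper name `bestScale`, the constants `minScale` and `maxScale`, and the `int64` return type of `bucketsNeeded` are assumptions made for this example; the real implementation rescales incrementally as values arrive rather than searching.

```go
package main

import "math"

// Assumed bounds for this sketch, taken from the valid scale
// interval [-10, 20] mentioned later in this document.
const (
	minScale int32 = -10
	maxScale int32 = 20
)

// bucketIndex is a simplified logarithm mapping (floor of log2(value) * 2^scale).
func bucketIndex(value float64, scale int32) int32 {
	return int32(math.Floor(math.Log(value) * math.Ldexp(math.Log2E, int(scale))))
}

// bucketsNeeded counts the buckets spanning [minValue, maxValue] at scale.
// It returns int64 so the difference cannot overflow at high scales.
func bucketsNeeded(minValue, maxValue float64, scale int32) int64 {
	return int64(bucketIndex(maxValue, scale)) - int64(bucketIndex(minValue, scale)) + 1
}

// bestScale (hypothetical helper) returns the largest scale at which
// the range [minValue, maxValue] fits in maxSize buckets.
func bestScale(minValue, maxValue float64, maxSize int32) int32 {
	scale := maxScale
	for scale > minScale && bucketsNeeded(minValue, maxValue, scale) > int64(maxSize) {
		scale--
	}
	return scale
}
```

For latencies between 1ms and 100s (`bestScale(0.001, 100, 160)`), this sketch settles on scale 3, where 134 buckets are needed, which lies between `maxSize/2` and `maxSize`.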

## Layout

### Mapping function

The `mapping` sub-package contains the equations specified in the [data
model for Exponential Histogram data
points](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/data-model.md#exponentialhistogram).

There are two mapping functions used, depending on the sign of the
scale. Negative and zero scales use the `mapping/exponent` mapping
function, which computes the bucket index directly from the bits of
@@ -25,3 +58,158 @@ function is used with `0 < scale <= 20`. The maximum scale is
selected because at scale 21, simply, it becomes difficult to test
correctness--at this point `math.MaxFloat64` maps to index
`math.MaxInt32` and the `math/big` logic used in testing breaks down.
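
For intuition about the exponent-based mapping used at non-positive scales, here is a minimal sketch built on `math.Frexp`. The function name `exponentIndex` is hypothetical, the input is assumed to be a positive normal number, and the bucket boundaries are assumed to be upper-inclusive as in the current data model; the `mapping/exponent` package works on the raw bits and may differ in detail.

```go
package main

import "math"

// exponentIndex (hypothetical name) maps a positive, normal float64 to
// its bucket index for scale <= 0, assuming bucket i covers
// (base^i, base^(i+1)] with base = 2^(2^-scale).
func exponentIndex(value float64, scale int32) int32 {
	// value == frac * 2^exp with frac in [0.5, 1).
	frac, exp := math.Frexp(value)
	index := int32(exp) - 1
	if frac == 0.5 {
		// An exact power of two sits on an upper boundary, so it
		// belongs to the bucket below.
		index--
	}
	// Each step down in scale merges adjacent pairs of buckets, which
	// is an arithmetic right shift of the scale-0 index.
	return index >> uint(-scale)
}
```

Under these assumptions, at scale 0 the value 3 falls in bucket 1, covering (2, 4], and at scale -1 the value 5 falls in bucket 1, covering (4, 16].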

### Data structure

The `structure` sub-package contains a Histogram aggregator for use by
the OpenTelemetry-Go Metrics SDK as well as OpenTelemetry Collector
receivers, processors, and exporters.

## Implementation

The implementation maintains a slice of buckets and grows the array in
size only as necessary given the actual range of values, up to the
maximum size. The structure of a single range of buckets is:

```golang
type buckets struct {
backing bucketsVarwidth[T] // for T = uint8 | uint16 | uint32 | uint64
indexBase int32
indexStart int32
indexEnd int32
}
```

The `backing` field is a generic slice of `[]uint8`, `[]uint16`,
`[]uint32`, or `[]uint64`.

The positive and negative backing arrays are independent, so the
maximum space used for `buckets` by one `Aggregator` is twice the
configured maximum size.

### Backing array

The backing array is circular. The first observation is counted in
the 0th index of the backing array and the initial bucket number is
stored in `indexBase`. After the initial observation, the backing
array grows in either direction (i.e., larger or smaller bucket
numbers), until rescaling is necessary. This mechanism allows the
histogram to maintain the ideal scale without shifting values inside
the array.

The `indexStart` and `indexEnd` fields store the current minimum and
maximum bucket number. The initial condition is `indexBase ==
indexStart == indexEnd`, representing a single bucket.

Following the first observation, new observations may fall into a
bucket up to `size-1` in either direction. Growth is possible by
adjusting either `indexEnd` or `indexStart` as long as the constraint
`indexEnd-indexStart < size` remains true.

Bucket numbers in the range `[indexBase, indexEnd]` are stored in the
interval `[0, indexEnd-indexBase]` of the backing array. Buckets in
the range `[indexStart, indexBase-1]` are stored in the interval
`[size+indexStart-indexBase, size-1]` of the backing array.

Considering the `aggregation.Buckets` interface, `Offset()` returns
`indexStart`, `Len()` returns `indexEnd-indexStart+1`, and `At()`
locates the correct bucket in the circular array.
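
The index arithmetic described above can be sketched as follows. The method name `position` and the concrete `[]uint64` backing slice are assumptions for illustration; the actual structure uses a variable-width backing array.

```go
package main

// buckets is a simplified stand-in for the structure shown earlier,
// with a fixed-width backing slice.
type buckets struct {
	backing    []uint64
	indexBase  int32
	indexStart int32
	indexEnd   int32
}

// position (hypothetical name) returns the backing-array position of a
// bucket number in [indexStart, indexEnd]. Bucket numbers at or above
// indexBase count up from position 0; those below indexBase wrap
// around to the end of the array.
func (b *buckets) position(bucket int32) int32 {
	pos := bucket - b.indexBase
	if pos < 0 {
		pos += int32(len(b.backing))
	}
	return pos
}
```

With `size` 8, `indexBase` 10, `indexStart` 7 and `indexEnd` 12, buckets 10 through 12 occupy positions 0 through 2 and buckets 7 through 9 occupy positions 5 through 7.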

### Determining change of scale

The algorithm used to determine the (best) change of scale when a new
value arrives is:

```golang
func newScale(minIndex, maxIndex, scale, maxSize int32) int32 {
return scale - changeScale(minIndex, maxIndex, scale, maxSize)
}

func changeScale(minIndex, maxIndex, scale, maxSize int32) int32 {
var change int32
for maxIndex - minIndex >= maxSize {
maxIndex >>= 1
minIndex >>= 1
change++
}
return change
}
```

The `changeScale` function is also used to determine how many bits to
shift during `Merge`.
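
To make the effect of the returned change concrete, the sketch below repeats the loop and adds a hypothetical helper, `shiftIndex`, that maps an old bucket index to its index at the reduced scale. The unused `scale` parameter is dropped here for brevity.

```go
package main

// changeScale mirrors the loop shown above: it halves the index range
// until it fits within maxSize and reports how many halvings were needed.
func changeScale(minIndex, maxIndex, maxSize int32) int32 {
	var change int32
	for maxIndex-minIndex >= maxSize {
		maxIndex >>= 1
		minIndex >>= 1
		change++
	}
	return change
}

// shiftIndex (hypothetical helper) returns the bucket index that an
// existing bucket maps to after the scale is reduced by change.
func shiftIndex(index, change int32) int32 {
	return index >> uint(change)
}
```

For example, with indexes spanning [-100, 300] and a maximum size of 160, two halvings are needed, after which the range is [-25, 75] and fits.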

### Downscale function

The downscale function rotates the circular backing array so that
`indexStart == indexBase`, using the "3 reversals" method, before
combining the buckets in place.
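
The "3 reversals" rotation mentioned above is a standard in-place technique. A minimal sketch on a plain `[]uint64` follows; the function names are illustrative, and `k` is assumed to be non-negative.

```go
package main

// reverse reverses s in place.
func reverse(s []uint64) {
	for i, j := 0, len(s)-1; i < j; i, j = i+1, j-1 {
		s[i], s[j] = s[j], s[i]
	}
}

// rotateLeft rotates s left by k positions in place, for k >= 0, using
// three reversals: the first k elements, the remainder, then the whole slice.
func rotateLeft(s []uint64, k int) {
	if len(s) == 0 {
		return
	}
	k %= len(s)
	reverse(s[:k])
	reverse(s[k:])
	reverse(s)
}
```

Rotating left by the backing-array position of `indexStart` moves the lowest bucket to position 0, after which adjacent buckets can be combined front to back.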

### Merge function

`Merge` first calculates the correct final scale by comparing the
combined positive and negative ranges. The destination aggregator is
then downscaled, if necessary, and the `UpdateByIncr` code path is
used to add the source buckets to the destination buckets.

### Scale function

The `Scale` function returns the current scale of the histogram.

If the scale is variable and there are no non-zero values in the
histogram, the scale is zero by definition; when there is only a
single non-zero value, the scale is the maximum scale (20) by
definition.

If the scale is fixed because of range limits, the fixed scale is
returned for a histogram of any size.

### Handling subnormal values

Subnormal values are those in the range [0x1p-1074, 0x1p-1022). These
numbers "gradually underflow" and use fewer than 52 bits of
precision in the significand at the smallest representable exponent
(i.e., -1022). Subnormal numbers present special challenges for both
the exponent- and logarithm-based mapping functions, and to avoid
additional complexity induced by corner cases, subnormal numbers are
rounded up to 0x1p-1022 in this implementation.

Handling subnormal numbers is difficult for the logarithm mapping
function because Golang's `math.Log()` function rounds subnormal
numbers up to 0x1p-1022. Handling subnormal numbers is difficult for
the exponent mapping function because Golang's `math.Frexp()`, the
natural API for extracting a value's base-2 exponent, also rounds
subnormal numbers up to 0x1p-1022.

While the additional complexity needed to correctly map subnormal
numbers is small in both cases, there are few real benefits in doing
so because of the inherent loss of precision. As secondary
motivation, clamping values to the range [0x1p-1022, math.MaxFloat64]
increases symmetry. This limit means that the minimum bucket index and
the maximum bucket index have similar magnitude, which helps support
greater maximum scale. Supporting numbers smaller than 0x1p-1022
would mean changing the valid scale interval to [-11,19] compared with
[-10,20].

### UpdateByIncr interface

The OpenTelemetry metrics SDK `Aggregator` type supports an `Update()`
interface which implies updating the histogram by a count of 1. This
implementation also supports `UpdateByIncr()`, which makes it possible
to support counting multiple observations in a single API call. This
extension is useful in applying `Histogram` aggregation to _sampled_
metric events (e.g. in the [OpenTelemetry statsd
receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/statsdreceiver)).

Another use for `UpdateByIncr` is in a Span-to-metrics pipeline
following [probability sampling in OpenTelemetry tracing](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md) (e.g., for computing an exponential
histogram of probability-sampled span duration).
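
As an illustration of the sampled-event case, the sketch below converts a sample rate into an increment. The type `sketchHistogram` is a hypothetical stand-in that tracks only count and sum, and the `UpdateByIncr(value, incr)` signature used here is an assumption; it is not the aggregator's actual API.

```go
package main

import "math"

// sketchHistogram is a hypothetical stand-in for the aggregator that
// tracks only the total count and sum.
type sketchHistogram struct {
	count uint64
	sum   float64
}

// UpdateByIncr counts value as if it had been observed incr times.
func (h *sketchHistogram) UpdateByIncr(value float64, incr uint64) {
	h.count += incr
	h.sum += value * float64(incr)
}

// Update counts a single observation.
func (h *sketchHistogram) Update(value float64) {
	h.UpdateByIncr(value, 1)
}

// recordSampled records one event that was kept with probability rate
// (0 < rate <= 1), so it stands in for roughly 1/rate original events.
func recordSampled(h *sketchHistogram, value, rate float64) {
	h.UpdateByIncr(value, uint64(math.Round(1/rate)))
}
```

With a sample rate of 0.1, a single recorded event contributes a count of 10.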

## Acknowledgements

This implementation is based on work by [Yuke
Zhuge](https://github.com/yzhuge) and [Otmar
Ertl](https://github.com/oertl). See
[NrSketch](https://github.com/newrelic-experimental/newrelic-sketch-java/blob/1ce245713603d61ba3a4510f6df930a5479cd3f6/src/main/java/com/newrelic/nrsketch/indexer/LogIndexer.java)
and
[DynaHist](https://github.com/dynatrace-oss/dynahist/blob/9a6003fd0f661a9ef9dfcced0b428a01e303805e/src/main/java/com/dynatrace/dynahist/layout/OpenTelemetryExponentialBucketsLayout.java)
repositories for more detail.
98 changes: 98 additions & 0 deletions sdk/metric/aggregator/exponential/structure/config.go
@@ -0,0 +1,98 @@
// Copyright The OpenTelemetry Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package structure // import "go.opentelemetry.io/otel/sdk/metric/aggregator/exponential/structure"

import "fmt"

// DefaultMaxSize is the default maximum number of buckets per
// positive or negative number range. The value 160 is specified by
// OpenTelemetry; it yields a maximum relative error of less than 5% for
// data with contrast 10**5 (e.g., latencies in the range 1ms to 100s).
// See the derivation here:
// https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#exponential-bucket-histogram-aggregation
const DefaultMaxSize int32 = 160

// MinSize is the smallest reasonable configuration, which is just
// large enough to contain the entire normal floating point range at
// MinScale.
const MinSize = 2

// MaximumMaxSize is an arbitrary limit meant to limit accidental use
// of giant histograms.
const MaximumMaxSize = 16384

// Config contains configuration for exponential histogram creation.
type Config struct {
maxSize int32
}

// Option is the interface that applies a configuration option.
type Option interface {
// apply sets the Option value of a config.
apply(Config) Config
}

// WithMaxSize sets the maximum size of each range (positive and/or
// negative) in the histogram.
func WithMaxSize(size int32) Option {
return maxSize(size)
}

// maxSize is an option to set the maximum histogram size.
type maxSize int32

// apply implements Option.
func (ms maxSize) apply(cfg Config) Config {
cfg.maxSize = int32(ms)
return cfg
}

// NewConfig returns an exponential histogram configuration with the
// given options applied. Call Validate to apply defaults and limits.
func NewConfig(opts ...Option) Config {
var cfg Config
for _, opt := range opts {
cfg = opt.apply(cfg)
}
return cfg
}

// Valid returns true for valid configurations.
func (c Config) Valid() bool {
_, err := c.Validate()
return err == nil
}

// Validate returns the nearest valid Config object to the input and an
// error that is non-nil when the input was not a valid
// configuration.
func (c Config) Validate() (Config, error) {
if c.maxSize >= MinSize && c.maxSize <= MaximumMaxSize {
return c, nil
}
if c.maxSize == 0 {
c.maxSize = DefaultMaxSize
return c, nil
}
err := fmt.Errorf("invalid histogram size: %d", c.maxSize)
if c.maxSize < 0 {
c.maxSize = DefaultMaxSize
} else if c.maxSize < MinSize {
c.maxSize = MinSize
} else if c.maxSize > MaximumMaxSize {
c.maxSize = MaximumMaxSize
}
return c, err
}
33 changes: 33 additions & 0 deletions sdk/metric/aggregator/exponential/structure/config_test.go
@@ -0,0 +1,33 @@
// Copyright The OpenTelemetry Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package structure // import "go.opentelemetry.io/otel/sdk/metric/aggregator/exponential/structure"

import (
"testing"

"github.com/stretchr/testify/require"
)

func TestConfigValid(t *testing.T) {
require.True(t, Config{}.Valid())
require.True(t, NewConfig().Valid())
require.True(t, NewConfig(WithMaxSize(MinSize)).Valid())
require.True(t, NewConfig(WithMaxSize(MaximumMaxSize)).Valid())
require.True(t, NewConfig(WithMaxSize((MinSize+MaximumMaxSize)/2)).Valid())

require.False(t, NewConfig(WithMaxSize(-1)).Valid())
require.False(t, NewConfig(WithMaxSize(1<<20)).Valid())
require.False(t, NewConfig(WithMaxSize(1)).Valid())
}