-
Notifications
You must be signed in to change notification settings - Fork 7.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
monitoring: migrate from OpenCensus to OpenTelemetry #45341
Conversation
Skipping CI for Draft Pull Request. |
FYI: I test with opentelemetry-go offline, it's will add |
Currently this isn't really a big difference, just a minor refactoring to use best practices and avoid global/init(). In future (istio#45341) it may be more important.
* agent: avoid global metrics initialization Currently this isn't really a big difference, just a minor refactoring to use best practices and avoid global/init(). In future (#45341) it may be more important. * fix tests
6ecac3b
to
31fc60b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Thanks @howardjohn
@@ -75,6 +75,7 @@ func TestGetFederatedToken(t *testing.T) { | |||
tag := map[string]string{ | |||
"request_type": monitoring.TokenExchange, | |||
} | |||
// TODO: this is broken due to prometheus auto adding "_total" | |||
mt.Assert("num_outgoing_retries", tag, monitortest.AtLeast(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this worth a release notes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stale comment - its not broken anymore. Let me clean this up
For istio#44743 Fixes istio#44513 This PR migrates from OpenCensus to OpenTelemetry for Go metrics. This doesn't impact proxy metrics, or non-metrics observability. These changes are intended to be 100% transparent to end users. The changes are *mostly* transparent from `pkg/monitoring` public interface, with a few changes: * `Register` is no longer needed. Creating the metric registers it. This makes no real difference in any of our usage * MustCreateLabel -> CreateLabel; it is infallable * No need to define the set of keys upfront with the metric. * RegisterIf -> WithEnabled * monitortest behaves slightly different; only metrics recorded after its creation are counted. A few tests are updated to handle this change. * Only one exporter can be defined; this only impacts tests, which are updated to handle this. There are a number of changes, of course, to the internals. Most of this is a fairly mechanical translation. A few quirks: * There is no real Gauge like in OC. Instead, we can only do a callback. So we need to handle a bit of this on our end; not a huge deal, and looks like it will be added eventually. * Setting histogram buckets is a bit awkward currently. This will be addressed eventually upstream.
Looks very good. One concern - not blocking this PR - is that I would prefer if labels are declared at metric creation, even if otel doesn't require it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am approving the PR - not entirely happy, but it is clearly a huge improvement (opencensus is long deprecated and unsupported).
I think we need to follow up - and remove the wrappers, if we use otel let's use the otel API without an istio abstraction that just confuses people. And add back some info about the labels for each metris - in the description or somewhere, we also probably need to generate docs for each metric.
All of this can be done in separate PRs and incrementally.
I would also add some doc - and make it clear if indeed the intent is to start using otel API and if it is ok for new metrics ( and gradually for the old ones ) to replace the use of the wrapper with the proper API.
@@ -26,16 +26,10 @@ var ( | |||
cniInstalls = monitoring.NewSum( | |||
"istio_cni_installs_total", | |||
"Total number of CNI plugins installed by the Istio CNI installer", | |||
monitoring.WithLabels(resultLabel), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not remove this part ( maybe refactor it to avoid the MustCreateLabel - but having a list of labels is very informative, and will still be required if we want monitoring to have a prometheus implementation.
If we want OTel only ( which I don't mind ) - we don't need an abstraction/wrapper on top of the SDK. But even then it would be good to keep track of what labels are generated, it is probably the most critical aspect of a metric.
|
||
// ReconcileRequestReasonLabel describes reason of reconcile request. | ||
ReconcileRequestReasonLabel = monitoring.MustCreateLabel("reason") | ||
ReconcileRequestReasonLabel = monitoring.CreateLabel("reason") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are they used - I see WithLabels is removed.
|
||
ReconcileRequestTotal, | ||
) | ||
|
||
initOperatorCrdResourceMetrics() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it do - isn't it another must register ?
} | ||
|
||
func initializeMonitoring() (*prometheus.Registry, error) { | ||
func initializeMonitoring() (prometheus.Gatherer, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're doing this - shouldn't agent also use otel sdk ? Not in this PR, of course - but a TODO and comment explaining why prometheus is used directly would help other developers.
@@ -19,20 +19,15 @@ import ( | |||
) | |||
|
|||
var ( | |||
typeTag = monitoring.MustCreateLabel("type") | |||
eventTag = monitoring.MustCreateLabel("event") | |||
typeTag = monitoring.CreateLabel("type") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not entirely sure why this was renamed - either remove or leave it as is, we can live with 'Must' for a while...
Blocked by open-telemetry/opentelemetry-go#4306. I have ahold
until that is ready; should be fairly quickFor #44743
Fixes #44513
This PR migrates from OpenCensus to OpenTelemetry for Go metrics. This
doesn't impact proxy metrics, or non-metrics observability.
These changes are intended to be 100% transparent to end users.
The changes are mostly transparent from
pkg/monitoring
publicinterface, with a few changes:
Register
is no longer needed. Creating the metric registers it. Thismakes no real difference in any of our usage
its creation are counted. A few tests are updated to handle this
change.
updated to handle this.
There are a number of changes, of course, to the internals. Most of this
is a fairly mechanical translation. A few quirks:
So we need to handle a bit of this on our end; not a huge deal, and
looks like it will be added eventually.
addressed eventually upstream.
This results in a 1% increase in the binary size, which is due to importing both OC and Otel. This is addressed in #46010, which introduces a 1% decrease in binary size when combined with this PR.Recording metrics is now much cheaper. This may allow us to start enabling metrics that were previously disabled (TBD, I am planning a more thorough design of our overall observability, would prefer to avoid changes until then):