Support UpDownCounter #2362

Closed
cijothomas opened this issue Sep 17, 2021 · 21 comments

@cijothomas
Member

cijothomas commented Sep 17, 2021

UpDownCounter and its async version are two instruments which are part of the spec, but are not currently available in the .NET API. This issue is to keep track of supporting them. As the Metrics API is part of the .NET runtime itself, the earliest possible window (if at all) is .NET 7, coming at the end of 2022.

Why .NET did not add UpDownCounter?

UpDownCounter is typically used for tracking something like "queue_size", where the user calls Add() and Remove() on the instrument. In .NET, there are existing tools such as dotnet-counters and Visual Studio that can attach to a running process and collect metrics from it. For an instrument like UpDownCounter, unless these tools are attached at startup, they have no way to know the current "queue_size"; they can only see the Adds/Removes that happened since the tool was started. This means the original purpose of the instrument would not be met when using these tools (which are already part of the .NET ecosystem).
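
The late-attach problem can be sketched with a MeterListener standing in for a tool such as dotnet-counters. This is only an illustration: it assumes the UpDownCounter<T> type that later shipped in System.Diagnostics.DiagnosticSource 7.0, and the meter and instrument names are made up for the example.

```csharp
using System;
using System.Diagnostics.Metrics;

var meter = new Meter("Demo.Queue");                        // hypothetical meter name
var queueSize = meter.CreateUpDownCounter<long>("queue_size");

// Three items are enqueued before anything is listening; these deltas are not observed.
queueSize.Add(3);

long observedSum = 0;
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    // Subscribe only to instruments from the demo meter.
    if (instrument.Meter.Name == "Demo.Queue")
        l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>(
    (instrument, measurement, tags, state) => observedSum += measurement);
listener.Start();   // the "tool" attaches here, after the first Add

queueSize.Add(-1);  // one item is dequeued after attaching

// The real queue size is 2, but the listener has only seen the -1 delta.
Console.WriteLine(observedSum);
```

A listener that starts before the first Add would see every delta and could reconstruct the size, which is why attach time matters for this instrument.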

Due to this, .NET did not include UpDownCounter in the 1st release (.NET 6). Based on user feedback, this might be added in a future version.

Further reading: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/supplementary-guidelines.md#instrument-selection

@cijothomas cijothomas added enhancement New feature or request metrics need-runtime-change Issues which likely require changes from dotnet runtime - typically DiagnosticSource changes labels Sep 17, 2021
@shaynevanasperen

This seems to ignore the fact that some customers use metrics for things other than "queue_size" or process-related things in general. For example, in my stock trading application I'm currently using ApplicationInsights to track profit metrics for each trade that my bot performs. Sometimes my profit is negative, so I track a negative value using TelemetryClient.GetMetric(metricId).TrackValue(profit, dimension1, dimension2, dimension3);.

I'm looking for a way to migrate away from using ApplicationInsights, and my research led me to conclude that the new Metrics API in .NET is the way forwards. Unfortunately, without support for tracking negative values, I'm going to be stuck with ApplicationInsights for far longer than I had hoped.

@cijothomas Can you recommend other alternatives for me? I'm not happy with ApplicationInsights as it appears to be vendor-specific and I'm not getting the functionality I need from it (and the charting in Azure is terrible and keeps breaking). I'd like to move to Grafana Cloud, and not have my metrics stored in Azure at all, but I don't know which way to go now.

@hdost

hdost commented Sep 29, 2021

@shaynevanasperen Keep in mind that the metrics API is still not considered stable, but more than likely your equivalent would be something more like the Asynchronous Gauge. It's an instrument which allows for tracking of a value.

@noahfalk

@shaynevanasperen - Agreed with @hdost. The API names OpenTelemetry settled on differed from the original concept names used during design, so the API to check out is ObservableGauge.

The way that one works is you supply a delegate and OpenTelemetry will invoke your delegate at each reporting interval to get the value you want to report. You can compute your value however you wish in the delegate and it can be positive or negative.
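
A minimal sketch of that delegate-based pattern, using System.Diagnostics.Metrics. The meter name, instrument name, and the idea of keeping a running profit in cents are assumptions made for illustration; note that a gauge reports the current value at each collection, not every individual trade.

```csharp
using System.Diagnostics.Metrics;
using System.Threading;

// Hypothetical meter and instrument names, chosen only for this sketch.
var meter = new Meter("TradingBot.Metrics");

// Running profit in cents, updated by the trading code; it may go negative.
long profitCents = 0;

// The delegate is invoked each time a listener (for example the OpenTelemetry SDK)
// collects the instrument; it returns whatever the current value is.
meter.CreateObservableGauge<double>(
    "trade.profit",
    () => Interlocked.Read(ref profitCents) / 100.0);

// Elsewhere in the application: record a loss, then a gain.
Interlocked.Add(ref profitCents, -1250);
Interlocked.Add(ref profitCents, 400);
```

The value is only read when a collection happens, so updates between two reporting intervals are collapsed into the latest value.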

@pcwiese
Contributor

pcwiese commented Nov 3, 2021

I am also struggling with the lack of an UpDownCounter. In my case, I have a metric that gets adjusted in multiple places up and down with various dimensions. Unless I choose to create my own structure for maintaining this value and then reference that from the observable gauge callback, I can't think of any way to produce what should be a simple gauge with dimensions. Feels very unnatural for this scenario.
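
For reference, the workaround described here (keeping your own structure and reading it from the observable gauge callback) might look roughly like the sketch below. The class name, the single "region" dimension, and the instrument name are hypothetical; only Meter, CreateObservableGauge, and Measurement<T> come from System.Diagnostics.Metrics.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics.Metrics;
using System.Linq;

// Hypothetical helper: keeps one running value per dimension value and exposes
// all of them through a single ObservableGauge.
public sealed class TaggedGaugeState
{
    private readonly ConcurrentDictionary<string, long> _values = new();

    public TaggedGaugeState(Meter meter, string name)
    {
        // The callback returns one Measurement per tracked dimension value.
        meter.CreateObservableGauge<long>(name, Observe);
    }

    // Called from anywhere in the application to move the value up or down.
    public void Add(string region, long delta) =>
        _values.AddOrUpdate(region, delta, (_, current) => current + delta);

    private IEnumerable<Measurement<long>> Observe() =>
        _values.Select(kv => new Measurement<long>(
            kv.Value, new KeyValuePair<string, object?>("region", kv.Key))).ToArray();
}
```

Usage would be along the lines of `var state = new TaggedGaugeState(new Meter("MyApp"), "items.in.flight"); state.Add("eu", 3); state.Add("eu", -1);`. It works, but it is exactly the hand-rolled bookkeeping that a synchronous UpDownCounter would make unnecessary.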

@ktmitton
Contributor

@cijothomas I'm trying to do some research in prep for a larger discussion on a separate issue, and was wondering, would a PollingCounter actually be an implementation of UpDownCounter but with a different name?

Here's some documentation that clued me in to this: https://docs.microsoft.com/en-us/dotnet/core/diagnostics/event-counters#net-core-runtime-example-counters

@cijothomas
Member Author

PollingCounter is part of the EventCounter API from .NET. The new Metrics API (the one which is based on OpenTelemetry Metric API spec) is not related to that.
See comparison: https://docs.microsoft.com/en-us/dotnet/core/diagnostics/compare-metric-apis

@ejsmith
Contributor

ejsmith commented Mar 16, 2022

I'm taking a look at UpDownCounter and wondering if it could be used to show the current queue size in a distributed system. For instance, if I have many processes adding items to a queue, all calling Add(1) whenever an item is enqueued, and I also have many processes that are each dequeuing 1 entry at a time and calling Add(-1), would all of these measurements across processes be able to be aggregated together to give me the current queue size?

@cijothomas
Member Author

> all of these measurements across processes be able to be aggregated together to give me the current queue size?

Yes, to my knowledge. This should be supported in any metrics backend, like Prometheus.

@ejsmith
Contributor

ejsmith commented Mar 16, 2022

Ok, interesting. Is there an ability to set a point in time absolute value when it gets out of sync?

@cijothomas
Member Author

> Ok, interesting. Is there an ability to set a point in time absolute value when it gets out of sync?

I didn't quite understand the question. Given that we don't yet have this in .NET, you might benefit from asking in the Otel-Metrics Slack channel (https://cloud-native.slack.com/archives/C01NP3BV26R), where other language maintainers can give more concrete answers, as they already have UpDownCounter.

@cijothomas
Member Author

Update: DiagnosticSource version 7.0, which ships with .NET 7, should contain API support for UpDownCounter. See dotnet/runtime#63648

@ejsmith
Contributor

ejsmith commented Mar 16, 2022

@cijothomas yeah, I saw that and it's what prompted me to ask questions here. :-) I tried accessing that slack account and wasn't able to. Is there some other place to ask?

@cijothomas
Member Author

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/supplementary-guidelines.md#instrument-selection
You can open an issue in the spec repo.

Regarding Slack: I think you need to join the CNCF Slack organization to access the channels.

@alanwest
Member

Uncertain whether this would satisfy your scenario, but the current queue size in this case may be better tracked with an Async UpDownCounter (also not yet available in .NET). You'd need some process that periodically observes the queue size and sends it up.

Though, I suppose you'd lose the ability to add any context to the metrics from individual enqueue/dequeue operations.
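
As a rough sketch of that asynchronous approach, assuming the ObservableUpDownCounter support that later shipped in System.Diagnostics.DiagnosticSource 7.0; the in-memory ConcurrentQueue is only a stand-in for whatever real queue is being observed, and the names are illustrative.

```csharp
using System.Collections.Concurrent;
using System.Diagnostics.Metrics;

var meter = new Meter("Demo.Queue");                  // hypothetical meter name
var queue = new ConcurrentQueue<string>();            // stand-in for the real queue

// Reports the absolute size at each collection instead of individual deltas.
meter.CreateObservableUpDownCounter<int>("queue.size", () => queue.Count);

queue.Enqueue("job-1");
queue.Enqueue("job-2");
queue.TryDequeue(out _);
```

Because the callback reads the size directly, there is no running sum to drift out of sync, but, as noted above, nothing from the individual enqueue/dequeue calls can be attached to the measurement.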

@ejsmith
Contributor

ejsmith commented Mar 17, 2022

This is a super common scenario. I'd think OTel would have to be able to handle it, but I'm not seeing how by looking at the spec.

@Aaronontheweb
Contributor

Is it possible to have a non-Observable Gauge? Or would an UpDownCounter get the job done?

I have millions of actor mailboxes per process whose depth I would like to report upon when those actors are scheduled (only a few thousand at any given second) - using the observable gauge means millions of delegate allocations and presumably, a large number of background tasks.

@ejsmith
Contributor

ejsmith commented Mar 24, 2022

@Aaronontheweb I was thinking that same thing. It does seem like a non-observable gauge would be really useful in these scenarios, but surely the Otel people have thought about these common use cases.

@cijothomas
Member Author

> @Aaronontheweb I was thinking that same thing. It does seem like a non-observable gauge would be really useful in these scenarios, but surely the Otel people have thought about these common use cases.

There were discussions about a sync version of Gauge in some spec meetings, but it didn't make it into the final API (the API is stable, but can take new instruments). I'd suggest having this conversation in the specification repo, so as to get the right feedback.

@Aaronontheweb
Contributor

@cijothomas done - replied here open-telemetry/opentelemetry-specification#2318

@cijothomas
Member Author

Update:

https://www.nuget.org/packages/System.Diagnostics.DiagnosticSource/7.0.0-preview.3.22175.4 has added UpDownCounter. We will incorporate that into the OpenTelemetry SDK/Exporters after the 1.2 release is completed.
(We won't be able to release a stable version with UpDownCounter until Nov 2022, as that is when DS 7.0 stable is expected.)
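
For anyone landing here later, a minimal sketch of the synchronous instrument as exposed by System.Diagnostics.Metrics in DiagnosticSource 7.0, applied to the enqueue/dequeue scenario discussed earlier in the thread. The meter name, instrument name, and tag are illustrative assumptions.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

var meter = new Meter("Demo.Queue");                  // hypothetical meter name
var queueSize = meter.CreateUpDownCounter<long>("queue.size");

// The tag is illustrative; each call records a delta, not an absolute value.
var tag = new KeyValuePair<string, object?>("queue.name", "orders");

queueSize.Add(1, tag);    // an item was enqueued
queueSize.Add(1, tag);    // another item was enqueued
queueSize.Add(-1, tag);   // an item was dequeued
```

Nothing is exported until a listener such as the OpenTelemetry SDK subscribes to the meter; the listener sums the deltas it has seen.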

@cijothomas cijothomas self-assigned this Aug 23, 2022
@cijothomas cijothomas added this to the 1.4.0 milestone Aug 23, 2022
@cijothomas
Member Author

Fixed via #3606
