
[Discussion] Support of Timers and Meters? #635

Open
AttilaW-Keytiles opened this issue Feb 19, 2023 · 0 comments

AttilaW-Keytiles commented Feb 19, 2023

Hey guys, I'm curious about your opinion on this topic, so I decided to open a ticket, but for now just for discussion.
Sorry, this will be a bit long...

I'm a Software Architect with a mainly Java background. For many years now I have typically used the Prometheus + Grafana combo for application observability.
There are mature metrics libraries in Java and other languages (like Go and Python) that provide Timer and Meter implementations developers can use very easily (the interface is intuitive).
Just for reference: a Timer measures "how long something takes", while a Meter measures "event occurrences per time unit", like requests/sec. Their signatures are very intuitive, so they are easy for developers to build in and use.
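The following minimal C++ sketch shows what such an interface could look like. It makes the "intuitive signature" point concrete. The `Meter` and `Timer` classes and their `Mark()`, `Time()` and `MeanRate()` members are hypothetical names, loosely modeled on the Java-style API described above. They are not part of this library or of any existing C++ library. The sketch assumes C++17 for the non-copyable RAII context.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical Meter: counts events and reports the mean rate (events/sec)
// since construction. Real libraries also track 1m/5m/15m moving averages.
class Meter {
 public:
  using Clock = std::chrono::steady_clock;

  Meter() : start_(Clock::now()) {}

  void Mark(std::uint64_t n = 1) { count_.fetch_add(n, std::memory_order_relaxed); }
  std::uint64_t Count() const { return count_.load(std::memory_order_relaxed); }

  double MeanRate() const {
    const std::chrono::duration<double> elapsed = Clock::now() - start_;
    return elapsed.count() > 0.0 ? static_cast<double>(Count()) / elapsed.count() : 0.0;
  }

 private:
  Clock::time_point start_;
  std::atomic<std::uint64_t> count_{0};
};

// Hypothetical Timer: Time() returns an RAII context that records the elapsed
// duration when it goes out of scope (relies on C++17 guaranteed copy elision).
class Timer {
 public:
  using Clock = std::chrono::steady_clock;

  class Context {
   public:
    explicit Context(Timer& timer) : timer_(timer), start_(Clock::now()) {}
    ~Context() { timer_.Record(Clock::now() - start_); }
    Context(const Context&) = delete;
    Context& operator=(const Context&) = delete;

   private:
    Timer& timer_;
    Clock::time_point start_;
  };

  Context Time() { return Context(*this); }

  void Record(Clock::duration d) {
    assert(d >= Clock::duration::zero());  // steady_clock never goes backwards
    std::lock_guard<std::mutex> lock(mu_);
    samples_.push_back(d);  // unbounded here; real libraries use a reservoir
  }

  std::size_t Count() const {
    std::lock_guard<std::mutex> lock(mu_);
    return samples_.size();
  }

 private:
  mutable std::mutex mu_;
  std::vector<Clock::duration> samples_;
};
```

A call site then needs only `auto ctx = request_timer.Time();` at the top of a handler and `requests.Mark();` per request, with no bucket configuration at all.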
You can find a better, more detailed description in the docs of this popular Java library:

Based on my research, these "client-side calculated" metric types seem to be well supported in all languages (Java, Go, Python, Node.js, etc.) except C++...

Unfortunately they are not supported by this library, and I miss them very much right now. When a company has hundreds of developers coming and going, it matters a lot how complicated it is for them to create and maintain unified observability (i.e. metrics that can be scraped and dashboarded with Prometheus + Grafana) in all the services they maintain. So the problem is big!

My main question here is: shouldn't they be supported? Was this ever considered? Was it excluded intentionally? Or was it simply never in focus, so nobody implemented it...

But what is this problem exactly?

The metric types mentioned above are not part of the core Prometheus metrics concept. I assume the reason is that they can be covered by Prometheus's existing minimalistic metric types plus functions, e.g.:

Timer:
You can use a Histogram whose buckets represent the "how long" part. (This post summarizes it nicely: https://povilasv.me/prometheus-tracking-request-duration/)
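For comparison, the histogram approach stores roughly the following data. There is one cumulative counter per predefined upper bound (`le`), an implicit `+Inf` bucket, and the `_sum` and `_count` series. `CumulativeHistogram` is a hypothetical, self-contained illustration of that data model, not the API of this library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch of the data a Prometheus histogram keeps: one counter per
// predefined upper bound ("le"), an implicit +Inf bucket, plus _sum and _count.
class CumulativeHistogram {
 public:
  explicit CumulativeHistogram(std::vector<double> upper_bounds)
      : bounds_(std::move(upper_bounds)), counts_(bounds_.size() + 1, 0) {
    for (std::size_t i = 1; i < bounds_.size(); ++i) {
      assert(bounds_[i - 1] < bounds_[i]);  // bounds must be strictly increasing
    }
  }

  void Observe(double value) {
    // First bucket whose upper bound is >= value; index bounds_.size() is +Inf.
    std::size_t i = 0;
    while (i < bounds_.size() && value > bounds_[i]) ++i;
    ++counts_[i];
    sum_ += value;
    ++count_;
  }

  // Value exposed as <name>_bucket{le="..."} for bucket i (cumulative).
  std::uint64_t CumulativeCount(std::size_t i) const {
    std::uint64_t total = 0;
    for (std::size_t j = 0; j <= i && j < counts_.size(); ++j) total += counts_[j];
    return total;
  }

  std::uint64_t Count() const { return count_; }
  double Sum() const { return sum_; }

  // Time series per label combination: all buckets (incl. +Inf) + _sum + _count.
  std::size_t NumSeries() const { return counts_.size() + 2; }

 private:
  std::vector<double> bounds_;         // declared first: counts_ depends on it
  std::vector<std::uint64_t> counts_;  // non-cumulative per-bucket counts
  double sum_ = 0.0;
  std::uint64_t count_ = 0;
};
```

With just the three bounds `{0.1, 0.5, 1.0}`, every labeled instance of this metric already produces six time series. The bounds also have to be fixed before any data has been seen.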

Meter:
You can just create a Counter that counts the event occurrences, then apply the rate() function to it.
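Roughly speaking, rate() turns a monotonically increasing counter into a per-second average over the selected window. The hypothetical `SimpleRate()` below is a simplified sketch of that idea. It sums the increases between scraped samples, treating any decrease as a counter reset, and divides by the time span. The real PromQL implementation also extrapolates to the window boundaries, which this sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Sample {
  double timestamp_s;  // scrape time in seconds
  double value;        // counter value at that scrape
};

// Hypothetical, simplified version of what PromQL rate() computes over a range
// of counter samples: total increase per second. Counter resets (a value lower
// than the previous one) are handled; extrapolation to the window edges is not.
double SimpleRate(const std::vector<Sample>& samples) {
  if (samples.size() < 2) return 0.0;
  double increase = 0.0;
  for (std::size_t i = 1; i < samples.size(); ++i) {
    const double delta = samples[i].value - samples[i - 1].value;
    // After a reset the counter restarted from 0, so the new value is the increase.
    increase += (delta >= 0.0) ? delta : samples[i].value;
  }
  const double span_s = samples.back().timestamp_s - samples.front().timestamp_s;
  assert(span_s > 0.0);
  return increase / span_s;
}
```

With samples of 100 at t=0 s and 700 at t=60 s, it returns 10 events/sec. It would return exactly the same value if all 600 events had arrived within a single second of that minute.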

And you are done. Right?
Well, not quite...

Both approaches have serious drawbacks. Just to mention a few:

  • For Meters (i.e. rate(<some counter>)): if the Prometheus scrape interval is once per minute or even longer, Prometheus clearly has no idea what happened during that minute, or how. Increasing the scrape frequency helps, but it has other drawbacks: more disk space, more network traffic, etc.
  • For Timers the problem is even bigger. It is very difficult to predefine good "how long" intervals (i.e. bucket boundaries) that fit whatever processing time you want to monitor, so choosing the buckets appropriately becomes the developers' responsibility. At the same time, the number of buckets should be kept minimal, because each bucket is a separate counter. If you make them too granular (and now imagine hundreds of developers each doing it differently), the storage and performance impact on the Prometheus side is huge.
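The sketch below shows why the bucket choice matters so much. It estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank, which is the principle PromQL's histogram_quantile() uses. `EstimateQuantile()` is a hypothetical, simplified helper: it has no special `+Inf` handling and assumes the lowest bucket starts at 0. The traffic numbers in the note after the code are a constructed example, not measurements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: estimate the q-quantile from cumulative bucket counts by
// linear interpolation inside the bucket that contains the target rank (the
// principle behind PromQL histogram_quantile()). Simplified: there is no +Inf
// bucket handling and the lowest bucket is assumed to start at 0.
double EstimateQuantile(double q, const std::vector<double>& upper_bounds,
                        const std::vector<double>& cumulative_counts) {
  assert(q >= 0.0 && q <= 1.0);
  assert(!upper_bounds.empty() && upper_bounds.size() == cumulative_counts.size());
  const double rank = q * cumulative_counts.back();
  // Find the first bucket whose cumulative count reaches the target rank.
  std::size_t i = 0;
  while (i + 1 < cumulative_counts.size() && cumulative_counts[i] < rank) ++i;
  const double lower_bound = (i == 0) ? 0.0 : upper_bounds[i - 1];
  const double count_below = (i == 0) ? 0.0 : cumulative_counts[i - 1];
  const double count_in_bucket = cumulative_counts[i] - count_below;
  if (count_in_bucket <= 0.0) return upper_bounds[i];
  // Assume observations are spread uniformly within the bucket.
  return lower_bound +
         (upper_bounds[i] - lower_bound) * (rank - count_below) / count_in_bucket;
}
```

Assume 100 requests: 98 take about 10 ms and 2 take about 900 ms.
- With buckets `{1s, 5s}` (cumulative counts `{100, 100}`), the estimated median is 500 ms.
- With buckets `{25ms, 100ms, 500ms, 1s}` (cumulative counts `{98, 98, 98, 100}`), the estimated median drops to about 12.8 ms. The estimated p99 is 750 ms, against a true value of about 900 ms.

Every extra bucket is another time series on the Prometheus side.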

For these reasons it is no coincidence that metrics libraries support the Timer and Meter types mentioned above:

  • they compute statistics (distribution, etc.) on the client side, often using so-called "reservoirs" in the background (like TimeSlidingWindowReservoir, 1m, 5m, etc.)
  • for Timers, developers do not need to define distribution buckets up front
  • Prometheus just receives the numerical values calculated on the client side, which drastically reduces storage and performance overhead
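The client-side approach can be sketched as a sliding time window reservoir. The application keeps only the samples recorded within, say, the last minute and computes percentiles from them itself, so only the resulting numbers need to be exposed. `TimeWindowReservoir` is a hypothetical C++ illustration of this general technique, not the API of cppmetrics or of the Java library. It uses nearest-rank percentiles, is not thread-safe, and assumes timestamps are non-decreasing.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical client-side reservoir: keeps only samples recorded within the
// last `window` and computes percentiles from them on demand.
// Not thread-safe; timestamps passed in are assumed to be non-decreasing.
class TimeWindowReservoir {
 public:
  using Clock = std::chrono::steady_clock;

  explicit TimeWindowReservoir(Clock::duration window) : window_(window) {}

  void Update(double value, Clock::time_point now = Clock::now()) {
    samples_.emplace_back(now, value);
    Trim(now);
  }

  // Nearest-rank percentile for q in [0, 1] over the samples still in the window.
  double Percentile(double q, Clock::time_point now = Clock::now()) {
    assert(q >= 0.0 && q <= 1.0);
    Trim(now);
    if (samples_.empty()) return 0.0;
    std::vector<double> values;
    values.reserve(samples_.size());
    for (const auto& sample : samples_) values.push_back(sample.second);
    std::sort(values.begin(), values.end());
    // Rank is ceil(q * n), clamped to [1, n]; convert to a 0-based index.
    auto rank = static_cast<std::size_t>(std::ceil(q * static_cast<double>(values.size())));
    if (rank < 1) rank = 1;
    if (rank > values.size()) rank = values.size();
    return values[rank - 1];
  }

  std::size_t Size() const { return samples_.size(); }

 private:
  // Drop samples that are older than the window relative to `now`.
  void Trim(Clock::time_point now) {
    while (!samples_.empty() && now - samples_.front().first > window_) {
      samples_.pop_front();
    }
  }

  Clock::duration window_;
  std::deque<std::pair<Clock::time_point, double>> samples_;
};
```

This design trades memory (every sample in the window is kept) for not having to choose bucket boundaries up front.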

So the benefits of implementing these types are obvious.

I also have plenty of painful practical experience with observability in setups where these types are not supported and we have to use Histograms for everything.

I found a C++ port of the Java metrics library mentioned above:
https://github.com/ultradns/cppmetrics

but as far as I can see it is an abandoned project. Maybe it is still worth checking.

Conclusion

This is clearly missing. For now, I'm mainly interested in the reason why.

I would be happy to contribute resources here (including developers from the companies I work for) to get this done... unless this topic was intentionally excluded from this library!
