Go metrics are tightly coupled with the Go version used #994
Comments
@prymitive Thanks for opening the issue. The raw metric values of a histogram weren't designed to be compared directly; the expectation is to use PromQL queries to handle the calculations. The buckets are dynamically generated, and this is expected. I don't see any action items here. Do you have any suggestions?
If buckets are dynamically generated, then is it possible to get a different set of buckets on each scrape?
I think this metric comes from runtime/metrics, and the Float64Histogram type's documentation states:
Which I guess implies the buckets can change from instance to instance, and that it's therefore not actually appropriate to model as a Prometheus histogram, which expects bucket definitions to be more or less static?
I think this is indeed problematic. Let's find a solution to this; it would be ideal if we had static buckets incorporated. This can impact cardinality a lot here.
cc @mknyszek for awareness.
The buckets changed because I discovered they were very incorrect. I don't expect this to happen often (or again, even). See golang/go#50732.
But, yes, the API is specified such that the buckets can change. If that doesn't work for Prometheus, then the bucket-choosing algorithm needs to pick its own buckets on the Prometheus side and aggregate the runtime/metrics buckets into them, with some error. This is somewhat surprising to me because these histograms (at least on the runtime/metrics side) are intended to be used to look at and compare distributions. You can do that even if the buckets are different.
How can histograms with disparate bucketing be meaningfully aggregated and compared?
There are many ways to do this based on how you want things visualized. Fundamentally, histograms approximate some underlying distribution; a histogram is just a reasonably efficient way to represent a distribution. So you can extract and look at the distribution instead of the buckets directly. For example, you could take two histograms with different bucketing, compute a CDF from both, and plot them on the same chart. If you are willing to make some distributional assumptions, such as a normal or geometric distribution, you can also summarize the histogram with percentiles.
In this sense, histograms are somewhat low-level. The Go runtime exposes a time histogram with a pretty fine granularity (which is currently squashed in the Prometheus code to reduce metric cardinality) as a stepping-stone toward producing more robust analyses. There have been enough issues about this that perhaps this is just the wrong Prometheus data structure to represent the Go runtime histograms. We could use a summary instead, or some other collection of metrics.
Yeah, I think we already merge buckets to have fewer of them, so the easiest thing would be to merge into stable buckets (A). Moving to summaries is not bad either; they just have some instrumentation overhead, and they are harder to use later on since they are not aggregatable (B). We could do A and hope for better histograms in Prometheus (a feature that should land soon).
Summaries are not aggregatable, indeed, but that might not be so bad in the case of GC (where I assume you are mostly interested in individual processes, not necessarily many in aggregate). But summaries also have static, precalculated quantiles, which you cannot change in retrospect. (E.g. they are always the 99th percentile, 90th percentile, and median over the last 10m. If you later decide you would like to see the 75th percentile over the last 5m, you are lost.) In summary (no pun intended), I find histograms quite interesting here, but they are also very expensive and inaccurate. The new histograms will help with that. Maybe you can make all of this configurable for now (I guess making these expensive histograms optional was considered anyway). Or just decide what the least bad solution is.
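One concrete advantage of keeping histograms, per the point above: any quantile can be estimated after the fact from the buckets, whereas a Summary's quantiles are fixed when the code is instrumented. A stdlib-only sketch with invented buckets, using linear interpolation inside the crossing bin (the same idea PromQL's histogram_quantile uses):

```go
package main

import "fmt"

// quantile estimates the phi-quantile from per-bin counts over bounds
// (len(counts)+1 boundaries) by linear interpolation within the bucket
// that crosses the target rank. Unlike a Summary, phi is chosen at query
// time, not at instrumentation time.
func quantile(phi float64, bounds []float64, counts []uint64) float64 {
	var total uint64
	for _, c := range counts {
		total += c
	}
	target := phi * float64(total)
	var cum float64
	for i, c := range counts {
		if c == 0 {
			continue // skip empty bins
		}
		if cum+float64(c) >= target {
			lo, hi := bounds[i], bounds[i+1]
			return lo + (hi-lo)*(target-cum)/float64(c)
		}
		cum += float64(c)
	}
	return bounds[len(bounds)-1]
}

func main() {
	bounds := []float64{0, 1, 2, 4}
	counts := []uint64{2, 6, 2}
	// The 75th percentile was never precomputed, yet we can estimate it now.
	fmt.Printf("%.3f\n", quantile(0.75, bounds, counts)) // 1.917
}
```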
Hello 👋 Looks like there was no activity on this issue for the last 3 months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this issue or push a commit. Thanks! 🤗
I've upgraded from Go 1.17.7 to 1.17.8 and I've noticed that the exported metric labels changed as a result.
The le label on some histograms now uses different values than it used to:
It's not a huge problem, but it makes it harder to compare metrics between two instances of the same service compiled with different Go versions. This is likely related to #967