Why default to summary rather than histogram? #460

Open
sp1rs opened this issue Aug 25, 2022 · 6 comments

Comments

@sp1rs

sp1rs commented Aug 25, 2022

What is the reason behind converting the metric to a summary rather than a histogram by default?

@SuperQ
Member

SuperQ commented Aug 25, 2022

Probably historical. A number of Prometheus things from the very early days defaulted to Summary.

@glightfoot
Contributor

Hey, that's a good question. Histograms in Prometheus have a few main disadvantages that prevent them from being a useful default for statsd. The first and biggest is that histograms require some knowledge of what's being measured and of the expected distribution in order to set decent bucket boundaries. Imagine you have one set of timings clustered around a few milliseconds and another clustered around a few seconds. With a generic histogram using the default buckets, neither set would produce accurate percentiles. However, if you know these distributions, you can create buckets that give you meaningful percentiles.
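To make the bucket problem concrete, here is a minimal sketch in plain Python (not statsd_exporter code). It assumes "default buckets" means the Go client's `prometheus.DefBuckets` layout; the sample timings and the `FAST_BUCKETS` layout are made up for illustration.

```python
from bisect import bisect_left

# Assumed default layout: the Go client's prometheus.DefBuckets, in seconds.
DEFAULT_BUCKETS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]

# Hypothetical layout chosen by someone who knows the timings are ~1-5 ms.
FAST_BUCKETS = [0.001, 0.002, 0.003, 0.004, 0.005]


def bucket_counts(samples, bounds):
    """Non-cumulative count per bucket; the last slot is the +Inf overflow."""
    counts = [0] * (len(bounds) + 1)
    for s in samples:
        # Index of the first upper bound >= s, i.e. the bucket with s <= le.
        counts[bisect_left(bounds, s)] += 1
    return counts


def nonzero_buckets(samples, bounds):
    """Map the `le` label to its count, keeping only occupied buckets."""
    labels = [str(b) for b in bounds] + ["+Inf"]
    return {le: c for le, c in zip(labels, bucket_counts(samples, bounds)) if c}


# Made-up timings: one cluster of a few milliseconds, one of a few seconds.
fast = [0.001, 0.002, 0.003, 0.004] * 25
slow = [2.0, 3.0, 4.0] * 10

fast_default = nonzero_buckets(fast, DEFAULT_BUCKETS)
slow_default = nonzero_buckets(slow, DEFAULT_BUCKETS)
fast_custom = nonzero_buckets(fast, FAST_BUCKETS)

print(fast_default)  # every fast sample collapses into the first bucket
print(slow_default)  # the slow samples occupy only two buckets
print(fast_custom)   # the tailored layout separates the fast samples
```

With the assumed default layout, all of the millisecond timings share a single bucket, so any percentile computed from them can only say "somewhere below 5 ms".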

Second, histograms have a higher cardinality than summaries, especially if you try to measure something with a wide distribution of values. Since we don't know what kind of timings people will send, meaningful histograms by default in the statsd exporter would need a very wide set of buckets. Every bucket is an additional time series, which puts more load on Prometheus.
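As a rough back-of-the-envelope sketch of the cardinality difference: a summary exposes one series per quantile plus `_sum` and `_count`, while a classic histogram exposes one `_bucket` series per bound (including `+Inf`) plus `_sum` and `_count`. The three-quantile summary, the "wide" doubling layout, and the 500 label combinations below are assumptions for illustration, not values taken from the exporter.

```python
def summary_series(num_quantiles):
    """Series per label combination: one per quantile, plus _sum and _count."""
    return num_quantiles + 2


def histogram_series(num_finite_buckets):
    """Series per label combination: finite buckets, +Inf, _sum and _count."""
    return num_finite_buckets + 1 + 2


# Assumed default layout: the Go client's prometheus.DefBuckets (11 bounds).
DEFAULT_BUCKETS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]

# Hypothetical "wide" layout: doubling from 1 ms up to roughly 17 minutes.
WIDE_BUCKETS = [0.001 * 2 ** i for i in range(21)]

LABEL_COMBINATIONS = 500  # hypothetical number of distinct label sets

for name, per_set in [
    ("summary, 3 quantiles", summary_series(3)),
    ("histogram, default buckets", histogram_series(len(DEFAULT_BUCKETS))),
    ("histogram, wide buckets", histogram_series(len(WIDE_BUCKETS))),
]:
    print(f"{name}: {per_set} per label set, {per_set * LABEL_COMBINATIONS} total")
```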

Finally, summaries are accurate and produce meaningful data out of the box for any timing*, regardless of the distribution, since they calculate percentiles directly. Histograms use linear interpolation between bucket boundaries to estimate a percentile value, which inherently has error baked in that some people don't necessarily consider. This may change once Prometheus supports sparse histograms, which significantly improve on these limitations.

* Assuming there are frequent enough timings being sent to be able to sample them.
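The interpolation error can be sketched as follows. This is a simplified stand-in for what `histogram_quantile` does with classic buckets, not the actual Prometheus implementation; the bucket layout is assumed to be the Go client's `prometheus.DefBuckets` and the samples are made up.

```python
from bisect import bisect_left
from itertools import accumulate

# Assumed default layout: the Go client's prometheus.DefBuckets, in seconds.
BOUNDS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]


def cumulative_counts(samples, bounds):
    """Cumulative counts per `le` bound, with a final entry for +Inf."""
    counts = [0] * (len(bounds) + 1)
    for s in samples:
        counts[bisect_left(bounds, s)] += 1
    return list(accumulate(counts))


def estimate_quantile(q, bounds, cumulative):
    """Linearly interpolate inside the bucket that contains rank q * total."""
    total = cumulative[-1]
    if total == 0 or not 0 < q <= 1:
        raise ValueError("need observations and 0 < q <= 1")
    rank = q * total
    idx = bisect_left(cumulative, rank)  # first bucket reaching the rank
    if idx == len(bounds):
        # Rank falls in the +Inf bucket: fall back to the highest finite bound.
        return bounds[-1]
    lower = bounds[idx - 1] if idx > 0 else 0.0
    below = cumulative[idx - 1] if idx > 0 else 0
    in_bucket = cumulative[idx] - below
    return lower + (bounds[idx] - lower) * (rank - below) / in_bucket


# Made-up workload: every request takes exactly 300 ms.
samples = [0.3] * 100
cumulative = cumulative_counts(samples, BOUNDS)

p50_estimate = estimate_quantile(0.5, BOUNDS, cumulative)
p99_estimate = estimate_quantile(0.99, BOUNDS, cumulative)

print("true median:", 0.3)
print("estimated p50:", p50_estimate)  # 0.375, halfway through (0.25, 0.5]
print("estimated p99:", p99_estimate)  # close to the 0.5 upper bound
```

All samples land in the (0.25, 0.5] bucket, so the estimate only knows the position within that bucket by assumption, and the reported p50 is 375 ms even though every request took 300 ms.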

TL;DR Summaries are cheaper and more accurate for unknown distributions than histograms, which currently require some knowledge of the expected distribution.

@SuperQ
Member

SuperQ commented Aug 25, 2022

The big downside of Summaries is that they can't be aggregated. If you have more than one statsd_exporter receiving data from the same app(s), the data will be essentially useless.
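A small sketch of why that is, under made-up numbers: two hypothetical exporter instances see very different traffic, and averaging their per-instance p99 values gives something unrelated to the p99 of the combined traffic. Histogram bucket counts, by contrast, can simply be summed. The instance data and the bucket layout (assumed to be the Go client's `prometheus.DefBuckets`) are illustrative only.

```python
from bisect import bisect_left
from math import ceil

# Assumed default layout: the Go client's prometheus.DefBuckets, in seconds.
BOUNDS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]


def nearest_rank(samples, q):
    """Exact quantile using the nearest-rank definition."""
    ordered = sorted(samples)
    return ordered[max(ceil(q * len(ordered)), 1) - 1]


def bucket_counts(samples, bounds):
    """Non-cumulative count per bucket; the last slot is the +Inf overflow."""
    counts = [0] * (len(bounds) + 1)
    for s in samples:
        counts[bisect_left(bounds, s)] += 1
    return counts


# Hypothetical traffic split across two statsd_exporter instances.
instance_a = [0.01] * 1000  # busy instance, fast requests
instance_b = [1.0] * 10     # nearly idle instance, slow requests

# What a summary gives you: one precomputed p99 per instance.
p99_a = nearest_rank(instance_a, 0.99)
p99_b = nearest_rank(instance_b, 0.99)
averaged_p99 = (p99_a + p99_b) / 2

# What you actually wanted: the p99 over all requests.
true_p99 = nearest_rank(instance_a + instance_b, 0.99)

# Histograms: per-instance bucket counts add up to the combined histogram.
merged = [
    a + b
    for a, b in zip(bucket_counts(instance_a, BOUNDS), bucket_counts(instance_b, BOUNDS))
]
combined = bucket_counts(instance_a + instance_b, BOUNDS)

print("averaged p99:", averaged_p99)
print("true p99:", true_p99)
print("merged buckets match combined:", merged == combined)
```

Here the averaged value is about half a second while the true p99 is 10 ms, because the quantiles carry no information about how many observations each instance saw.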

@matthiasr
Contributor

I thought about changing the default in the past but never tackled that.

With Histograms v2 in the works, I would rather not change the default now – they will alleviate a lot of the "must pick buckets" pain, and if we can make one breaking change rather than multiple all the better.

@pedro-stanaka
Contributor

Now that native histograms are more stable, I would +1 making them the default in the next major release. I have been using them as my default and can only recommend it: the level of detail you get is impressive.

@matthiasr
Contributor

They're "more stable" but still experimental 😅 We still need a text format (prometheus/proposals#32), and it's behind a feature flag in Prometheus itself. Let's wait until it is really stable 😉

5 participants