Support "set" metric type #183

Open
bryanlarsen opened this issue Feb 5, 2019 · 10 comments

@bryanlarsen

In statsd, the set type counts unique occurrences between flushes. This is not supported in statsd_exporter because "between flushes" is pretty meaningless in a pull system, especially if there are multiple servers scraping the exporter.

Originally requested in #112

@matthiasr requested opening a new issue if somebody had a good idea on how to implement Set. I wouldn't say that I have a good idea, but I do have some thoughts.

Option 1: assume single scraping server. Not a great solution, but would be sufficient for us, at least at the moment.

Option 2: create a statsd plugin that sends sets as gauges on flush. Requires the use of the statsd daemon.

Option 3: add an option for the flush interval; create a ticker from it that persists and resets the set counts every tick. If the option isn't set it could have a default, or it could just mean the user doesn't need set support.

Option 3 seems like the best option, but option 2 is definitely easier and good enough for us. So we'd love to help with option 3, but if there's no interest we'd probably just go ahead and do option 2 ourselves.

@matthiasr

matthiasr commented Feb 5, 2019 via email

@bryanlarsen

Are you thinking of "reaaaally long time" being the default? At least in our case that wouldn't be an issue. To do the count you'd need a map or a hyperloglog or something to track the unique instances, and I suppose for some people that could grow excessively. Our set sizes are around 100, so a long flush time would just keep counting the same objects repeatedly, with no unbounded growth.

statsd uses a Set to count objects rather than a hyperloglog or anything fancy, so they're not worried about unbounded growth.

@bryanlarsen

Better answer: we won't have a count until after the first flush interval expires.

@matthiasr

Hmmm, I think I get what sets do now. I'll try to explain it back to make sure:

When sending

foo:123|s
foo:456|s
foo:123|s

then the next time statsd flushes to graphite, it sends foo 2 <timestamp>, and then I send

foo:123|s
foo:456|s
foo:789|s

again, then on the next flush it will send foo 3 <timestamp> to graphite?

  • In the statsd/graphite setup, how do you aggregate the result over time? Is there a way to turn "uniques per 10s" into "daily active users"?
  • a common way (that I always recommend) to deploy statsd exporter is to have many of them (one per application instance) – how could one aggregate uniques across them?
  • Is the flush interval usually aligned on some wall clock time?
  • What would happen if multiple statsd exporters are not aligned?
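To make the semantics concrete, here is a minimal sketch in Go of how the unique counting described above behaves. The parser and function names are illustrative only, not anything from statsd or the exporter; it handles just the `name:value|s` form:

```go
package main

import (
	"fmt"
	"strings"
)

// countUniques counts unique set values per metric name from raw
// statsd lines of the form "name:value|s". This is a deliberately
// simplified parser, just to illustrate the set semantics.
func countUniques(lines []string) map[string]int {
	sets := map[string]map[string]struct{}{}
	for _, line := range lines {
		name, rest, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		value, typ, ok := strings.Cut(rest, "|")
		if !ok || typ != "s" {
			continue
		}
		if sets[name] == nil {
			sets[name] = map[string]struct{}{}
		}
		sets[name][value] = struct{}{}
	}
	counts := map[string]int{}
	for name, s := range sets {
		counts[name] = len(s)
	}
	return counts
}

func main() {
	// First flush interval: foo:123|s, foo:456|s, foo:123|s
	fmt.Println(countUniques([]string{"foo:123|s", "foo:456|s", "foo:123|s"})) // map[foo:2]
	// Second flush interval: foo:123|s, foo:456|s, foo:789|s
	fmt.Println(countUniques([]string{"foo:123|s", "foo:456|s", "foo:789|s"})) // map[foo:3]
}
```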

@matthiasr

  • Does statsd accept arbitrary strings as values, or does it have to be a number? An integer or a float?

@matthiasr

From the implementation, I don't see a problem with option 3. I don't think we need hyperloglog or anything. A map[<value type>]struct{} would probably suffice, and then when flushing we take it out, replace it with an empty map, and count the keys. If someone really needs to count billions of objects within seconds this exporter is not the tool for them, I think 😆

My concern is whether this will actually produce a metric that is useful to anyone. The obvious output would be a gauge with the set count, but then I wonder how one would actually use that. And if Prometheus scrapes less frequently than the flush interval, or the alignments are odd, you'd completely lose the information from some flush intervals. Would it make sense to also observe the count for each interval into a histogram?
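The swap-on-flush idea above could be sketched like this in Go. All names here (`setMetric`, `observe`, `flush`) are hypothetical, not the exporter's actual API; the mutex assumes observations arrive concurrently with the flush tick:

```go
package main

import (
	"fmt"
	"sync"
)

// setMetric counts unique string values between flushes: a plain
// map[string]struct{}, swapped out and counted on each flush.
type setMetric struct {
	mu     sync.Mutex
	values map[string]struct{}
}

func newSetMetric() *setMetric {
	return &setMetric{values: make(map[string]struct{})}
}

// observe records one statsd "|s" sample.
func (m *setMetric) observe(v string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.values[v] = struct{}{}
}

// flush swaps in an empty map and returns the old unique count,
// which would then be exported, e.g. as a gauge.
func (m *setMetric) flush() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	n := len(m.values)
	m.values = make(map[string]struct{})
	return n
}

func main() {
	m := newSetMetric()
	m.observe("worker-1")
	m.observe("worker-2")
	m.observe("worker-1") // duplicate, counted once
	fmt.Println(m.flush()) // 2
	fmt.Println(m.flush()) // 0, the set was reset
}
```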

@bryanlarsen

We're using strings for our sets. Looking at the statsd source code, it appears that they're storing all values as strings.

As for your concern, isn't this the same issue that pretty much every gauge will have? A gauge is typically a continuous signal sampled at a regular period and pushed to statsd, which is then scraped at a different period. So if it's pushed more frequently than it's scraped, samples will be dropped. Annoying, but given that it's a continuous signal, there is an infinite number of potential samples that we're necessarily dropping.

In our case, the signal we're measuring with the set is also a continuous signal. It's the number of idle workers. Each worker periodically sends its ID to statsd while it's idle. So we have 3 periods we have to contend with! In our case, the worker reporting period is 10 seconds and the flush interval is 60 seconds, so we can tolerate up to 5 dropped packets. Our scrape interval is 30 seconds (the default; haven't found a reason to tune it yet).

The measurement we care about is the minimum. We don't want to run out of available workers. So yes, if the scrape interval becomes a significant multiple of the flush interval, then the dropped samples might be painful.

However, I think this is less of a concern for sets than it would be for other gauge users. Most gauges are easier to sample more frequently than a set is. For example, one could sample a temperature ten times per second.

So I think your concern is quite independent of sets. It'd probably be quite useful to add the ability to histogram gauges, along with sets as a specific type of gauge. But I don't think that request belongs under this specific issue.

@matthiasr

Ok, that makes sense, and that's an interesting use case! Do you feel up to implementing this? As I said above, I think a simple map to hold the set will do to start with. To keep things simple for users I would turn it on and set a reasonable default on the flush interval – 1 minute maybe? If there are no recorded sets then there won't be anything to clean up so it won't use measurable resources.

One thing to keep in mind is not to leak too many goroutines when reloading the configuration – one way to do that would be to trigger a last flush on reload and then tear down any flushing routines. (This is a suggestion – if you think there's a better way, then do that!)

@bryanlarsen

That's a good question! I don't have anything beyond a superficial exposure to Go, but I don't expect that to be much of a stumbling block. And the time I have to do this sort of thing is fairly limited. As I said, option 2 would definitely be easier and sufficient for us, but I'd definitely be interested in doing option 3 if you're willing to provide guidance.

@matthiasr

Absolutely! My Go is also … functional, at best, but we'll work through it 😄 open a PR early and I can give feedback. If you allow edits from maintainers I can also make changes to it directly, if opportune.

@bryanlarsen bryanlarsen mentioned this issue Feb 8, 2019
@matthiasr matthiasr changed the title Set type not supported Support "set" metric type Jun 12, 2019