How do you use Cloudprober? #123

manugarg · 2018-04-25T00:59:12Z

If you use Cloudprober, would you mind sharing how you use it[1]? This will help in two ways:

- We'll be able to plan its roadmap better. We'll know which features people care about and what we should work on.
- It will give us a warm fuzzy feeling and motivate us to do more 😄

[1] - For example, do you use targets discovery? Which monitoring system do you integrate with -- Prometheus, Stackdriver, or something else entirely? Which probe types? Do you run it in a Docker container or as a vanilla binary?

dmicanzerofox · 2018-07-22T21:52:48Z

@manugarg I am trying to roll out cloudprober at my org and wrote a bit about it. I'm not sure if it's what you had in mind, but hopefully I will be able to update with how we use it in the very near future.

https://medium.com/@dm03514/sre-availability-probing-101-using-googles-cloudprober-8c191173923c

manugarg · 2018-07-23T19:00:24Z

@dmicanzerofox This is perfect. Thanks for sharing.

paprockiw · 2018-07-25T15:58:27Z

I'm looking into using Cloudprober as an alternative to Prometheus Blackbox Exporter, since it offers some nice baked-in tools and can be configured to do more. I like it so far, but more documentation and examples would be helpful.

manugarg · 2018-07-26T08:58:06Z

@paprockiw Thanks for the feedback! I'll try to add more documentation and examples.

chinglinwen · 2019-08-02T09:11:32Z

Cloudprober works in a more powerful way than sourcegraph's checkup (though the projects have different objectives, I think).

Currently I use Cloudprober to monitor apps that do not provide metrics themselves, so that I can continuously monitor app status and graph it in Grafana.

In one case, I generate a list of configs with PING and HTTP checks for Kubernetes pod and service connectivity.

We may deploy Cloudprober on nodes distributed over large geographic distances, to cover countryside monitoring.

I wish Cloudprober supported an API call to update probe items, instead of changing the config file and restarting.
(Issue link: #271)

manugarg · 2019-08-04T09:42:16Z

Thanks @chinglinwen for sharing how you use Cloudprober.

> In one case, I generate a list of configs with PING and HTTP checks for Kubernetes pod and service connectivity.

This is interesting. The Kubernetes pod and service connectivity monitoring use case keeps coming up. We've been thinking of adding automated targets discovery for Kubernetes resources. I wonder if we should prioritize that. I don't know if you've already seen it, but cloudprober has support for automatic discovery of some GCE resources -- GCE instances and forwarding rules -- wherein you can specify that you want to probe all VMs or forwarding rules matching a particular regex. As these resources get updated, cloudprober automatically picks up their new IPs and other details. We could do the same for Kubernetes resources. Then you'd have to make fewer changes to the configs.

> We may deploy Cloudprober on nodes distributed over large geographic distances, to cover countryside monitoring.
>
> I wish Cloudprober supported an API call to update probe items, instead of changing the config file and restarting. (Issue link: #271)

I'll comment on the linked issue as well, but for completeness' sake: Cloudprober now comes with an API to dynamically add/remove probes.
See: https://github.com/google/cloudprober/releases/tag/v0.10.4

I'm working on adding documentation for it.

chinglinwen · 2019-08-04T17:15:24Z

I would like a way to monitor k8s services with Cloudprober. It would be a good feature, I think.

Also, it's good to know that Cloudprober supports an API now : )

jbkc85 · 2019-10-22T18:21:21Z

The infrastructure I currently manage is not yet containerized and utilizes an older health check scheme. The health checks on our 30+ microservices all use a similar 'OK' or 'SICK' output but are not standardized, meaning each microservice has a different format for its health checks. On top of this, the health checks are not necessarily geared towards critical issues, as some of the endpoints report feature health (such as a third-party service being down). This is all fine and dandy except for the fact that we currently use Nagios to check these microservices with specific shell scripts. So, that means we have 30+ shell scripts, one for each microservice, being checked by Nagios. Some issues with this:

- no customization between what is SICK or OK, and can't change intervals of alerting per service
- if a feature shows 'SICK', it may actually trigger an alarm w/ Ops even though Ops can't do anything
- not scalable
- nagios

So we have actually started a transition to Cloudprober to finally rid ourselves of Nagios. We consider it much like a whitebox exporter. We wrote a small modular binary for older projects to call via External Probe, which allows us to parse out each health item and report it as an individual metric. That in turn enables us to have more granular and specific rulesets to avoid death by alerts. On top of this, we have also started encouraging developers who are already writing smoke test executable JARs to write them in such a way that cloudprober can use those executable JARs as external probes as well, allowing for continuous smoke test automation into Prometheus.
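As a rough illustration of the wrapper idea above, here is a minimal sketch in Python. The health-check format (`item: OK` or `item: SICK`, one per line) and the names `parse_health` and `format_metrics` are assumptions made up for this example, not the binary described in the comment. It emits one `metric value` line per health item, which is the stdout shape Cloudprober's external probe is documented to parse; treat that detail as an assumption and check it against your Cloudprober version.

```python
import re

# Hypothetical health-check line format: "<item>: OK" or "<item>: SICK".
HEALTH_LINE = re.compile(r"^\s*([\w.-]+)\s*:\s*(OK|SICK)\s*$")


def parse_health(text):
    """Map each recognized health item to 1 (OK) or 0 (SICK)."""
    metrics = {}
    for line in text.splitlines():
        match = HEALTH_LINE.match(line)
        if not match:
            continue  # ignore lines that are not health items
        # Replace characters that are awkward in metric names.
        name = re.sub(r"[^a-zA-Z0-9_]", "_", match.group(1))
        metrics[name + "_healthy"] = 1 if match.group(2) == "OK" else 0
    return metrics


def format_metrics(metrics):
    """Render metrics as 'name value' lines, one per health item."""
    return "\n".join(f"{name} {value}" for name, value in sorted(metrics.items()))


sample = "db: OK\nthird-party: SICK\nsome unrelated log line\n"
print(format_metrics(parse_health(sample)))
```

Reporting each item as its own metric is what makes per-item alert rules possible: a rule can page on `db_healthy` while only recording `third_party_healthy`.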

Finally, I would like to point out that we are leveraging Consul and Consul-Template for service discovery with Cloudprober. Originally we were not able to automate this, but after registering all of our services in Consul we were able to use consul-template to do some pattern matching and automate the generation of the cloudprober.cfg file. The only thing I would wish for is service discovery inside Cloudprober itself, so we wouldn't have to restart the service every time.
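The config-generation step can be sketched as follows. This is not consul-template (which uses Go templates); it is a small Python stand-in that assumes the service-to-hosts mapping has already been fetched from a catalog. The service names, addresses, and the helpers `render_probe`, `render_config`, and `write_atomically` are hypothetical.

```python
import os


def render_probe(service, hosts, interval_msec=5000, timeout_msec=1000):
    """Render one HTTP probe stanza for a service and its hosts."""
    host_list = ",".join(sorted(hosts))
    return "\n".join([
        "probe {",
        f'  name: "{service}_http"',
        "  type: HTTP",
        "  targets {",
        f'    host_names: "{host_list}"',
        "  }",
        f"  interval_msec: {interval_msec}",
        f"  timeout_msec: {timeout_msec}",
        "}",
    ])


def render_config(services):
    """Render a whole config from a {service: [hosts]} mapping."""
    stanzas = [render_probe(name, hosts) for name, hosts in sorted(services.items())]
    return "\n\n".join(stanzas) + "\n"


def write_atomically(path, content):
    """Write to a temp file, then rename, so readers never see a partial file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as handle:
        handle.write(content)
    os.replace(tmp_path, path)


# Hypothetical catalog data, as it might look after a service lookup.
catalog = {"billing": ["10.0.0.2", "10.0.0.1"], "auth": ["10.0.1.5"]}
print(render_config(catalog))
```

Sorting services and hosts keeps the output stable between runs, so the file only changes (and the service only needs a restart) when the catalog actually changes.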

Hopefully this is helpful!

manugarg · 2019-10-23T18:32:25Z

Thanks a lot @jbkc85. This is really helpful.

Regarding service discovery, cloudprober has support for automated targets discovery but it currently supports only GCE targets and Kubernetes pods. Based on your comments, it looks like it will be helpful to add support for consul catalogs as well. It will look something like this in configs:

probe {
  ...
  targets {
    rds_targets {
      # (or /nodes)
      resource_path: "consul://v1/catalog/service/my-service"
      filter {
        key: "name"
        value: "{{.regex}}"
      }
    }
  }
  ...
}

Does this look useful? I don't have much experience with consul myself, but reading about it, it looks like getting nodes behind a service will be the most useful construct. What do you usually filter by?

jbkc85 · 2019-10-30T03:41:11Z

@manugarg it does - and maybe my use case is a bit more complex...but we do a lot of additional conditional tagging. For example, if the environment:prodeu tag is present, we make a tag for environment='prodeu' exposed in Prometheus through additional_labels.
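The conditional tagging described above could be sketched like this. The `key:value` tag convention follows the `environment:prodeu` example; the function name `tags_to_labels` and the extra sample tags are assumptions for illustration.

```python
def tags_to_labels(tags):
    """Turn 'key:value' service tags into a label dict, skipping plain tags."""
    labels = {}
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep and key and value:
            labels[key] = value
    return labels


# "canary" has no key:value shape, so it does not become a label.
print(tags_to_labels(["environment:prodeu", "canary", "team:payments"]))
```

The resulting dict is what would then be attached to probe results as additional labels.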

I also like the idea of separating the service discovery from the actual discovery service due to refresh intervals we ran into in the past. For example, in Prometheus if you use Consul SD, if consul goes down all your SD goes away at the next refresh interval. With consul-template and using basic file discovery, I am able to circumvent a single point of failure incident where my monitoring goes blank...instead I have a sorta 'last known good configuration' to keep monitoring. I would prefer this with probes too :-)

manugarg · 2019-11-15T06:25:56Z

@jbkc85 Sorry, I forgot to respond earlier. Regarding your second point, targets discovery in cloudprober has static failover. If targets discovery fails for some reason (API failure, targets provider unavailability, etc.), cloudprober continues with the old information -- it doesn't update its in-memory database in case of failures.
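The "keep the old information on failure" behavior can be illustrated with a small sketch. This is not Cloudprober's implementation; the `CachedTargets` class and the `fetch` callable are hypothetical and only show the general last-known-good pattern.

```python
class CachedTargets:
    """Keep the last successfully discovered targets when a refresh fails."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable returning an iterable of targets
        self._targets = []       # last known good result
        self.last_error = None   # most recent refresh failure, if any

    def refresh(self):
        try:
            fresh = list(self._fetch())
        except Exception as err:
            # Discovery failed: leave the cached targets untouched.
            self.last_error = err
            return self._targets
        self._targets = fresh
        self.last_error = None
        return self._targets


# Simulated provider: succeeds once, then becomes unavailable.
responses = iter([["10.0.0.1", "10.0.0.2"]])


def flaky_fetch():
    try:
        return next(responses)
    except StopIteration:
        raise ConnectionError("targets provider unavailable")


cache = CachedTargets(flaky_fetch)
print(cache.refresh())  # fresh targets
print(cache.refresh())  # provider is down, cached targets are returned
```

Keeping `last_error` around lets the caller alert on a stale cache instead of silently probing outdated targets forever.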

Regarding the first point, cloudprober targets discovery now supports retrieving labels (for example, GCE instance label or GKE labels) and attaching labels to probe results based on target labels.

Just wanted to make sure that you knew about these features :)

In the meantime, I'll go ahead and file a feature request to look into implementing targets discovery based on consul catalog. This seems like a useful feature regardless.

Thanks again for sharing all the information so far.

mweirauch · 2019-11-15T12:23:39Z

I can't yet contribute to how I use Cloudprober, but as the discussion is centering around discovery:
How about support for the oldest service discovery solution around: DNS?
Prometheus can e.g. retrieve (among others) SRV records for scraping via its dns_sd_config.

manugarg · 2021-10-29T00:23:34Z

Folks, please see the announcement here: #679. Active development of this repository will move to github.com/cloudprober/cloudprober. Unfortunately, we'll have to close all the issues here, and file them again.

manugarg added the question label Apr 25, 2018

manugarg mentioned this issue Jul 11, 2018

Postgres Surfacer #130

Closed

google deleted a comment from vivekbny Sep 7, 2018

manugarg pinned this issue Dec 13, 2018

manugarg mentioned this issue Jul 13, 2019

adding additional labels for an EXTERNAL probe #255

Closed

manugarg added feedback and removed question labels Nov 1, 2019

manugarg closed this as completed Oct 29, 2021

manugarg mentioned this issue Oct 30, 2021

Support DNS SRV discovery cloudprober/cloudprober#4

Open
