
feat(stackdriver_exporter): Add ErrorLogger for promhttp #277

Open · wants to merge 2 commits into master from feat/add-custom-logger-to-prom-handler

Conversation

Contributor

@Pokom Pokom commented Nov 3, 2023

I had recently experienced #103 and #166 in production, and it took quite some time to recognize there was a problem with `stackdriver_exporter` because nothing was logged to indicate problems gathering metrics. From my perspective, the pod was healthy and online, and I could curl `/metrics` to get results. Grafana Agent, however, was getting errors when scraping, specifically errors like the following:

```
[from Gatherer #2] collected metric "stackdriver_gce_instance_compute_googleapis_com_instance_disk_write_bytes_count" { label:{name:"device_name" value:"REDACTED_FOR_SECURITY"} label:{name:"device_type" value:"permanent"} label:{name:"instance_id" value:"2924941021702260446"} label:{name:"instance_name" value:"REDACTED_FOR_SECURITY"} label:{name:"project_id" value:"REDACTED_FOR_SECURITY"} label:{name:"storage_type" value:"pd-ssd"} label:{name:"unit" value:"By"} label:{name:"zone" value:"us-central1-a"} counter:{value:0} timestamp_ms:1698871080000} was collected before with the same name and label values
```

To help identify the root cause, I've added the ability to opt into logging errors that come from the handler. Specifically, I've created the struct `customPromErrorLogger`, which implements the `promhttp.Logger` interface. There is a new flag, `monitoring.enable-promhttp-custom-logger`: when it is set to true, we create an instance of `customPromErrorLogger` and use it as the value for `ErrorLog` in `promhttp.HandlerOpts{}`. Otherwise, `stackdriver_exporter` works as it did before and does not log errors encountered while collecting metrics.
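For illustration, a minimal, self-contained sketch of that approach (not the exact diff in this PR; the function name and flag plumbing are simplified stand-ins, and it assumes the `github.com/go-kit/log` packages):

```go
package exporter

import (
	"fmt"
	"net/http"

	"github.com/go-kit/log"
	"github.com/go-kit/log/level"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// customPromErrorLogger satisfies promhttp.Logger (a single Println method)
// and forwards metric-gathering errors to the exporter's structured logger.
type customPromErrorLogger struct {
	logger log.Logger
}

func (l *customPromErrorLogger) Println(v ...interface{}) {
	level.Error(l.logger).Log("msg", fmt.Sprint(v...))
}

// buildHandler is an illustrative stand-in for the exporter's handler wiring:
// when the opt-in flag is true, promhttp reports gathering errors through the
// custom logger instead of silently dropping them.
func buildHandler(logger log.Logger, gatherers prometheus.Gatherers, enableCustomLogger bool) http.Handler {
	opts := promhttp.HandlerOpts{}
	if enableCustomLogger {
		level.Info(logger).Log("msg", "Enabling custom logger for promhttp")
		opts.ErrorLog = &customPromErrorLogger{logger: logger}
	}
	return promhttp.HandlerFor(gatherers, opts)
}
```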


Signed-off-by: pokom <mark.poko@grafana.com>
@Pokom Pokom force-pushed the feat/add-custom-logger-to-prom-handler branch from 04af087 to 29e8667 Compare November 3, 2023 14:05
Contributor

@kgeckhart kgeckhart left a comment

Looks good!

```diff
-return promhttp.HandlerFor(gatherers, promhttp.HandlerOpts{})
+opts := promhttp.HandlerOpts{}
+if *monitoringEnablePromHttpCustomLogger {
+	h.logger.Log("msg", "Enabling custom logger for promhttp")
```
Contributor

Nit:

Suggested change:

```diff
-h.logger.Log("msg", "Enabling custom logger for promhttp")
+level.Info(h.logger).Log("msg", "Enabling custom logger for promhttp")
```

Contributor

Yes, this needs to be fixed.

Contributor Author

@SuperQ I just pushed a change to fix this. Please take a look when you get a chance, thank you!

@Pokom Pokom marked this pull request as ready for review November 3, 2023 21:06
Contributor Author

@Pokom Pokom commented Nov 10, 2023

@SuperQ when you get a chance, mind providing a review? This would be really helpful for us, so we can at least get alerted when we enter this state.

Signed-off-by: pokom <mark.poko@grafana.com>
@Pokom Pokom force-pushed the feat/add-custom-logger-to-prom-handler branch from 1e43528 to c1559c4 Compare March 18, 2024 13:47
@Pokom Pokom requested a review from SuperQ March 18, 2024 13:48
```diff
@@ -236,7 +240,14 @@ func (h *handler) innerHandler(filters map[string]bool) http.Handler {
 	}
 
 	// Delegate http serving to Prometheus client library, which will call collector.Collect.
-	return promhttp.HandlerFor(gatherers, promhttp.HandlerOpts{})
+	opts := promhttp.HandlerOpts{}
```
Contributor

This can be simplified to:

Suggested change:

```diff
-opts := promhttp.HandlerOpts{}
+opts := promhttp.HandlerOpts{ErrorLog: stdlog.New(log.NewStdlibAdapter(level.Error(h.logger)), "", 0)}
```

There's no need to have a new flag for this; just adding the ErrorLog handler to promhttp is enough.
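As a rough sketch of what that suggestion amounts to (again assuming the go-kit/log packages; the surrounding function is illustrative, not the exporter's actual code): go-kit's `NewStdlibAdapter` turns the structured logger into an `io.Writer`, a standard-library `*log.Logger` is built on top of it, and that satisfies promhttp's `ErrorLog`, so gathering errors are always logged without any flag.

```go
package exporter

import (
	stdlog "log"
	"net/http"

	"github.com/go-kit/log"
	"github.com/go-kit/log/level"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// handlerFor builds the metrics handler with gathering errors always logged:
// the go-kit logger is adapted to an io.Writer, wrapped in a stdlib *log.Logger,
// and passed as promhttp's ErrorLog. No opt-in flag is needed.
func handlerFor(logger log.Logger, gatherers prometheus.Gatherers) http.Handler {
	opts := promhttp.HandlerOpts{
		ErrorLog: stdlog.New(log.NewStdlibAdapter(level.Error(logger)), "", 0),
	}
	return promhttp.HandlerFor(gatherers, opts)
}
```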
