
How to workaround duplicate metrics from Stackdriver #103

Open
phnakarin opened this issue Jul 17, 2020 · 14 comments · May be fixed by #319

Comments

@phnakarin

Hello there,

I'm giving it a shot here.

Is there anything we can do other than maintaining a white/blacklist of these troublesome metrics in the exporter?

For example, this one comes from the metric logging.googleapis.com/log_entry_count with resource type spanner_instance, and we only see it happen in US-based projects. In the GCP Metrics Explorer you can also see that the series is not really unique.

* [from Gatherer #2] collected metric "stackdriver_spanner_instance_logging_googleapis_com_byte_count" { label:<name:"instance_config" value:"" > label:<name:"instance_id" value:"spanner-001" > label:<name:"location" value:"us-east1" > label:<name:"log" value:"cloudaudit.googleapis.com/data_access" > label:<name:"project_id" value:"my-project-id" > label:<name:"severity" value:"INFO" > label:<name:"unit" value:"By" > gauge:<value:32656 > timestamp_ms:1594973553075 } was collected before with the same name and label values
* [from Gatherer #2] collected metric "stackdriver_spanner_instance_logging_googleapis_com_byte_count" { label:<name:"instance_config" value:"" > label:<name:"instance_id" value:"spanner-002" > label:<name:"location" value:"us-east1" > label:<name:"log" value:"cloudaudit.googleapis.com/data_access" > label:<name:"project_id" value:"my-project-id" > label:<name:"severity" value:"INFO" > label:<name:"unit" value:"By" > gauge:<value:19902 > timestamp_ms:1594973553075 } was collected before with the same name and label values
* [from Gatherer #2] collected metric "stackdriver_spanner_instance_logging_googleapis_com_byte_count" { label:<name:"instance_config" value:"" > label:<name:"instance_id" value:"spanner-003" > label:<name:"location" value:"us-east1" > label:<name:"log" value:"cloudaudit.googleapis.com/data_access" > label:<name:"project_id" value:"my-project-id" > label:<name:"severity" value:"INFO" > label:<name:"unit" value:"By" > gauge:<value:0 > timestamp_ms:1594973433075 } was collected before with the same name and label values
* [from Gatherer #2] collected metric "stackdriver_spanner_instance_logging_googleapis_com_byte_count" { label:<name:"instance_config" value:"" > label:<name:"instance_id" value:"spanner-004" > label:<name:"location" value:"us-east1" > label:<name:"log" value:"cloudaudit.googleapis.com/data_access" > label:<name:"project_id" value:"my-project-id" > label:<name:"severity" value:"INFO" > label:<name:"unit" value:"By" > gauge:<value:28341 > timestamp_ms:1594973553075 } was collected before with the same name and label values

Cordially,
// Nakarin
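
One coarse workaround for the question above, short of a true white/blacklist feature, is to narrow `--monitoring.metrics-type-prefixes` (the flag used later in this thread) so the offending metric type is never requested at all. A minimal sketch, assuming the duplicates only come from the logging.googleapis.com metrics on spanner_instance and that losing those series is acceptable:

```
# Request only Spanner metric types and skip logging.googleapis.com/*,
# which is where the duplicated series in the report above originate.
# The prefix value is illustrative; adjust it to the metrics you need.
--monitoring.metrics-type-prefixes=spanner.googleapis.com/
```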

@phnakarin

FYI, I have talked to Google Support and created a support case about these problematic metrics. They admitted that it is a bug in Google Spanner. The Spanner team is fixing it now.

@nrmitchi

Adding on here because this is related; I'm seeing the same issues with the stackdriver_https_lb_rule_loadbalancing_googleapis_com_https_backend_request_count metric.

@nrmitchi

This also looks to be related to #36

@nrmitchi

nrmitchi commented Sep 2, 2020

Update: If anyone else ends up here, this issue (at the current time) does not seem to exist with stackdriver_https_lb_rule_loadbalancing_googleapis_com_https_backend_request_count.

My issue was an accidental multiple inclusion of the metrics because they matched more than one of the specified prefixes (see the sketch below).
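
For anyone who ends up in the same situation, the sketch below shows the kind of prefix overlap that can cause it; the prefix values are illustrative, not the actual configuration from this comment:

```
# Both prefixes match loadbalancing.googleapis.com/https/backend_request_count,
# so the same time series is requested twice and the gatherer reports
# "was collected before with the same name and label values".
--monitoring.metrics-type-prefixes=loadbalancing.googleapis.com/https,loadbalancing.googleapis.com/https/backend_request_count

# Fix: make the prefixes disjoint (the broader prefix already covers the metric).
--monitoring.metrics-type-prefixes=loadbalancing.googleapis.com/https
```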

@hanikesn

hanikesn commented Feb 9, 2021

Apparently this can be fixed by specifying: --no-collector.fill-missing-labels. Not sure about any other implications.
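
For reference, a minimal invocation sketch with that flag, using only flags that appear elsewhere in this thread (project and credential settings omitted, and exact flag names may vary between exporter versions):

```
stackdriver_exporter \
  --monitoring.metrics-type-prefixes=loadbalancing.googleapis.com/https \
  --no-collector.fill-missing-labels
```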

@bschaeffer

Hi everyone... is there a workaround for this?

@mrsimo

mrsimo commented Oct 19, 2021

We just started having this problem in the last 24h, and always with the loadbalancing.googleapis.com/https/request_count prefix. Others seem fine. @bschaeffer did you see something similar recently?

@bschaeffer

@mrsimo Yes. With load balancing alone, and also within the last 24h.

@bschaeffer

> Apparently this can be fixed by specifying: --no-collector.fill-missing-labels. Not sure about any other implications.

This did not work for us

@dohnto

dohnto commented Oct 19, 2021

We have the problem as well; the affected metrics in our case are:

  • stackdriver_https_lb_rule_loadbalancing_googleapis_com_https_request_bytes_count
  • stackdriver_https_lb_rule_loadbalancing_googleapis_com_https_request_count
  • stackdriver_https_lb_rule_loadbalancing_googleapis_com_https_response_bytes_count
  • stackdriver_https_lb_rule_loadbalancing_googleapis_com_https_total_latencies

The issue started yesterday at 17:30 UTC for us.

@svenmueller

I can confirm; we had the same issues starting yesterday and also for some hours today.

@uts09

uts09 commented Aug 1, 2023

Has anyone found a solution to this problem? I am getting this error while fetching BigQuery metrics.

Pokom added a commit to Pokom/stackdriver_exporter that referenced this issue Nov 3, 2023
I had recently experienced prometheus-community#103 and prometheus-community#166 in production and it took quite
some time to recognize there was a problem with `stackdriver_exporter`
because nothing was logged to indicate problems gathering metrics.
From my perspective, the pod was healthy and online and I could
curl `/metrics` to get results. Grafana Agent however was getting errors
when scraping, specifically errors like so:

```
 [from Gatherer #2] collected metric "stackdriver_gce_instance_compute_googleapis_com_instance_disk_write_bytes_count" { label:{name:"device_name"
value:"REDACTED_FOR_SECURITY"} label:{name:"device_type"  value:"permanent"} label:{name:"instance_id" value:"2924941021702260446"} label:{name:"instance_name"  value:"REDACTED_FOR_SECURITY"} label:{name:"project_id" value:"REDACTED_FOR_SECURITY"}  label:{name:"storage_type" value:"pd-ssd"} label:{name:"unit" value:"By"} label:{name:"zone" value:"us-central1-a"}
counter:{value:0} timestamp_ms:1698871080000} was collected before with the same name and label values
```

To help identify the root cause I've added the ability to opt into
logging errors that come from the handler. Specifically,
I've created the struct `customPromErrorLogger`, which implements the `promhttp.Logger` interface.
There is a new flag, `monitoring.enable-promhttp-custom-logger`: when it is set to true,
we create an instance of `customPromErrorLogger` and use it as the `ErrorLog` value
in `promhttp.HandlerOpts{}`. Otherwise, `stackdriver_exporter` works as it
did before and does not log errors encountered while collecting metrics.

- refs prometheus-community#103, prometheus-community#166
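
For context on what that commit describes, here is a minimal standalone sketch (not the exporter's actual code) of plugging a custom error logger into `promhttp` so that gather errors like the one quoted above show up in the server's own logs; the type and registry names are illustrative:

```go
package main

import (
	"log"
	"net/http"
	"os"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/collectors"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// promLogger satisfies the promhttp.Logger interface (a single
// Println(v ...interface{}) method) so promhttp can hand us its errors.
type promLogger struct {
	logger *log.Logger
}

func (p promLogger) Println(v ...interface{}) {
	p.logger.Println(v...)
}

func main() {
	reg := prometheus.NewRegistry()
	reg.MustRegister(collectors.NewGoCollector()) // stand-in for the exporter's collectors

	handler := promhttp.HandlerFor(reg, promhttp.HandlerOpts{
		// ErrorLog receives gather errors such as
		// `... was collected before with the same name and label values`,
		// which otherwise only surface on the scraping side.
		ErrorLog: promLogger{logger: log.New(os.Stderr, "promhttp: ", log.LstdFlags)},
		// ContinueOnError still serves the metrics that gathered cleanly
		// instead of failing the whole scrape.
		ErrorHandling: promhttp.ContinueOnError,
	})

	http.Handle("/metrics", handler)
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```

ErrorHandling is shown as ContinueOnError here so a single duplicated series does not fail the whole scrape; promhttp's default, HTTPErrorOnError, would return an HTTP 500 for the scrape instead.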
@Packetslave

We're seeing the same issue with the metric stackdriver_cloud_run_revision_monitoring_googleapis_com_uptime_check_content_mismatch when trying to ingest metrics from Cloud Monitoring synthetic monitors. Our config:

--monitoring.metrics-ingest-delay
--monitoring.metrics-interval=5m
--monitoring.metrics-offset=0s
--monitoring.metrics-type-prefixes=monitoring.googleapis.com/uptime_check,cloudfunctions.googleapis.com/function

@Packetslave

Packetslave commented Mar 5, 2024

Seems like ours is an issue with any metric with the stackdriver_cloud_run_revision_monitoring_googleapis_com_uptime_check_ prefix. One thing I note is that the revision_name label in the erroring metrics is blank.

* [from Gatherer #2] collected metric "stackdriver_cloud_run_revision_monitoring_googleapis_com_uptime_check_content_mismatch" { label:<name:"check_id" value:"<redacted>" > label:<name:"checked_resource_id" value:"<redacted>" > label:<name:"checker_location" value:"asia-southeast1" > label:<name:"configuration_name" value:"" > label:<name:"location" value:"asia-southeast1" > label:<name:"project_id" value:"<redacted>" > label:<name:"revision_name" value:"" > label:<name:"service_name" value:"<redacted>" > label:<name:"unit" value:"" > gauge:<value:0 > timestamp_ms:1709665370000 } was collected before with the same name and label values
* [from Gatherer #2] collected metric "stackdriver_cloud_run_revision_monitoring_googleapis_com_uptime_check_content_mismatch" { label:<name:"check_id" value:"<redacted>" > label:<name:"checked_resource_id" value:"<redacted>" > label:<name:"checker_location" value:"asia-southeast1" > label:<name:"configuration_name" value:"" > label:<name:"location" value:"asia-southeast1" > label:<name:"project_id" value:"<redacted>" > label:<name:"revision_name" value:"" > label:<name:"service_name" value:"<redacted>" > label:<name:"unit" value:"" > gauge:<value:0 > timestamp_ms:1709665310000 } was collected before with the same name and label values
* [from Gatherer #2] collected metric "stackdriver_cloud_run_revision_monitoring_googleapis_com_uptime_check_content_mismatch" { label:<name:"check_id" value:"<redacted>" > label:<name:"checked_resource_id" value:"<redacted>" > label:<name:"checker_location" value:"asia-southeast1" > label:<name:"configuration_name" value:"" > label:<name:"location" value:"asia-southeast1" > label:<name:"project_id" value:"<redacted>" > label:<name:"revision_name" value:"" > label:<name:"service_name" value:"<redacted>" > label:<name:"unit" value:"" > gauge:<value:0 > timestamp_ms:1709665200000 } was collected before with the same name and label values
* [from Gatherer #2] collected metric "stackdriver_cloud_run_revision_monitoring_googleapis_com_uptime_check_content_mismatch" { label:<name:"check_id" value:"<redacted>" > label:<name:"checked_resource_id" value:"<redacted>" > label:<name:"checker_location" value:"asia-southeast1" > label:<name:"configuration_name" value:"" > label:<name:"location" value:"asia-southeast1" > label:<name:"project_id" value:"<redacted>" > label:<name:"revision_name" value:"" > label:<name:"service_name" value:"<redacted>" > label:<name:"unit" value:"" > gauge:<value:0 > timestamp_ms:1709665130000 } was collected before with the same name and label values

kgeckhart pushed a commit to kgeckhart/stackdriver_exporter that referenced this issue Mar 15, 2024
sysedwinistrator linked a pull request Mar 19, 2024 that will close this issue