
How to workaround duplicate metrics from Stackdriver #103

Open
phnakarin opened this issue Jul 17, 2020 · 14 comments · May be fixed by #319

Comments

@phnakarin

Hello there,

I'm giving it a shot here.

Is there anything we can do other than maintaining a white/blacklist of these troublesome metrics in the exporter?

For example, this one comes from the metric logging.googleapis.com/log_entry_count with resource type spanner_instance, and we only see it happen in US-based projects. In the GCP Metrics Explorer you can also see that the series is not really unique.

* [from Gatherer #2] collected metric "stackdriver_spanner_instance_logging_googleapis_com_byte_count" { label:<name:"instance_config" value:"" > label:<name:"instance_id" value:"spanner-001" > label:<name:"location" value:"us-east1" > label:<name:"log" value:"cloudaudit.googleapis.com/data_access" > label:<name:"project_id" value:"my-project-id" > label:<name:"severity" value:"INFO" > label:<name:"unit" value:"By" > gauge:<value:32656 > timestamp_ms:1594973553075 } was collected before with the same name and label values
* [from Gatherer #2] collected metric "stackdriver_spanner_instance_logging_googleapis_com_byte_count" { label:<name:"instance_config" value:"" > label:<name:"instance_id" value:"spanner-002" > label:<name:"location" value:"us-east1" > label:<name:"log" value:"cloudaudit.googleapis.com/data_access" > label:<name:"project_id" value:"my-project-id" > label:<name:"severity" value:"INFO" > label:<name:"unit" value:"By" > gauge:<value:19902 > timestamp_ms:1594973553075 } was collected before with the same name and label values
* [from Gatherer #2] collected metric "stackdriver_spanner_instance_logging_googleapis_com_byte_count" { label:<name:"instance_config" value:"" > label:<name:"instance_id" value:"spanner-003" > label:<name:"location" value:"us-east1" > label:<name:"log" value:"cloudaudit.googleapis.com/data_access" > label:<name:"project_id" value:"my-project-id" > label:<name:"severity" value:"INFO" > label:<name:"unit" value:"By" > gauge:<value:0 > timestamp_ms:1594973433075 } was collected before with the same name and label values
* [from Gatherer #2] collected metric "stackdriver_spanner_instance_logging_googleapis_com_byte_count" { label:<name:"instance_config" value:"" > label:<name:"instance_id" value:"spanner-004" > label:<name:"location" value:"us-east1" > label:<name:"log" value:"cloudaudit.googleapis.com/data_access" > label:<name:"project_id" value:"my-project-id" > label:<name:"severity" value:"INFO" > label:<name:"unit" value:"By" > gauge:<value:28341 > timestamp_ms:1594973553075 } was collected before with the same name and label values

Cordially,
// Nakarin
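
One coarse workaround for the question above, short of a true white/blacklist feature, is to narrow `--monitoring.metrics-type-prefixes` (the flag used later in this thread) so the offending metric type is never requested at all. A minimal sketch, assuming the duplicates only come from the logging.googleapis.com metrics on spanner_instance and that losing those series is acceptable:

```
# Request only Spanner metric types and skip logging.googleapis.com/*,
# which is where the duplicated series in the report above originate.
# The prefix value is illustrative; adjust it to the metrics you need.
--monitoring.metrics-type-prefixes=spanner.googleapis.com/
```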

@phnakarin

FYI, I have talked to Google Support and created a support case about these problematic metrics. They admitted that it is a bug in Google Spanner. The Spanner team is fixing it now.

@nrmitchi

Adding on here because this is related; I'm seeing the same issues with the stackdriver_https_lb_rule_loadbalancing_googleapis_com_https_backend_request_count metric.

@nrmitchi

This also looks to be related to #36

@nrmitchi

nrmitchi commented Sep 2, 2020

Update: If anyone else ends up here, this issue (at the current time) does not seem to exist with stackdriver_https_lb_rule_loadbalancing_googleapis_com_https_backend_request_count.

My issue was an accidental multiple inclusion of the metrics because they matched more than one of the specified prefixes (see the sketch below).
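
For anyone who ends up in the same situation, the sketch below shows the kind of prefix overlap that can cause it; the prefix values are illustrative, not the actual configuration from this comment:

```
# Both prefixes match loadbalancing.googleapis.com/https/backend_request_count,
# so the same time series is requested twice and the gatherer reports
# "was collected before with the same name and label values".
--monitoring.metrics-type-prefixes=loadbalancing.googleapis.com/https,loadbalancing.googleapis.com/https/backend_request_count

# Fix: make the prefixes disjoint (the broader prefix already covers the metric).
--monitoring.metrics-type-prefixes=loadbalancing.googleapis.com/https
```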

@hanikesn

hanikesn commented Feb 9, 2021

Apparently this can be fixed by specifying: --no-collector.fill-missing-labels. Not sure about any other implications.
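
For reference, a minimal invocation sketch with that flag, using only flags that appear elsewhere in this thread (project and credential settings omitted, and exact flag names may vary between exporter versions):

```
stackdriver_exporter \
  --monitoring.metrics-type-prefixes=loadbalancing.googleapis.com/https \
  --no-collector.fill-missing-labels
```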

@bschaeffer

Hi everyone... is there a workaround for this?

@mrsimo

mrsimo commented Oct 19, 2021

We just started having this problem in the last 24h, and always with the loadbalancing.googleapis.com/https/request_count prefix. Others seem fine. @bschaeffer did you see something similar recently?

@bschaeffer

@mrsimo Yes. With load balancing alone, and also within the last 24h.

@bschaeffer

> Apparently this can be fixed by specifying: --no-collector.fill-missing-labels. Not sure about any other implications.

This did not work for us

@dohnto

dohnto commented Oct 19, 2021

We have the problem as well; the affected metrics in our case are:

  • stackdriver_https_lb_rule_loadbalancing_googleapis_com_https_request_bytes_count
  • stackdriver_https_lb_rule_loadbalancing_googleapis_com_https_request_count
  • stackdriver_https_lb_rule_loadbalancing_googleapis_com_https_response_bytes_count
  • stackdriver_https_lb_rule_loadbalancing_googleapis_com_https_total_latencies

The issue started yesterday at 17:30 UTC for us.

@svenmueller

I can confirm; we had the same issues starting yesterday and also for some hours today.

@uts09

uts09 commented Aug 1, 2023

Has anyone found a solution to this problem? I am getting this error while fetching BigQuery metrics.

Pokom added a commit to Pokom/stackdriver_exporter that referenced this issue Nov 3, 2023
I had recently experienced prometheus-community#103 and prometheus-community#166 in production and it took quite
some time to recognize there was a problem with `stackdriver_exporter`
because nothing was logged to indicate problems gathering metrics.
From my perspective, the pod was healthy and online and I could
curl `/metrics` to get results. Grafana Agent however was getting errors
when scraping, specifically errors like so:

```
 [from Gatherer #2] collected metric "stackdriver_gce_instance_compute_googleapis_com_instance_disk_write_bytes_count" { label:{name:"device_name"
value:"REDACTED_FOR_SECURITY"} label:{name:"device_type"  value:"permanent"} label:{name:"instance_id" value:"2924941021702260446"} label:{name:"instance_name"  value:"REDACTED_FOR_SECURITY"} label:{name:"project_id" value:"REDACTED_FOR_SECURITY"}  label:{name:"storage_type" value:"pd-ssd"} label:{name:"unit" value:"By"} label:{name:"zone" value:"us-central1-a"}
counter:{value:0} timestamp_ms:1698871080000} was collected before with the same name and label values
```

To help identify the root cause I've added the ability to opt into
logging errors that come from the handler. Specifically,
I've created the struct `customPromErrorLogger`, which implements the `promhttp.Logger` interface.
There is a new flag, `monitoring.enable-promhttp-custom-logger`: when it is set to true,
we create an instance of `customPromErrorLogger` and use it as the `ErrorLog` value
in `promhttp.HandlerOpts{}`. Otherwise, `stackdriver_exporter` works as it
did before and does not log errors encountered while collecting metrics.

- refs prometheus-community#103, prometheus-community#166
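
For context on what that commit describes, here is a minimal standalone sketch (not the exporter's actual code) of plugging a custom error logger into `promhttp` so that gather errors like the one quoted above show up in the server's own logs; the type and registry names are illustrative:

```go
package main

import (
	"log"
	"net/http"
	"os"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/collectors"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// promLogger satisfies the promhttp.Logger interface (a single
// Println(v ...interface{}) method) so promhttp can hand us its errors.
type promLogger struct {
	logger *log.Logger
}

func (p promLogger) Println(v ...interface{}) {
	p.logger.Println(v...)
}

func main() {
	reg := prometheus.NewRegistry()
	reg.MustRegister(collectors.NewGoCollector()) // stand-in for the exporter's collectors

	handler := promhttp.HandlerFor(reg, promhttp.HandlerOpts{
		// ErrorLog receives gather errors such as
		// `... was collected before with the same name and label values`,
		// which otherwise only surface on the scraping side.
		ErrorLog: promLogger{logger: log.New(os.Stderr, "promhttp: ", log.LstdFlags)},
		// ContinueOnError still serves the metrics that gathered cleanly
		// instead of failing the whole scrape.
		ErrorHandling: promhttp.ContinueOnError,
	})

	http.Handle("/metrics", handler)
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```

ErrorHandling is shown as ContinueOnError here so a single duplicated series does not fail the whole scrape; promhttp's default, HTTPErrorOnError, would return an HTTP 500 for the scrape instead.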
@Packetslave

We're seeing the same issue with the metric stackdriver_cloud_run_revision_monitoring_googleapis_com_uptime_check_content_mismatch when trying to ingest metrics from Cloud Monitoring synthetic monitors. Our config:

--monitoring.metrics-ingest-delay
--monitoring.metrics-interval=5m
--monitoring.metrics-offset=0s
--monitoring.metrics-type-prefixes=monitoring.googleapis.com/uptime_check,cloudfunctions.googleapis.com/function

@Packetslave

Packetslave commented Mar 5, 2024

Seems like ours is an issue with any metric with the stackdriver_cloud_run_revision_monitoring_googleapis_com_uptime_check_ prefix. One thing I note is that the revision_name label in the erroring metrics is blank.

* [from Gatherer #2] collected metric "stackdriver_cloud_run_revision_monitoring_googleapis_com_uptime_check_content_mismatch" { label:<name:"check_id" value:"<redacted>" > label:<name:"checked_resource_id" value:"<redacted>" > label:<name:"checker_location" value:"asia-southeast1" > label:<name:"configuration_name" value:"" > label:<name:"location" value:"asia-southeast1" > label:<name:"project_id" value:"<redacted>" > label:<name:"revision_name" value:"" > label:<name:"service_name" value:"<redacted>" > label:<name:"unit" value:"" > gauge:<value:0 > timestamp_ms:1709665370000 } was collected before with the same name and label values
* [from Gatherer #2] collected metric "stackdriver_cloud_run_revision_monitoring_googleapis_com_uptime_check_content_mismatch" { label:<name:"check_id" value:"<redacted>" > label:<name:"checked_resource_id" value:"<redacted>" > label:<name:"checker_location" value:"asia-southeast1" > label:<name:"configuration_name" value:"" > label:<name:"location" value:"asia-southeast1" > label:<name:"project_id" value:"<redacted>" > label:<name:"revision_name" value:"" > label:<name:"service_name" value:"<redacted>" > label:<name:"unit" value:"" > gauge:<value:0 > timestamp_ms:1709665310000 } was collected before with the same name and label values
* [from Gatherer #2] collected metric "stackdriver_cloud_run_revision_monitoring_googleapis_com_uptime_check_content_mismatch" { label:<name:"check_id" value:"<redacted>" > label:<name:"checked_resource_id" value:"<redacted>" > label:<name:"checker_location" value:"asia-southeast1" > label:<name:"configuration_name" value:"" > label:<name:"location" value:"asia-southeast1" > label:<name:"project_id" value:"<redacted>" > label:<name:"revision_name" value:"" > label:<name:"service_name" value:"<redacted>" > label:<name:"unit" value:"" > gauge:<value:0 > timestamp_ms:1709665200000 } was collected before with the same name and label values
* [from Gatherer #2] collected metric "stackdriver_cloud_run_revision_monitoring_googleapis_com_uptime_check_content_mismatch" { label:<name:"check_id" value:"<redacted>" > label:<name:"checked_resource_id" value:"<redacted>" > label:<name:"checker_location" value:"asia-southeast1" > label:<name:"configuration_name" value:"" > label:<name:"location" value:"asia-southeast1" > label:<name:"project_id" value:"<redacted>" > label:<name:"revision_name" value:"" > label:<name:"service_name" value:"<redacted>" > label:<name:"unit" value:"" > gauge:<value:0 > timestamp_ms:1709665130000 } was collected before with the same name and label values

kgeckhart pushed a commit to kgeckhart/stackdriver_exporter that referenced this issue Mar 15, 2024
sysedwinistrator linked a pull request Mar 19, 2024 that will close this issue