Duplicate metrics from Stackdriver for stackdriver_pubsub_topic_pubsub_googleapis_com_topic metrics #166

Open
eafzali opened this issue Jul 29, 2022 · 0 comments

eafzali commented Jul 29, 2022

For a while, our Prometheus was not able to scrape any metrics from stackdriver_exporter, failing with the following error:

[from Gatherer #2] collected metric "stackdriver_pubsub_topic_pubsub_googleapis_com_topic_send_request_latencies" { label:<name:"has_ordering" value:"false" > label:<name:"project_id" value:"" > label:<name:"response_code" value:"success" > label:<name:"schema_type" value:"none" > label:<name:"topic_id" value:"re.adx_valuation_sampling_requests.v1" > label:<name:"unit" value:"us" > histogram:<sample_count:4206 sample_sum:1.1320835231589793e+08 bucket:<cumulative_count:0 upper_bound:1.19209 > bucket:<cumulative_count:0 upper_bound:1.4305080000000001 > bucket:<cumulative_count:0 upper_bound:1.7166096000000002 > bucket:<cumulative_count:0 upper_bound:2.05993152 > bucket:<cumulative_count:0 upper_bound:2.471917824 > bucket:<cumulative_count:0 upper_bound:2.9663013888000003 > bucket:<cumulative_count:0 upper_bound:3.55956166656 > bucket:<cumulative_count:0 upper_bound:4.271473999872 > bucket:<cumulative_count:0 upper_bound:5.1257687998464 > bucket:<cumulative_count:0 upper_bound:6.15092255981568 > bucket:<cumulative_count:0 upper_bound:7.381107071778816 > bucket:<cumulative_count:0 upper_bound:8.857328486134579 > bucket:<cumulative_count:0 upper_bound:10.628794183361494 > bucket:<cumulative_count:0 upper_bound:12.754553020033793 > bucket:<cumulative_count:0 upper_bound:15.30546362404055 > bucket:<cumulative_count:0 upper_bound:18.36655634884866 > bucket:<cumulative_count:0 upper_bound:22.03986761861839 > bucket:<cumulative_count:0 upper_bound:26.447841142342067 > bucket:<cumulative_count:0 upper_bound:31.737409370810482 > bucket:<cumulative_count:0 upper_bound:38.084891244972574 > bucket:<cumulative_count:0 upper_bound:45.701869493967095 > bucket:<cumulative_count:0 upper_bound:54.84224339276051 > bucket:<cumulative_count:0 upper_bound:65.8106920713126 > bucket:<cumulative_count:0 upper_bound:78.97283048557513 > bucket:<cumulative_count:0 upper_bound:94.76739658269014 > bucket:<cumulative_count:0 upper_bound:113.72087589922819 > bucket:<cumulative_count:0 upper_bound:136.46505107907382 > bucket:<cumulative_count:0 upper_bound:163.7580612948886 > bucket:<cumulative_count:0 upper_bound:196.5096735538663 > bucket:<cumulative_count:0 upper_bound:235.81160826463955 > bucket:<cumulative_count:0 upper_bound:282.97392991756743 > bucket:<cumulative_count:0 upper_bound:339.5687159010809 > bucket:<cumulative_count:0 upper_bound:407.482459081297 > bucket:<cumulative_count:0 upper_bound:488.9789508975564 > bucket:<cumulative_count:0 upper_bound:586.7747410770677 > bucket:<cumulative_count:0 upper_bound:704.1296892924812 > bucket:<cumulative_count:0 upper_bound:844.9556271509774 > bucket:<cumulative_count:0 upper_bound:1013.946752581173 > bucket:<cumulative_count:0 upper_bound:1216.7361030974075 > bucket:<cumulative_count:0 upper_bound:1460.083323716889 > bucket:<cumulative_count:0 upper_bound:1752.0999884602668 > bucket:<cumulative_count:0 upper_bound:2102.51998615232 > bucket:<cumulative_count:0 upper_bound:2523.023983382784 > bucket:<cumulative_count:0 upper_bound:3027.6287800593404 > bucket:<cumulative_count:0 upper_bound:3633.154536071209 > bucket:<cumulative_count:0 upper_bound:4359.785443285451 > bucket:<cumulative_count:1 upper_bound:5231.74253194254 > bucket:<cumulative_count:52 upper_bound:6278.091038331048 > bucket:<cumulative_count:293 upper_bound:7533.709245997257 > bucket:<cumulative_count:1067 upper_bound:9040.45109519671 > bucket:<cumulative_count:2198 upper_bound:10848.54131423605 > bucket:<cumulative_count:2904 upper_bound:13018.249577083261 > bucket:<cumulative_count:3394 
upper_bound:15621.899492499913 > bucket:<cumulative_count:3777 upper_bound:18746.279390999895 > bucket:<cumulative_count:4029 upper_bound:22495.535269199874 > bucket:<cumulative_count:4097 upper_bound:26994.642323039843 > bucket:<cumulative_count:4133 upper_bound:32393.57078764781 > bucket:<cumulative_count:4154 upper_bound:38872.28494517738 > bucket:<cumulative_count:4161 upper_bound:46646.741934212856 > bucket:<cumulative_count:4164 upper_bound:55976.090321055424 > bucket:<cumulative_count:4166 upper_bound:67171.3083852665 > bucket:<cumulative_count:4167 upper_bound:80605.5700623198 > bucket:<cumulative_count:4168 upper_bound:96726.68407478376 > bucket:<cumulative_count:4169 upper_bound:116072.02088974051 > bucket:<cumulative_count:4170 upper_bound:139286.4250676886 > bucket:<cumulative_count:4170 upper_bound:167143.71008122628 > bucket:<cumulative_count:4170 upper_bound:200572.45209747157 > bucket:<cumulative_count:4170 upper_bound:240686.9425169659 > bucket:<cumulative_count:4170 upper_bound:288824.33102035907 > bucket:<cumulative_count:4171 upper_bound:346589.19722443086 > bucket:<cumulative_count:4171 upper_bound:415907.036669317 > bucket:<cumulative_count:4172 upper_bound:499088.4440031804 > bucket:<cumulative_count:4174 upper_bound:598906.1328038165 > bucket:<cumulative_count:4175 upper_bound:718687.3593645798 > bucket:<cumulative_count:4175 upper_bound:862424.8312374958 > bucket:<cumulative_count:4176 upper_bound:1.0349097974849949e+06 > bucket:<cumulative_count:4179 upper_bound:1.2418917569819938e+06 > bucket:<cumulative_count:4183 upper_bound:1.4902701083783926e+06 > bucket:<cumulative_count:4190 upper_bound:1.7883241300540708e+06 > bucket:<cumulative_count:4191 upper_bound:2.1459889560648855e+06 > bucket:<cumulative_count:4201 upper_bound:2.575186747277862e+06 > bucket:<cumulative_count:4206 upper_bound:3.090224096733434e+06 > bucket:<cumulative_count:4206 upper_bound:3.708268916080121e+06 > bucket:<cumulative_count:4206 upper_bound:4.449922699296146e+06 > bucket:<cumulative_count:4206 upper_bound:5.339907239155375e+06 > bucket:<cumulative_count:4206 upper_bound:6.407888686986448e+06 > bucket:<cumulative_count:4206 upper_bound:7.689466424383738e+06 > bucket:<cumulative_count:4206 upper_bound:9.227359709260486e+06 > bucket:<cumulative_count:4206 upper_bound:1.1072831651112583e+07 > bucket:<cumulative_count:4206 upper_bound:1.32873979813351e+07 > bucket:<cumulative_count:4206 upper_bound:1.594487757760212e+07 > bucket:<cumulative_count:4206 upper_bound:1.9133853093122542e+07 > bucket:<cumulative_count:4206 upper_bound:2.296062371174705e+07 > bucket:<cumulative_count:4206 upper_bound:2.7552748454096463e+07 > bucket:<cumulative_count:4206 upper_bound:3.3063298144915752e+07 > bucket:<cumulative_count:4206 upper_bound:3.96759577738989e+07 > bucket:<cumulative_count:4206 upper_bound:4.7611149328678675e+07 > bucket:<cumulative_count:4206 upper_bound:5.713337919441441e+07 > bucket:<cumulative_count:4206 upper_bound:6.856005503329729e+07 > bucket:<cumulative_count:4206 upper_bound:8.227206603995673e+07 > bucket:<cumulative_count:4206 upper_bound:9.87264792479481e+07 > bucket:<cumulative_count:4206 upper_bound:1.1847177509753771e+08 > bucket:<cumulative_count:4206 upper_bound:1.4216613011704522e+08 > bucket:<cumulative_count:4206 upper_bound:1.705993561404543e+08 > bucket:<cumulative_count:4206 upper_bound:2.0471922736854514e+08 > bucket:<cumulative_count:4206 upper_bound:2.4566307284225416e+08 > bucket:<cumulative_count:4206 upper_bound:2.9479568741070503e+08 > 
bucket:<cumulative_count:4206 upper_bound:3.537548248928459e+08 > bucket:<cumulative_count:4206 upper_bound:4.245057898714152e+08 > bucket:<cumulative_count:4206 upper_bound:5.0940694784569824e+08 > bucket:<cumulative_count:4206 upper_bound:6.112883374148378e+08 > bucket:<cumulative_count:4206 upper_bound:7.335460048978053e+08 > bucket:<cumulative_count:4206 upper_bound:8.802552058773663e+08 > bucket:<cumulative_count:4206 upper_bound:1.0563062470528396e+09 > bucket:<cumulative_count:4206 upper_bound:1.2675674964634075e+09 > bucket:<cumulative_count:4206 upper_bound:1.5210809957560892e+09 > bucket:<cumulative_count:4206 upper_bound:1.825297194907307e+09 > bucket:<cumulative_count:4206 upper_bound:2.1903566338887677e+09 > bucket:<cumulative_count:4206 upper_bound:inf > > timestamp_ms:1658831700000 } was collected before with the same name and label values

The problem resolved itself after a while, but we would like some guidance on how we can avoid this happening again in the future.

This seems to be similar to #103
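
For context, the error text comes from client_golang's duplicate-series check when metrics are gathered. The snippet below is a minimal, self-contained sketch (not code from stackdriver_exporter; the collector and metric names are made up for the demo) that reproduces the same `was collected before with the same name and label values` error by emitting one series twice from a custom collector behind a `prometheus.Gatherers`:

```
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// duplicatingCollector is a demo collector that emits the same series twice
// per scrape, similar to an exporter returning overlapping points for one
// Stackdriver time series.
type duplicatingCollector struct {
	desc *prometheus.Desc
}

func (c *duplicatingCollector) Describe(ch chan<- *prometheus.Desc) { ch <- c.desc }

func (c *duplicatingCollector) Collect(ch chan<- prometheus.Metric) {
	for i := 0; i < 2; i++ {
		ch <- prometheus.MustNewConstMetric(c.desc, prometheus.CounterValue, 1, "example.topic.v1")
	}
}

func main() {
	reg := prometheus.NewRegistry()
	reg.MustRegister(&duplicatingCollector{
		desc: prometheus.NewDesc("example_duplicate_total", "duplicate-series demo", []string{"topic_id"}, nil),
	})

	// Gathering through prometheus.Gatherers adds the "[from Gatherer #2]"
	// prefix seen in the error above, since reg is the second gatherer.
	gatherers := prometheus.Gatherers{prometheus.NewRegistry(), reg}
	if _, err := gatherers.Gather(); err != nil {
		// collected metric "example_duplicate_total" ... was collected
		// before with the same name and label values
		fmt.Println(err)
	}
}
```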

Pokom added a commit to Pokom/stackdriver_exporter that referenced this issue Nov 3, 2023
I recently experienced prometheus-community#103 and prometheus-community#166 in production, and it took quite
some time to recognize there was a problem with `stackdriver_exporter`
because nothing was logged to indicate problems gathering metrics.
From my perspective, the pod was healthy and online, and I could
curl `/metrics` to get results. Grafana Agent, however, was getting errors
when scraping, specifically errors like the following:

```
 [from Gatherer #2] collected metric "stackdriver_gce_instance_compute_googleapis_com_instance_disk_write_bytes_count" { label:{name:"device_name"
value:"REDACTED_FOR_SECURITY"} label:{name:"device_type"  value:"permanent"} label:{name:"instance_id" value:"2924941021702260446"} label:{name:"instance_name"  value:"REDACTED_FOR_SECURITY"} label:{name:"project_id" value:"REDACTED_FOR_SECURITY"}  label:{name:"storage_type" value:"pd-ssd"} label:{name:"unit" value:"By"} label:{name:"zone" value:"us-central1-a"}
counter:{value:0} timestamp_ms:1698871080000} was collected before with the same name and label values
```

To help identify the root cause, I've added the ability to opt into
logging errors that come from the handler. Specifically,
I've created the struct `customPromErrorLogger`, which implements the `promhttp.Logger` interface.
There is a new flag, `monitoring.enable-promhttp-custom-logger`: when it is set to true,
we create an instance of `customPromErrorLogger` and use it as the value for `ErrorLog`
in `promhttp.HandlerOpts{}`. Otherwise, `stackdriver_exporter` works as it
did before and does not log errors while collecting metrics.

- refs prometheus-community#103, prometheus-community#166
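
For reference, below is a minimal sketch of how such an opt-in error logger can be hooked into promhttp. The flag name and the `customPromErrorLogger` type mirror the commit message, but the wiring (plain `flag` package, registry setup, listen address) is illustrative and assumed rather than taken from the exporter's actual code:

```
package main

import (
	"flag"
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// customPromErrorLogger satisfies promhttp.Logger (a single Println method)
// and forwards gather errors to the exporter's own log.
type customPromErrorLogger struct{}

func (customPromErrorLogger) Println(v ...interface{}) {
	log.Println(append([]interface{}{"error gathering metrics:"}, v...)...)
}

func main() {
	// Illustrative flag handling; the exporter defines its flags elsewhere.
	enableCustomLogger := flag.Bool("monitoring.enable-promhttp-custom-logger", false,
		"Log errors returned while gathering metrics")
	flag.Parse()

	registry := prometheus.NewRegistry() // collectors would be registered here

	opts := promhttp.HandlerOpts{}
	if *enableCustomLogger {
		// With ErrorLog set, "was collected before with the same name and
		// label values" errors are written to the exporter's log instead of
		// only surfacing to the scraper.
		opts.ErrorLog = customPromErrorLogger{}
	}

	http.Handle("/metrics", promhttp.HandlerFor(registry, opts))
	log.Fatal(http.ListenAndServe(":9255", nil))
}
```
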
kgeckhart pushed a commit to kgeckhart/stackdriver_exporter that referenced this issue Mar 15, 2024