Duplicate metrics from Stackdriver for stackdriver_pubsub_topic_pubsub_googleapis_com_topic metrics #166

Open
eafzali opened this issue Jul 29, 2022 · 0 comments

eafzali commented Jul 29, 2022

For a while, our Prometheus was not able to scrape any metrics from stackdriver_exporter, failing with the following error:

[from Gatherer #2] collected metric "stackdriver_pubsub_topic_pubsub_googleapis_com_topic_send_request_latencies" { label:<name:"has_ordering" value:"false" > label:<name:"project_id" value:"" > label:<name:"response_code" value:"success" > label:<name:"schema_type" value:"none" > label:<name:"topic_id" value:"re.adx_valuation_sampling_requests.v1" > label:<name:"unit" value:"us" > histogram:<sample_count:4206 sample_sum:1.1320835231589793e+08 bucket:<cumulative_count:0 upper_bound:1.19209 > bucket:<cumulative_count:0 upper_bound:1.4305080000000001 > bucket:<cumulative_count:0 upper_bound:1.7166096000000002 > bucket:<cumulative_count:0 upper_bound:2.05993152 > bucket:<cumulative_count:0 upper_bound:2.471917824 > bucket:<cumulative_count:0 upper_bound:2.9663013888000003 > bucket:<cumulative_count:0 upper_bound:3.55956166656 > bucket:<cumulative_count:0 upper_bound:4.271473999872 > bucket:<cumulative_count:0 upper_bound:5.1257687998464 > bucket:<cumulative_count:0 upper_bound:6.15092255981568 > bucket:<cumulative_count:0 upper_bound:7.381107071778816 > bucket:<cumulative_count:0 upper_bound:8.857328486134579 > bucket:<cumulative_count:0 upper_bound:10.628794183361494 > bucket:<cumulative_count:0 upper_bound:12.754553020033793 > bucket:<cumulative_count:0 upper_bound:15.30546362404055 > bucket:<cumulative_count:0 upper_bound:18.36655634884866 > bucket:<cumulative_count:0 upper_bound:22.03986761861839 > bucket:<cumulative_count:0 upper_bound:26.447841142342067 > bucket:<cumulative_count:0 upper_bound:31.737409370810482 > bucket:<cumulative_count:0 upper_bound:38.084891244972574 > bucket:<cumulative_count:0 upper_bound:45.701869493967095 > bucket:<cumulative_count:0 upper_bound:54.84224339276051 > bucket:<cumulative_count:0 upper_bound:65.8106920713126 > bucket:<cumulative_count:0 upper_bound:78.97283048557513 > bucket:<cumulative_count:0 upper_bound:94.76739658269014 > bucket:<cumulative_count:0 upper_bound:113.72087589922819 > bucket:<cumulative_count:0 upper_bound:136.46505107907382 > bucket:<cumulative_count:0 upper_bound:163.7580612948886 > bucket:<cumulative_count:0 upper_bound:196.5096735538663 > bucket:<cumulative_count:0 upper_bound:235.81160826463955 > bucket:<cumulative_count:0 upper_bound:282.97392991756743 > bucket:<cumulative_count:0 upper_bound:339.5687159010809 > bucket:<cumulative_count:0 upper_bound:407.482459081297 > bucket:<cumulative_count:0 upper_bound:488.9789508975564 > bucket:<cumulative_count:0 upper_bound:586.7747410770677 > bucket:<cumulative_count:0 upper_bound:704.1296892924812 > bucket:<cumulative_count:0 upper_bound:844.9556271509774 > bucket:<cumulative_count:0 upper_bound:1013.946752581173 > bucket:<cumulative_count:0 upper_bound:1216.7361030974075 > bucket:<cumulative_count:0 upper_bound:1460.083323716889 > bucket:<cumulative_count:0 upper_bound:1752.0999884602668 > bucket:<cumulative_count:0 upper_bound:2102.51998615232 > bucket:<cumulative_count:0 upper_bound:2523.023983382784 > bucket:<cumulative_count:0 upper_bound:3027.6287800593404 > bucket:<cumulative_count:0 upper_bound:3633.154536071209 > bucket:<cumulative_count:0 upper_bound:4359.785443285451 > bucket:<cumulative_count:1 upper_bound:5231.74253194254 > bucket:<cumulative_count:52 upper_bound:6278.091038331048 > bucket:<cumulative_count:293 upper_bound:7533.709245997257 > bucket:<cumulative_count:1067 upper_bound:9040.45109519671 > bucket:<cumulative_count:2198 upper_bound:10848.54131423605 > bucket:<cumulative_count:2904 upper_bound:13018.249577083261 > bucket:<cumulative_count:3394 
upper_bound:15621.899492499913 > bucket:<cumulative_count:3777 upper_bound:18746.279390999895 > bucket:<cumulative_count:4029 upper_bound:22495.535269199874 > bucket:<cumulative_count:4097 upper_bound:26994.642323039843 > bucket:<cumulative_count:4133 upper_bound:32393.57078764781 > bucket:<cumulative_count:4154 upper_bound:38872.28494517738 > bucket:<cumulative_count:4161 upper_bound:46646.741934212856 > bucket:<cumulative_count:4164 upper_bound:55976.090321055424 > bucket:<cumulative_count:4166 upper_bound:67171.3083852665 > bucket:<cumulative_count:4167 upper_bound:80605.5700623198 > bucket:<cumulative_count:4168 upper_bound:96726.68407478376 > bucket:<cumulative_count:4169 upper_bound:116072.02088974051 > bucket:<cumulative_count:4170 upper_bound:139286.4250676886 > bucket:<cumulative_count:4170 upper_bound:167143.71008122628 > bucket:<cumulative_count:4170 upper_bound:200572.45209747157 > bucket:<cumulative_count:4170 upper_bound:240686.9425169659 > bucket:<cumulative_count:4170 upper_bound:288824.33102035907 > bucket:<cumulative_count:4171 upper_bound:346589.19722443086 > bucket:<cumulative_count:4171 upper_bound:415907.036669317 > bucket:<cumulative_count:4172 upper_bound:499088.4440031804 > bucket:<cumulative_count:4174 upper_bound:598906.1328038165 > bucket:<cumulative_count:4175 upper_bound:718687.3593645798 > bucket:<cumulative_count:4175 upper_bound:862424.8312374958 > bucket:<cumulative_count:4176 upper_bound:1.0349097974849949e+06 > bucket:<cumulative_count:4179 upper_bound:1.2418917569819938e+06 > bucket:<cumulative_count:4183 upper_bound:1.4902701083783926e+06 > bucket:<cumulative_count:4190 upper_bound:1.7883241300540708e+06 > bucket:<cumulative_count:4191 upper_bound:2.1459889560648855e+06 > bucket:<cumulative_count:4201 upper_bound:2.575186747277862e+06 > bucket:<cumulative_count:4206 upper_bound:3.090224096733434e+06 > bucket:<cumulative_count:4206 upper_bound:3.708268916080121e+06 > bucket:<cumulative_count:4206 upper_bound:4.449922699296146e+06 > bucket:<cumulative_count:4206 upper_bound:5.339907239155375e+06 > bucket:<cumulative_count:4206 upper_bound:6.407888686986448e+06 > bucket:<cumulative_count:4206 upper_bound:7.689466424383738e+06 > bucket:<cumulative_count:4206 upper_bound:9.227359709260486e+06 > bucket:<cumulative_count:4206 upper_bound:1.1072831651112583e+07 > bucket:<cumulative_count:4206 upper_bound:1.32873979813351e+07 > bucket:<cumulative_count:4206 upper_bound:1.594487757760212e+07 > bucket:<cumulative_count:4206 upper_bound:1.9133853093122542e+07 > bucket:<cumulative_count:4206 upper_bound:2.296062371174705e+07 > bucket:<cumulative_count:4206 upper_bound:2.7552748454096463e+07 > bucket:<cumulative_count:4206 upper_bound:3.3063298144915752e+07 > bucket:<cumulative_count:4206 upper_bound:3.96759577738989e+07 > bucket:<cumulative_count:4206 upper_bound:4.7611149328678675e+07 > bucket:<cumulative_count:4206 upper_bound:5.713337919441441e+07 > bucket:<cumulative_count:4206 upper_bound:6.856005503329729e+07 > bucket:<cumulative_count:4206 upper_bound:8.227206603995673e+07 > bucket:<cumulative_count:4206 upper_bound:9.87264792479481e+07 > bucket:<cumulative_count:4206 upper_bound:1.1847177509753771e+08 > bucket:<cumulative_count:4206 upper_bound:1.4216613011704522e+08 > bucket:<cumulative_count:4206 upper_bound:1.705993561404543e+08 > bucket:<cumulative_count:4206 upper_bound:2.0471922736854514e+08 > bucket:<cumulative_count:4206 upper_bound:2.4566307284225416e+08 > bucket:<cumulative_count:4206 upper_bound:2.9479568741070503e+08 > 
bucket:<cumulative_count:4206 upper_bound:3.537548248928459e+08 > bucket:<cumulative_count:4206 upper_bound:4.245057898714152e+08 > bucket:<cumulative_count:4206 upper_bound:5.0940694784569824e+08 > bucket:<cumulative_count:4206 upper_bound:6.112883374148378e+08 > bucket:<cumulative_count:4206 upper_bound:7.335460048978053e+08 > bucket:<cumulative_count:4206 upper_bound:8.802552058773663e+08 > bucket:<cumulative_count:4206 upper_bound:1.0563062470528396e+09 > bucket:<cumulative_count:4206 upper_bound:1.2675674964634075e+09 > bucket:<cumulative_count:4206 upper_bound:1.5210809957560892e+09 > bucket:<cumulative_count:4206 upper_bound:1.825297194907307e+09 > bucket:<cumulative_count:4206 upper_bound:2.1903566338887677e+09 > bucket:<cumulative_count:4206 upper_bound:inf > > timestamp_ms:1658831700000 } was collected before with the same name and label values

The problem resolved itself after a while, but we would like some guidance on how we can avoid this happening again in the future.

This seems to be similar to #103
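
For context, the error text comes from client_golang's duplicate-series check when metrics are gathered. The snippet below is a minimal, self-contained sketch (not code from stackdriver_exporter; the collector and metric names are made up for the demo) that reproduces the same `was collected before with the same name and label values` error by emitting one series twice from a custom collector behind a `prometheus.Gatherers`:

```
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// duplicatingCollector is a demo collector that emits the same series twice
// per scrape, similar to an exporter returning overlapping points for one
// Stackdriver time series.
type duplicatingCollector struct {
	desc *prometheus.Desc
}

func (c *duplicatingCollector) Describe(ch chan<- *prometheus.Desc) { ch <- c.desc }

func (c *duplicatingCollector) Collect(ch chan<- prometheus.Metric) {
	for i := 0; i < 2; i++ {
		ch <- prometheus.MustNewConstMetric(c.desc, prometheus.CounterValue, 1, "example.topic.v1")
	}
}

func main() {
	reg := prometheus.NewRegistry()
	reg.MustRegister(&duplicatingCollector{
		desc: prometheus.NewDesc("example_duplicate_total", "duplicate-series demo", []string{"topic_id"}, nil),
	})

	// Gathering through prometheus.Gatherers adds the "[from Gatherer #2]"
	// prefix seen in the error above, since reg is the second gatherer.
	gatherers := prometheus.Gatherers{prometheus.NewRegistry(), reg}
	if _, err := gatherers.Gather(); err != nil {
		// collected metric "example_duplicate_total" ... was collected
		// before with the same name and label values
		fmt.Println(err)
	}
}
```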

Pokom added a commit to Pokom/stackdriver_exporter that referenced this issue Nov 3, 2023
I recently experienced prometheus-community#103 and prometheus-community#166 in production, and it took quite
some time to recognize there was a problem with `stackdriver_exporter`
because nothing was logged to indicate problems gathering metrics.
From my perspective, the pod was healthy and online, and I could
curl `/metrics` to get results. Grafana Agent, however, was getting errors
when scraping, specifically errors like the following:

```
 [from Gatherer #2] collected metric "stackdriver_gce_instance_compute_googleapis_com_instance_disk_write_bytes_count" { label:{name:"device_name"
value:"REDACTED_FOR_SECURITY"} label:{name:"device_type"  value:"permanent"} label:{name:"instance_id" value:"2924941021702260446"} label:{name:"instance_name"  value:"REDACTED_FOR_SECURITY"} label:{name:"project_id" value:"REDACTED_FOR_SECURITY"}  label:{name:"storage_type" value:"pd-ssd"} label:{name:"unit" value:"By"} label:{name:"zone" value:"us-central1-a"}
counter:{value:0} timestamp_ms:1698871080000} was collected before with the same name and label values
```

To help identify the root cause, I've added the ability to opt into
logging errors that come from the handler. Specifically,
I've created the struct `customPromErrorLogger`, which implements the `promhttp.Logger` interface.
There is a new flag, `monitoring.enable-promhttp-custom-logger`: when it is set to true,
we create an instance of `customPromErrorLogger` and use it as the value for `ErrorLog`
in `promhttp.HandlerOpts{}`. Otherwise, `stackdriver_exporter` works as it
did before and does not log errors while collecting metrics.

- refs prometheus-community#103, prometheus-community#166
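
For reference, below is a minimal sketch of how such an opt-in error logger can be hooked into promhttp. The flag name and the `customPromErrorLogger` type mirror the commit message, but the wiring (plain `flag` package, registry setup, listen address) is illustrative and assumed rather than taken from the exporter's actual code:

```
package main

import (
	"flag"
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// customPromErrorLogger satisfies promhttp.Logger (a single Println method)
// and forwards gather errors to the exporter's own log.
type customPromErrorLogger struct{}

func (customPromErrorLogger) Println(v ...interface{}) {
	log.Println(append([]interface{}{"error gathering metrics:"}, v...)...)
}

func main() {
	// Illustrative flag handling; the exporter defines its flags elsewhere.
	enableCustomLogger := flag.Bool("monitoring.enable-promhttp-custom-logger", false,
		"Log errors returned while gathering metrics")
	flag.Parse()

	registry := prometheus.NewRegistry() // collectors would be registered here

	opts := promhttp.HandlerOpts{}
	if *enableCustomLogger {
		// With ErrorLog set, "was collected before with the same name and
		// label values" errors are written to the exporter's log instead of
		// only surfacing to the scraper.
		opts.ErrorLog = customPromErrorLogger{}
	}

	http.Handle("/metrics", promhttp.HandlerFor(registry, opts))
	log.Fatal(http.ListenAndServe(":9255", nil))
}
```
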
kgeckhart pushed a commit to kgeckhart/stackdriver_exporter that referenced this issue Mar 15, 2024