[prometheusremotewrite] exporter complaining about temporality in OTEL pipeline #30094

Open
ashishthakur55525 opened this issue Dec 19, 2023 · 14 comments
@ashishthakur55525

Describe the bug
We are using the OpenTelemetry SDK to send metrics to an OpenTelemetry Collector, which has two exporters: an otlp exporter that sends metrics to Honeycomb, and a prometheusremotewrite exporter that writes data to a local Prometheus running on the same EKS cluster. The problem is that we keep getting temporality errors like the one below. We worked with the dev team to set the temporality to cumulative, since Prometheus only accepts cumulative, and we validated that the change was made, but we still get the error below. After setting the temporality to cumulative, the metrics did appear in Prometheus for a short time, but in a very broken state, and then stopped again.

2023-12-19T14:14:35.655Z error exporterhelper/queued_retry.go:401 Exporting failed. The error is not retryable. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: invalid temporality and type combination for metric "app.counter.apiStatusCode"; invalid temporality and type combination for metric "app.counter.apis"", "dropped_items": 2}
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/queued_retry.go:401
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/metrics.go:136
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/queued_retry.go:205
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func1
go.opentelemetry.io/collector/exporter@v0.73.0/exporterhelper/internal/bounded_memory_queue.go:60
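
For reference, a minimal sketch of the kind of pipeline described above (endpoints below are placeholders, not our real values; the Honeycomb exporter's auth headers are omitted):

receivers:
  otlp:
    protocols:
      grpc: {}
exporters:
  otlp:
    # placeholder Honeycomb endpoint; real auth headers omitted
    endpoint: api.honeycomb.io:443
  prometheusremotewrite:
    # placeholder endpoint for the in-cluster Prometheus remote-write API
    endpoint: http://prometheus.example.svc:9090/api/v1/write
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlp, prometheusremotewrite]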

Steps to reproduce
Not really sure.

What did you expect to see?
We should not see these errors, and the metrics should appear in Prometheus.

What did you see instead?
Metrics are being dropped; we only got broken metrics, and the error is still there.

What version did you use?
Version: v0.73.0 (OpenTelemetry Collector)

What config did you use?
Config:
prometheusremotewrite:
  endpoint: 9090/api/v1/write
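
(For comparison, the endpoint for this exporter is normally a full URL to the Prometheus remote-write API; a hypothetical example with a placeholder host:)

prometheusremotewrite:
  endpoint: http://<prometheus-service>:9090/api/v1/write

Note that Prometheus only accepts writes on /api/v1/write when it is started with the remote-write receiver enabled (--web.enable-remote-write-receiver on recent versions).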

Environment
OS: Amazon Linux (EKS cluster)
Compiler (if manually compiled): (e.g., "go 14.2")


@ashishthakur55525 ashishthakur55525 added the bug Something isn't working label Dec 19, 2023
@mx-psi mx-psi transferred this issue from open-telemetry/opentelemetry-collector Dec 19, 2023
@bryan-aguilar
Contributor

bryan-aguilar commented Dec 19, 2023

Can you use the debug exporter with detailed verbosity to get more information on app.counter.apiStatusCode? Also, v0.73.0 is a bit dated; could you replicate this with a newer version of the collector?
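
A minimal sketch of that debug wiring, assuming an otlp receiver (names are placeholders):

exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    metrics/debug:
      receivers: [otlp]
      exporters: [debug]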


Pinging code owners for exporter/prometheusremotewrite: @Aneurysm9 @rapphil. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@crobert-1
Member

crobert-1 commented Dec 19, 2023

Note: This is potentially a duplicate of #15281

@ashishthakur55525
Author

Note: This is potentially a duplicate of #15281

But what was the resolution? I don't see a resolution that fixed it for folks.

@ashishthakur55525
Author

Can you use the debug exporter with detailed verbosity to get more information on app.counter.apiStatusCode? Also, v0.73.0 is a bit dated; could you replicate this with a newer version of the collector?

Not sure if I defined the debug exporter correctly here, but you can check below. I also upgraded to version 0.89.0; after doing that I don't see those temporality errors again, they just vanished. Then I enabled debug mode as you suggested, and it says the temporality is still DELTA, even though it is set to cumulative in code. Can you please guide further?

exporters:
  logging: {}
  debug:
    verbosity: detailed

and under service this is what I added:

metrics/debug:
  exporters:
    - debug
  receivers:
    - otlp/pcs-cas   # these are my otlp receivers where I will get data (metrics & traces)
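
Alternatively (just a sketch, not what is deployed right now), the debug exporter could be added to the same pipeline that feeds prometheusremotewrite, so the exact payload being rejected gets printed:

service:
  pipelines:
    metrics/pcs-cas:
      receivers: [otlp/pcs-cas]
      processors: [memory_limiter]   # existing processors kept as-is; memory_limiter shows up in the logs below
      exporters: [prometheusremotewrite, debug]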

After adding that debug pipeline I could see something in the logs (below). I don't know why it still says temporality DELTA, even though we confirmed in the service logs that it is set to cumulative.

StartTimestamp: 2023-12-20 11:58:05.087889 +0000 UTC
Timestamp: 2023-12-20 11:58:35.087889 +0000 UTC
Value: 1
{"kind": "exporter", "data_type": "metrics", "name": "debug"}
2023-12-20T11:58:35.262Z debug memorylimiterprocessor@v0.89.0/memorylimiter.go:273 Currently used memory. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics/pcs-cas", "cur_mem_mib": 235}
2023-12-20T11:58:35.553Z info MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 3, "data points": 5}
2023-12-20T11:58:35.553Z info ResourceMetrics #0
Resource SchemaURL:
Resource attributes:
-> service.namespace: Str()
-> SERVICE_NAME: Str()
-> service.name: Str()
-> service.version: Str(PCS-23.12.1-DR-Test-4289242)
-> stack: Str(app)
-> telemetry.sdk.language: Str(java)
-> telemetry.sdk.name: Str(opentelemetry)
-> telemetry.sdk.version: Str(1.15.0)
ScopeMetrics #0
ScopeMetrics SchemaURL:
InstrumentationScope OtelBeanConfig$$EnhancerBySpringCGLIB$$7045e4df
Metric #0
Descriptor:
-> Name: app.counter.apiStatusCode
-> Description: StatusCode count for API
-> Unit:
-> DataType: Sum
-> IsMonotonic: true
-> AggregationTemporality: Delta
NumberDataPoints #0
Data point attributes:
-> HTTP_STATUS_CODE: Str(200)
-> api-identifier: Str(Health Check)
-> callingService: Str(default)
StartTimestamp: 2023-12-20 11:58:05.552448 +0000 UTC
Timestamp: 2023-12-20 11:58:35.552452 +0000 UTC
Value: 1
Exemplars:
Exemplar #0
-> Trace ID: b65f0a37589d73acb39606ea017ce96b
-> Span ID: 7d43a712802b721c
-> Timestamp: 2023-12-20 11:58:31.482782 +0000 UTC
-> Value: 1
NumberDataPoints #1
Data point attributes:
-> HTTP_STATUS_CODE: Str(200)
-> api-identifier: Str(List Cloud Accounts)
-> callingService: Str(default)
StartTimestamp: 2023-12-20 11:58:05.552448 +0000 UTC
Timestamp: 2023-12-20 11:58:35.552452 +0000 UTC
Value: 1
Exemplars:
Exemplar #0
-> Trace ID: d272ea2e688e2ade2cca4c4c091b8e20
-> Span ID: f1e8c883ec8763bd
-> Timestamp: 2023-12-20 11:58:32.227663 +0000 UTC
-> Value: 1
Metric #1
Descriptor:
-> Name: app.service.counter
-> Description: Services calling CAS APIs
-> Unit:
-> DataType: Sum
-> IsMonotonic: true
-> AggregationTemporality: Delta
NumberDataPoints #0
Data point attributes:
-> SERVICE_CALLED_CAS: Str(pcs-ui-automation+master@company.com)
-> api-identifier: Str(get-cloud-accounts)
-> requested-uri: Str(/cloud)
StartTimestamp: 2023-12-20 11:58:05.552448 +0000 UTC
Timestamp: 2023-12-20 11:58:35.552452 +0000 UTC
Value: 1
Exemplars:
Exemplar #0
-> Trace ID: d272ea2e688e2ade2cca4c4c091b8e20
-> Span ID: f1e8c883ec8763bd
-> Timestamp: 2023-12-20 11:58:32.187731 +0000 UTC
-> Value: 1
Metric #2
Descriptor:
-> Name: app.counter.apis
-> Description: Counts per API
-> Unit:
-> DataType: Sum
-> IsMonotonic: true
-> AggregationTemporality: Delta
NumberDataPoints #0
Data point attributes:
-> api-identifier: Str(get-cloud-accounts)
-> callingService: Str(default)
StartTimestamp: 2023-12-20 11:58:05.552448 +0000 UTC
Timestamp: 2023-12-20 11:58:35.552452 +0000 UTC
Value: 1
Exemplars:
Exemplar #0
-> Trace ID: d272ea2e688e2ade2cca4c4c091b8e20
-> Span ID: f1e8c883ec8763bd
-> Timestamp: 2023-12-20 11:58:32.190339 +0000 UTC
-> Value: 1
NumberDataPoints #1
Data point attributes:
-> api-identifier: Str(health-check)
-> callingService: Str(default)
StartTimestamp: 2023-12-20 11:58:05.552448 +0000 UTC
Timestamp: 2023-12-20 11:58:35.552452 +0000 UTC
Value: 1
Exemplars:
Exemplar #0
-> Trace ID: b65f0a37589d73acb39606ea017ce96b
-> Span ID: 7d43a712802b721c
-> Timestamp: 2023-12-20 11:58:31.482445 +0000 UTC
-> Value: 1
{"kind": "exporter", "data_type": "metrics", "name": "debug"}

@bryan-aguilar
Contributor

I also upgraded to version 0.89.0; after doing that I don't see those temporality errors again

So did the error resolve itself after upgrading? Are the metrics present when you query your prometheus server for them? If so, then I think it should be fair to say that something was fixed between v0.73.0 and now.

@ashishthakur55525
Author

I also upgraded to version 0.89.0; after doing that I don't see those temporality errors again

So did the error resolve itself after upgrading? Are the metrics present when you query your prometheus server for them? If so, then I think it should be fair to say that something was fixed between v0.73.0 and now.

No, actually the error is gone, but I am still not able to see those metrics in Prometheus. One more thing: when I enabled debug mode it says the temporality is DELTA, but in code we set it to CUMULATIVE. I don't know where the mismatch is. What can we do to fix this?
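
(Side note: if the Java SDK's autoconfiguration is in use, the temporality preference can also be set from outside the code with an environment variable on the application container; a hypothetical Kubernetes fragment, where only the variable name and value matter:)

containers:
  - name: my-app            # placeholder
    image: my-app:latest    # placeholder
    env:
      - name: OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE
        value: cumulative

The Java OTLP metric exporter defaults to cumulative temporality, so if DELTA shows up in the debug output, something in the application or its configuration is most likely overriding it explicitly.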

@ashishthakur55525
Author

@bryan-aguilar any thoughts/suggestions on the above?

@ashishthakur55525
Author

ashishthakur55525 commented Jan 9, 2024

Does anyone from this thread have any suggestions or recommendations, please?

@crobert-1
Member

The Prometheus remote write exporter does not support DELTA metrics, as stated in the README. A component has been proposed in the collector to properly handle this situation. I don't believe there's anything that can be done at this time as a workaround, other than what was proposed in the bug I linked earlier.

I'll have to defer to others though in case there's something I'm missing.
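
If and when that proposed component becomes available (newer collector-contrib releases ship a deltatocumulative processor for this), the wiring would presumably look something like the following sketch, with illustrative option values:

processors:
  deltatocumulative:
    # illustrative; check the processor README for the exact options
    max_stale: 5m
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [deltatocumulative]
      exporters: [prometheusremotewrite]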

@crobert-1 crobert-1 added question Further information is requested and removed bug Something isn't working needs triage New item requiring triage labels Jan 17, 2024
@ashishthakur55525
Author

@crobert-1 which bug are you referring to? We have made changes to produce cumulative metrics only, and that error is also gone, but we still don't see the metrics in Prometheus; no errors, nothing. How can we make sure it's working then?

@ceastman-r7

@crobert-1 what changes did you make to have cumulative metrics only?

@crobert-1
Member

crobert-1 commented Feb 28, 2024

@crobert-1 which bug are you referring to? We have made changes to produce cumulative metrics only, and that error is also gone, but we still don't see the metrics in Prometheus; no errors, nothing. How can we make sure it's working then?

The bug I was referencing was in this comment above.

@crobert-1 what changes did you make to have cumulative metrics only?

I believe adapting the solution provided in this comment to your situation may work.


This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Apr 29, 2024