[BUG] error substring not found in metrics scraper #22510

Open
aexvir opened this issue Feb 2, 2024 · 1 comment

aexvir commented Feb 2, 2024

Agent Environment
version: 7.49.1
os: linux (container os)
cloud: gcp
orchestrator: kubernetes

Describe what happened:
During the night, some of our pods stopped being scraped for metrics. In the fleet management UI I can see several broken openmetrics integrations, all showing the same error: substring not found, raised in the function _parse_sample in prometheus_client/parser.py.

Full stacktrace:
Error: substring not found
Traceback (most recent call last):
  File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/prometheus_client/parser.py", line 115, in _parse_sample
    label_start, label_end = text.index("{"), text.rindex("}")
ValueError: substring not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/base/checks/base.py", line 1235, in run
    self.check(instance)
  File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/base/checks/openmetrics/base_check.py", line 142, in check
    self.process(scraper_config)
  File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/base/checks/openmetrics/mixins.py", line 581, in process
    for metric in self.scrape_metrics(scraper_config):
  File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/base/checks/openmetrics/mixins.py", line 542, in scrape_metrics
    for metric in self.parse_metric_family(response, scraper_config):
  File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/base/checks/openmetrics/mixins.py", line 466, in parse_metric_family
    for metric in text_fd_to_metric_families(input_gen):
  File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/base/checks/libs/prometheus.py", line 76, in text_fd_to_metric_families
    sample = _parse_sample(line)
  File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/prometheus_client/parser.py", line 130, in _parse_sample
    name_end = text.index(separator)
ValueError: substring not found
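
For context on the error message itself: both parser.py frames in the trace call str.index or str.rindex, which raise ValueError when the substring is absent, whereas str.find and str.rfind return -1. The sketch below only illustrates that difference. The helper names locate_labels and locate_labels_with_find are hypothetical and are not part of prometheus_client.

```python
def locate_labels(text: str):
    """Return (start, end) of the {...} label block, or None if absent."""
    try:
        # str.index / str.rindex raise ValueError when the substring is missing
        return text.index("{"), text.rindex("}")
    except ValueError:
        return None


def locate_labels_with_find(text: str):
    """Same lookup with str.find / str.rfind, which return -1 instead of raising."""
    start, end = text.find("{"), text.rfind("}")
    return (start, end) if start != -1 and end != -1 else None
```

Either form makes the "no braces" case an ordinary return value instead of an exception that propagates up through the whole check run.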

Describe what you expected:
metrics

Steps to reproduce the issue:
We're not sure what in our metrics could be triggering this: we can't see that in the UI, and we don't have logs persisted at the moment.
There has been no significant release on our end in the past two days, which is when this issue started happening.

aexvir commented Feb 2, 2024

Small update: we found the cause. A couple of our metrics have a tag whose value can be specified by end users, and some garbage input ended up in it and triggered this. We've added validation on our side, but I still think that if one metric fails to parse, the others should still make it through. There should probably be some work to handle this more gracefully and avoid breaking the whole integration.

We suspect the following line is the culprit:

api_hits{endpoint="/something",market="'\"()$@--%23"}
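
As a rough idea of the kind of validation meant here, the sketch below shows two options for a user-supplied label value: rejecting anything outside an allowlist, or escaping it the way the Prometheus text exposition format expects (backslash, double quote and newline escaped). The function names, the fallback value and the allowlist pattern are assumptions for illustration, not our actual code.

```python
import re

# Assumption: legitimate values are short and use a small character set.
_SAFE_LABEL_VALUE = re.compile(r"[A-Za-z0-9_.:/-]{1,64}")


def sanitize_label_value(value: str, fallback: str = "invalid") -> str:
    """Return the value if it matches the allowlist, else a fixed fallback."""
    return value if _SAFE_LABEL_VALUE.fullmatch(value) else fallback


def escape_label_value(value: str) -> str:
    """Escape backslash, newline and double quote for the text format."""
    return (
        value.replace("\\", "\\\\")  # escape backslashes first
        .replace("\n", "\\n")
        .replace('"', '\\"')
    )
```

The allowlist also keeps label cardinality bounded, since arbitrary user input collapses into a single fallback value. Escaping alone preserves the input but still lets users create unbounded label values.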

At times, only 10% of our pods were being scraped because of this issue.
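
To make the "handle this more gracefully" suggestion concrete, here is a minimal sketch of per-line error isolation: each line is parsed on its own, and a line that raises ValueError is logged and skipped instead of aborting the whole scrape. It is not the agent's or prometheus_client's implementation. toy_parse_sample is a hypothetical stand-in for the real sample parser, and parse_tolerantly is an illustrative name.

```python
import logging
from typing import Callable, Iterable, Iterator, Tuple

log = logging.getLogger("tolerant_scrape")

Sample = Tuple[str, float]


def toy_parse_sample(line: str) -> Sample:
    """Hypothetical stand-in for a real sample parser.

    Splits "name{labels} value" on the last run of whitespace and raises
    ValueError when the line does not have that shape.
    """
    name_and_labels, value = line.rsplit(None, 1)
    return name_and_labels, float(value)


def parse_tolerantly(
    lines: Iterable[str],
    parse_line: Callable[[str], Sample] = toy_parse_sample,
) -> Iterator[Sample]:
    """Yield every sample that parses; log and skip the ones that do not."""
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments carry no sample
        try:
            yield parse_line(line)
        except ValueError as exc:
            log.warning("skipping unparseable line %d: %s", lineno, exc)


payload = [
    "# TYPE api_hits counter",
    'api_hits{endpoint="/ok"} 3',
    r'''api_hits{endpoint="/something",market="'\"()$@--%23"}''',
    "up 1",
]
samples = list(parse_tolerantly(payload))
```

With this shape, the two well-formed lines in payload are still returned and only the suspect line is dropped with a warning, so one bad sample no longer takes the rest of the endpoint's metrics with it.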
