Log warning when failing to parse openmetrics response #17514
base: master
Conversation
Test Results: 1 046 files, 1 046 suites, 6h 38m 45s ⏱️. Results for commit 9663a00.
@Nevon thanks a bunch for your contribution!!
Apologies for the delay in response. Here are my thoughts.
Let's bake the logging functionality into text_fd_to_metric_families. We'd need to make the following changes:
1. Move the parsing logic as-is to some private function in the same module as text_fd_to_metric_families, e.g. _parse_payload.
2. Then change text_fd_to_metric_families to be something along the following lines:
from itertools import tee
...
def text_fd_to_metric_families(fd):
    # Duplicate the iterator so we can report the raw line on failure
    # without disturbing the parser's view of the input.
    raw_lines, input_lines = tee(fd)
    raw_line = None
    try:
        for raw_line, metric_family in zip(raw_lines, _parse_payload(input_lines)):
            yield metric_family
    except Exception as e:
        raise ValueError("Failed to parse the metric response {!r}: {}".format(raw_line, e))
A few notes:
- Using itertools.tee is less conceptual overhead, imo, than a custom class.
- Let's just raise an exception with the relevant info. This is a crash; logging can be taken care of.
Are you sure? The reason I didn't do that, which would have been simpler, is that it's a straight-up copy from the Python Prometheus client, with a minor change to avoid a backwards-incompatible change made upstream (I didn't look much into it). If I change more things in there, it'll become harder to get rid of this technical debt and use the upstream directly. I'm fine with making those changes, but I just want confirmation that you're aware of this before I do.
@Nevon good point! I still think we can proceed with the "baked in" approach:
What does this PR do?
Log a warning when the plain text prometheus metric parsing fails, including the line that it failed to parse. For example:
Motivation
This issue.
Currently, text_fd_to_metric_families just raises generic errors that don't provide enough context to know what it was that couldn't be parsed, which is problematic when deployed into environments where a single agent is parsing metrics from many different sources via autodiscovery.
Additional Notes
I'm very much not a Python developer, so let me know if what I've done is very unidiomatic. I considered wrapping the exception in my own ParserException, but from reading the other checks that didn't seem like a common thing to do. I also wasn't able to actually run the integration tests locally, because they seem to only work with Docker and not Docker-compatible clients like nerdctl, but I'm assuming the CI pipeline will run them.
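For reference, the ParserException idea the author considered (and dropped) would amount to a small exception subclass carrying the offending line. The name and attributes below are hypothetical, not part of this PR:

```python
class ParserException(ValueError):
    """Hypothetical wrapper exception carrying the line that failed to parse."""

    def __init__(self, line, cause):
        super().__init__(
            "Failed to parse the metric response {!r}: {}".format(line, cause)
        )
        self.line = line    # the raw input line that failed
        self.cause = cause  # the original exception
```

Subclassing ValueError keeps existing `except ValueError` handlers working, which is one reason a plain re-raised ValueError (as the reviewer suggests) is nearly equivalent in practice.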
Review checklist (to be filled by reviewers)
- Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
- Add a backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged.