UDP probe negative value for failure #741
Comments
@Bashere1 is your success metric also bigger than total? Also, I am assuming you see a positive failure delta in some cycles. And do you have just one target? That's what it looks like from the config (vips.json), but I wanted to make sure.
I did a little more digging, and it appears that failure is a calculated metric. I also reduced the config to a single target dest for testing.
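For context, a hedged sketch of why a derived failure metric can go negative: if failure is computed per export cycle as total minus success, then any cycle where success exceeds total (for example, if a late UDP reply lands in a later cycle's success count) yields a negative delta. The function `failure_delta` below is purely illustrative and is not Cloudprober's actual implementation:

```python
# Hypothetical illustration (NOT Cloudprober's actual code) of how a
# calculated failure metric can go negative.

def failure_delta(total_delta: int, success_delta: int) -> int:
    """Failure derived as total - success for one export cycle."""
    return total_delta - success_delta

# Normal cycle: 10 probes sent, 8 replies received in time -> failure 2.
print(failure_delta(10, 8))   # 2

# Anomalous cycle: replies arriving after the previous cycle's cutoff get
# counted now, so success (12) exceeds total (10) and the derived failure
# goes negative, matching the behavior reported in this issue.
print(failure_delta(10, 12))  # -2
```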
Thanks @Bashere1 for the additional information. Does failure keep growing more negative over time? I'll take a look at the code to see how success can be larger than total.
Yes, we do see a positive failure delta in some cycles.
Yes, if all things stay consistent, the negative values will keep trending down over time.
Okay. I've been trying to reproduce it, but have been unsuccessful so far. From your description, it seems you're able to reproduce it pretty consistently -- would you say in about an hour? What's the latency between these hosts? From your comment it seems to be in the ~50ms range, is that correct? I'll keep trying to reproduce it. This code has not changed in a long time, though, so root-causing it will take time.
I am able to recreate this consistently, but it only appears to happen from a single src. I'm going to try reinstalling on another host within the same subnet as the problem host to see if the behavior repeats. Is there any debugging I can enable that would help?
If it's just a single host, trying on a different host will help. A couple more questions:
(I am still running the prober to reproduce, but I may not have that scale.)
This issue seems isolated to a single host, and since it's such a perplexing issue I don't want to waste your time.
Thanks @Bashere1 for further testing! I'm going to close this issue then.
Describe the bug
I am seeing negative values for the failure counter metric for the UDP probe type.
This is reflected both in the Prometheus metric output for the failure counter and in the prober logs.
Linux Version
22.04.2-Ubuntu
Cloudprober Version
v0.13.3
To Reproduce
cloudprober.cfg

```
host: "redacted"
```

vips.json
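For reference, a minimal UDP probe configuration sketch along the lines of what this issue describes. The target name and intervals below are placeholders, not the redacted values from the actual cloudprober.cfg or vips.json:

```
# Minimal hypothetical UDP probe config -- placeholder values only.
probe {
  name: "udp_probe"
  type: UDP
  targets {
    host_names: "example-target"  # placeholder for the redacted targets
  }
  interval_msec: 5000
  timeout_msec: 1000
}
```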
Steps to reproduce the behavior:
I've been unable to identify a pattern that causes this behavior.
I have the same config and prober version applied to 50+ other hosts, which do not see negative values for the failure counter.
This occurs on approximately 5 of the 50+ hosts where we have installed the prober.
If we restart, the counters reset and start positive, but gradually decrement to negative values.