-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve robustness of ping function and reduce false flags #226
base: dev
Are you sure you want to change the base?
Conversation
Ping a host 3 times with an interval of 100ms and consider it a failure only if there is more than 99% packet loss to that host
heya and thanks for the PR! I like the idea but I'm not really sure what a good threshold would be. I agree that a single lost packet shouldn't mean DOWN right away, but I also think that tolerating 99% loss is way too lenient. we could make the loss% threshold configurable, but I'm not convinced people actually need to change it if the default is reasonable. the only thing left to do is finding a reasonable threshold :b as a heads up, we're still catching up with older issues so getting this released might take a bit longer than expected |
Hey! Thank you for looking into this. I agree, 99% packet loss is quite lenient and it should be changed. Perhaps 50-60% packet loss and 4-5 packets in each test might be a better compromise ? Another option is sending packets every second with a timeout that starts when the first packet is sent and stopping the test when we have the first reply. This should also help in reducing false alarms. |
This would be my view. |
Personally this is intended behaviour, however, it depends if you're using the tool to check for availability issues, or just general uptime overall, I do think changing the functionality would be useful for others as long as previously mentioned this can be configurable and default to the current behaviour. |
ICMP is some times one of the first type of traffic to be dropped if the device/router is under load. This is why, It's not a reliable indicator for uptime or availability issues. (A "TCP Ping" is perhaps more suited). With the current approach, It causes way too many false alarms to be useful in any capacity. The packet loss threshold can be made configurable or personally, I also like the second idea.
|
Then surely you just use the TCP port checking service? |
Yess, I do that where ever I can but not every single service I wanted to monitor exposes a tcp listener I can use for health/availability checks |
Hey,
Currently the ping function in this project pings a host just once and considers it a failure if sent packets != received packets.
This results in a lot of false flags. ICMP echo request packets are some times one of the top most packets that'll be dropped if the router needs room to process some thing else.
This PR changes this behavior and now,
There is the "Notify after failures" setting but that's not quite what I want since these failures are still recorded and show up on the graph.