Missing recoveries #15968

Open
rudybroersma opened this issue Apr 23, 2024 · 1 comment
Comments

@rudybroersma
Contributor

rudybroersma commented Apr 23, 2024

The problem

Sometimes a device recovery notification isn't sent through the alert transport.
This may be a race condition related to the delay we have configured.

For example, I have a device that went down at around 12:00. We have a 10 minute delay configured for this rule.
At 12:11 the device came back online. The 'Down notification' was sent at 12:11. The recovery notification was never issued.

We have max alerts set to 1 as well.

[image]

(It is now past 16:00 and still no recovery has been issued.)

Output of ./validate.php

-

What was the last working version of LibreNMS?

No response

Anything in the logs that might be useful for us?

There are other reports of this occurring:

https://community.librenms.org/t/alert-rule-no-recovery-alert-with-max-alert-1/22925
https://community.librenms.org/t/missing-recoveries/23858 (mine, but no replies)
@rudybroersma
Contributor Author

Still getting bitten by this issue. Could the cause be the following?

LibreNMS/Alert/RunAlerts.php, lines 475 to 476

The database is updated after the issueAlert() process has finished. If issueAlert() takes a few seconds, a recovery that occurs while it is still running does not see the active alert in the database, so no recovery alert is sent.

This is because in:

https://github.com/librenms/librenms/blob/51c670f109786ba9c9d23f2ce19d3ad5d4de0ba3/LibreNMS/Alert/RunAlerts.php#L393C1-L398C14

noiss is set to true when no alert has been found.

Could we swap lines 475 and 476?
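
To make the suspected race easier to reason about, here is a minimal toy model in Python. It is not LibreNMS code: `ToyAlertStore`, `recovery_check`, `slow_issue_alert` and `run` are hypothetical names, and the "database" is just an in-memory flag. It only assumes what is described above: the alert state is written after the slow send finishes, and the recovery path does nothing when it finds no active alert.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAlertStore:
    """Stand-in for the alert state in the database (not the real schema)."""
    alerted: bool = False
    sent: list = field(default_factory=list)


def recovery_check(store):
    """Runs when the device comes back up.

    Sends a recovery only if the store already records an issued alert;
    otherwise the recovery is silently skipped.
    """
    if store.alerted:
        store.sent.append("recovery")
        store.alerted = False


def slow_issue_alert(store, during_send):
    """Hypothetical slow transport call.

    `during_send` simulates another process that runs while the
    transport is still busy delivering the down notification.
    """
    store.sent.append("down")
    during_send(store)


def run(update_first):
    """Return the notifications sent for one down/up cycle."""
    store = ToyAlertStore()
    if update_first:
        # Swapped order: record the alert state, then send.
        store.alerted = True
        slow_issue_alert(store, recovery_check)
    else:
        # Order suspected in this issue: send, then record the state.
        slow_issue_alert(store, recovery_check)
        store.alerted = True
    return store.sent


print(run(update_first=False))
print(run(update_first=True))
```

In this model, sending first loses the recovery because the check runs before the state is written, while writing the state first lets the recovery through. Whether the real code behaves the same way depends on what lines 475 and 476 actually do, so treat this as an illustration of the hypothesis rather than a confirmed diagnosis.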

rudybroersma added a commit to rudybroersma/librenms that referenced this issue Jun 7, 2024