Fixing the queries for alerts 'CephMgrIsAbsent' and 'CephMgrIsMissingReplicas' #96
Commits on Oct 6, 2021
-
Adding 'namespace' to the 'ceph_node_down' query
Signed-off-by: Arun Kumar Mohan <amohan@redhat.com>
Commit 937f993
-
Change CephAbsentMgr to use 'up' query
Instead of using the 'absent' query, we use 'up', which should provide the needed 'namespace' field in the resulting metrics.
Signed-off-by: aruniiird <amohan@redhat.com>
Commit fcf1565
-
Adding namespace field into other alert queries
Signed-off-by: Arun Kumar Mohan <amohan@redhat.com>
Commit e0b35f4
-
Increasing the auto-resolvable alerts' delay to 15m
The alerts CephMonHighNumberOfLeaderChanges, CephOSDDiskNotResponding and CephClusterWarningState are, in most cases, resolved automatically and cause unnecessary admin events, so we are increasing their alert delay to '15m'.
Signed-off-by: aruniiird <amohan@redhat.com>
Commit ae51e28
-
Reverting the time delay of 'CephMonHighNumberOfLeaderChanges'
Reverting the time delay of 'CephMonHighNumberOfLeaderChanges' back to 5m.
Signed-off-by: Arun Kumar Mohan <amohan@redhat.com>
Commit 082c58a
-
Bug 1970354: Handle empty ceph_version in ceph_mon_metadata to avoid raising misleading alert
Signed-off-by: Gowtham Shanmugasundaram <gshanmug@redhat.com>
Commit a5fa42d
Commits on Oct 15, 2021
-
Fixing the queries for alerts 'CephMgrIsAbsent' and 'CephMgrIsMissingReplicas'

CephMgrIsAbsent
----------------
This alert initially had the following query:

    absent(up{job="rook-ceph-mgr"})

It fires when the 'up' series is not present, but it had two flaws:
a. it does not fire if 'up' returns a result with a value of ZERO
b. its result carries no labels beyond those in the selector's equality matchers, so 'namespace' was missing

When the above query was replaced with the following:

    up{job="rook-ceph-mgr"} == 0

the new query had this shortcoming:
a. whenever the mgr pod is completely down (for example, 'replicas' is set to ZERO and 'mgr' is not coming up), the 'up' query does not return any result.

Thus we had to combine both queries to get results in both scenarios.

CephMgrIsMissingReplicas
------------------------
This query was previously:

    sum(up{job="rook-ceph-mgr"}) < 1

It had the same structure as the above (Absent) query, but its intention was to check the 'replicas' count for ceph mgr. It is now changed to a kube query which handles the replicas count.

Signed-off-by: Arun Kumar Mohan <amohan@redhat.com>
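
The commit message says the two CephMgrIsAbsent queries were combined but does not show the result. One way such a combination could look is sketched below, joining the two expressions with PromQL's `or` operator. This is an assumption for illustration, not necessarily the expression used in the commit.

```promql
# Sketch only; the expression actually committed may differ.
# First branch: a mgr target is being scraped but reports down (value 0);
# this branch keeps the target's labels, including 'namespace'.
# Second branch: no 'up' series exists for the job at all
# (for example, the mgr deployment is scaled to zero).
up{job="rook-ceph-mgr"} == 0
  or
absent(up{job="rook-ceph-mgr"})
```

Note that when only the `absent()` branch matches, the result still carries just the `job` label from the selector, so a 'namespace' label would have to be supplied some other way in that scenario.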
Commit a32a4c3