ceph: fixing the queries for alerts 'CephMgrIsAbsent' and 'CephMgrIsMissingReplicas'

CephMgrIsAbsent
----------------
This alert initially had the following query

absent(up{job="rook-ceph-mgr"})

which fires when the 'up' series is not present, but had two flaws:
  a. it does not fire if 'up' returns a result with value ZERO
  b. its result carries only the labels given in the selector, so 'namespace' was missing

When the above query was replaced with the following,

up{job="rook-ceph-mgr"} == 0

the query had the following shortcoming:
  a. whenever the mgr pod is completely down (e.g. 'replicas' set to ZERO
and 'mgr' is not coming up), the 'up' query does not return any result.

Thus we had to combine both queries to get results in both scenarios.
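
As a sketch of how the combined expression covers each case (shown here
without the label_replace() wrapper that the final rule uses to supply the
'namespace' label, which absent() does not produce):

```promql
# mgr target scraped and healthy: 'up' is 1, neither side matches, no alert
# mgr target scraped but failing: 'up' is 0, the left side matches
# mgr target gone entirely:       no 'up' series exists, absent() returns 1
up{job="rook-ceph-mgr"} == 0 or absent(up{job="rook-ceph-mgr"})
```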

CephMgrIsMissingReplicas
------------------------
Previously this query was,

sum(up{job="rook-ceph-mgr"}) < 1

It had the same structure as the above (Absent) query, but its
intention was to check the number of 'replicas' for ceph mgr.
It is now changed to a kube query (kube_deployment_spec_replicas) which checks the replicas count.
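
A sketch of the replica check, assuming kube-state-metrics exposes
kube_deployment_spec_replicas for the mgr deployments and that their names
start with 'rook-ceph-mgr-':

```promql
# Sum the desired replica count of all matching mgr deployments per namespace;
# the alert condition holds when that sum is below 1.
sum(kube_deployment_spec_replicas{deployment=~"rook-ceph-mgr-.*"}) by (namespace) < 1
```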

Signed-off-by: Arun Kumar Mohan <amohan@redhat.com>
aruniiird committed Oct 15, 2021
1 parent 9593bdd commit cfa2c2d
Showing 1 changed file with 2 additions and 2 deletions.
@@ -42,7 +42,7 @@ spec:
severity_level: critical
storage_type: ceph
expr: |
- up{job="rook-ceph-mgr"} == 0
+ label_replace((up{job="rook-ceph-mgr"} == 0 or absent(up{job="rook-ceph-mgr"})), "namespace", "openshift-storage", "", "")
for: 5m
labels:
severity: critical
@@ -53,7 +53,7 @@ spec:
severity_level: warning
storage_type: ceph
expr: |
- sum(up{job="rook-ceph-mgr"}) by (namespace) < 1
+ sum(kube_deployment_spec_replicas{deployment=~"rook-ceph-mgr-.*"}) by (namespace) < 1
for: 5m
labels:
severity: warning