
ceph: fixing the queries for alerts 'CephMgrIsAbsent' and 'CephMgrIsMissingReplicas' #8985


aruniiird (Contributor)

CephMgrIsAbsent

This alert initially used the following query:

absent(up{job="rook-ceph-mgr"})

which fires when the 'up' series is absent, but it had two flaws:
a. it does not fire if 'up' returns a series with a ZERO value (the mgr target is discovered but its scrape fails);
b. its result carries no labels beyond those in the selector, so 'namespace' was missing.

When the above query was replaced with the following,

up{job="rook-ceph-mgr"} == 0

the new query had the following shortcoming:
a. whenever the mgr pod is completely down (for example, 'replicas' set to ZERO and 'mgr' not coming up), the 'up' query returns no result at all, so there is nothing to compare against 0.

Thus both queries had to be combined (with 'or') to get a result in both scenarios.
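A minimal sketch of the reasoning above, in Python rather than PromQL: an instant vector is modeled as a list of (labels, value) samples, and the hypothetical helpers 'absent', 'eq_zero' and 'or_' approximate PromQL's 'absent()', the '== 0' filter and the 'or' operator for this single-selector case only. It illustrates the three scenarios under those assumptions; it is not Prometheus's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical model: an instant vector is a list of (labels, value) samples.
Sample = Tuple[Dict[str, str], float]
Vector = List[Sample]

SELECTOR = {"job": "rook-ceph-mgr"}

def absent(v: Vector) -> Vector:
    # Like PromQL absent(): one sample with value 1 only when the input is
    # empty, labelled with the selector's equality matchers (no 'namespace').
    return [(dict(SELECTOR), 1.0)] if not v else []

def eq_zero(v: Vector) -> Vector:
    # Like the filter 'v == 0': keep only samples whose value is 0.
    return [(labels, value) for labels, value in v if value == 0]

def or_(a: Vector, b: Vector) -> Vector:
    # Like 'a or b': all of a, plus samples of b whose label set is not in a.
    seen = [labels for labels, _ in a]
    return a + [(labels, value) for labels, value in b if labels not in seen]

def combined(v: Vector) -> Vector:
    # up{job="rook-ceph-mgr"} == 0 or absent(up{job="rook-ceph-mgr"})
    return or_(eq_zero(v), absent(v))

scenarios = {
    "mgr up": [({"job": "rook-ceph-mgr", "namespace": "rook-ceph"}, 1.0)],
    "mgr scraped but down": [({"job": "rook-ceph-mgr", "namespace": "rook-ceph"}, 0.0)],
    "no up series (replicas = 0)": [],
}

for name, up in scenarios.items():
    print(f"{name}: absent={bool(absent(up))} "
          f"eq_zero={bool(eq_zero(up))} combined={bool(combined(up))}")
```

Only the combined expression fires in both failure scenarios. In the "no up series" case its result comes from 'absent()' and still has no 'namespace' label, which the merged rule addresses with 'label_replace()'.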

CephMgrIsMissingReplicas

This query was previously

sum(up{job="rook-ceph-mgr"}) < 1

It had the same structure as the 'up'-based query above, but its intention was to check the number of ceph mgr 'replicas'. It is now changed to a kube query which checks the replica count.
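A minimal sketch, using a hypothetical Python model of instant vectors as (labels, value) samples, of why 'sum(up{job="rook-ceph-mgr"}) < 1' stays silent when no mgr is running: PromQL's 'sum()' over an empty vector returns an empty result rather than a sample with value 0, so the comparison has nothing to match. The helper names are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical model: an instant vector is a list of (labels, value) samples.
Sample = Tuple[Dict[str, str], float]
Vector = List[Sample]

def sum_all(v: Vector) -> Vector:
    # Like PromQL sum() with no 'by' clause: all labels are dropped, and an
    # empty input yields an empty output, not a single sample with value 0.
    if not v:
        return []
    return [({}, sum(value for _, value in v))]

def less_than(v: Vector, threshold: float) -> Vector:
    # Like the filtering comparison 'v < threshold'.
    return [(labels, value) for labels, value in v if value < threshold]

def old_missing_replicas(up: Vector) -> Vector:
    # sum(up{job="rook-ceph-mgr"}) < 1
    return less_than(sum_all(up), 1)

one_mgr_down = [({"job": "rook-ceph-mgr"}, 0.0)]
no_up_series: Vector = []  # e.g. the mgr deployment scaled to 0 replicas

print(old_missing_replicas(one_mgr_down))  # → [({}, 0.0)]  (fires)
print(old_missing_replicas(no_up_series))  # → []           (silent)
```

The silent case is the one where the replica count itself is the problem, which is presumably why the rule was moved off 'up' and onto a query over the replica count.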

Signed-off-by: Arun Kumar Mohan <amohan@redhat.com>

Description of your changes:

Which issue is resolved by this Pull Request:
Resolves #

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: Add the flag for skipping the build if this is only a documentation change. See here for the flag.
  • Skip Unrelated Tests: Add a flag to run tests for a specific storage provider. See test options.
  • Reviewed the developer guide on Submitting a Pull Request
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.

mergify bot added the ceph label (main ceph tag) on Oct 15, 2021
aruniiird force-pushed the fix-CephMgrIsAbsent-n-CephMgrIsMissingReplicas-queries branch from 871be76 to cfa2c2d on October 15, 2021 07:46
agarwal-mudit (Contributor)

/cc @leseb

leseb merged commit ecd75a2 into rook:master on Oct 15, 2021
leseb added a commit that referenced this pull request on Oct 18, 2021
ceph: fixing the queries for alerts 'CephMgrIsAbsent' and 'CephMgrIsMissingReplicas' (backport #8985)
@@ -42,7 +42,7 @@ spec:
   severity_level: critical
   storage_type: ceph
   expr: |
-    up{job="rook-ceph-mgr"} == 0
+    label_replace((up{job="rook-ceph-mgr"} == 0 or absent(up{job="rook-ceph-mgr"})), "namespace", "openshift-storage", "", "")
Member


@aruniiird How can we avoid hardcoding a specific namespace upstream? The only example namespace we should have upstream is rook-ceph.

Member


Tracked with #9005.
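The namespace concern can be seen by modeling the 'label_replace()' call from the diff. This is a minimal Python sketch assuming a simplified model of PromQL's 'label_replace(v, dst_label, replacement, src_label, regex)': the regex is fully anchored, a missing label counts as the empty string, and '$1'-style capture expansion is not modeled. The function and sample data are hypothetical. With an empty 'src_label' and an empty regex the match always succeeds, so every series gets namespace="openshift-storage", and a real namespace such as rook-ceph is overwritten.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical model: an instant vector is a list of (labels, value) samples.
Sample = Tuple[Dict[str, str], float]
Vector = List[Sample]

def label_replace(v: Vector, dst: str, replacement: str, src: str, regex: str) -> Vector:
    # Simplified model of PromQL label_replace(): the regex must match the
    # whole value of 'src' (a missing label counts as ""). Capture-group
    # references such as $1 are NOT expanded; 'replacement' is used literally.
    out: Vector = []
    for labels, value in v:
        new_labels = dict(labels)  # do not mutate the input series
        if re.fullmatch(regex, labels.get(src, "")) is not None:
            if replacement:
                new_labels[dst] = replacement
            else:
                new_labels.pop(dst, None)  # empty replacement removes the label
        out.append((new_labels, value))
    return out

from_absent = [({"job": "rook-ceph-mgr"}, 1.0)]
from_eq_zero = [({"job": "rook-ceph-mgr", "namespace": "rook-ceph"}, 0.0)]

# label_replace(..., "namespace", "openshift-storage", "", "")
print(label_replace(from_absent, "namespace", "openshift-storage", "", ""))   # namespace added
print(label_replace(from_eq_zero, "namespace", "openshift-storage", "", ""))  # rook-ceph overwritten
```

A rule that must work in any namespace would need a different approach than a literal replacement string; the sketch only shows why the hardcoded value applies to every series.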

Labels: ceph (main ceph tag)
Projects: None yet
Development: no linked issues (None yet)
4 participants