ceph: increasing the auto-resolvable alerts' delay to 15m #8896

Merged
merged 1 commit into from Oct 1, 2021

aruniiird
Contributor

The following alerts, which in most cases resolve automatically,
are causing unnecessary admin events:

CephMonHighNumberOfLeaderChanges
CephOSDDiskNotResponding
CephClusterWarningState

So we are increasing the alert delay time to '15m'.

Signed-off-by: Arun Kumar Mohan <amohan@redhat.com>

Description of your changes:

Which issue is resolved by this Pull Request:
Resolves #

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: Add the flag for skipping the build if this is only a documentation change. See here for the flag.
  • Skip Unrelated Tests: Add a flag to run tests for a specific storage provider. See test options.
  • Reviewed the developer guide on Submitting a Pull Request
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.

@mergify mergify bot added the ceph main ceph tag label Sep 30, 2021
@@ -150,7 +150,7 @@ spec:
storage_type: ceph
expr: |
label_replace((ceph_osd_in == 1 and ceph_osd_up == 0),"disk","$1","ceph_daemon","osd.(.*)") + on(ceph_daemon) group_left(host, device) label_replace(ceph_disk_occupation,"host","$1","exported_instance","(.*)")
-for: 1m
+for: 15m
Contributor

Changing the interval for a critical-severity alert like CephOSDDiskNotResponding from 1 minute to 15 minutes seems a bit ... risky!

Member

@aruniiird what's the rationale behind that change?

Contributor Author

This change is mainly for the ODF Managed Service product. Its SREs (the customer support team) are getting action-required alerts that resolve automatically, either by Ceph itself or through OCS Operator reconciliation. So they want to increase the alert delay for the following alerts:

CephMonHighNumberOfLeaderChanges
CephOSDDiskNotResponding
CephClusterWarningState

As we don't have a separate alert mechanism for Managed Services, we are making the changes here.
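
For context, the `for` clause in a Prometheus alerting rule keeps an alert in the pending state until its expression has been continuously true for that duration, so a condition that clears within 15 minutes never fires. Below is a rough sketch of how the CephOSDDiskNotResponding rule might look after this change. The alert name, `expr`, `storage_type`, and `for` value come from this PR; the enclosing `rules:` list, the placement of `storage_type` under `annotations`, and the `severity` label are assumptions based on the review comment and the usual rule layout, not copied from the file.

```yaml
# Sketch only. Field placement is assumed; see the note above.
rules:
  - alert: CephOSDDiskNotResponding
    annotations:
      storage_type: ceph          # assumed to live under annotations
    expr: |
      label_replace((ceph_osd_in == 1 and ceph_osd_up == 0),"disk","$1","ceph_daemon","osd.(.*)") + on(ceph_daemon) group_left(host, device) label_replace(ceph_disk_occupation,"host","$1","exported_instance","(.*)")
    # The expression must hold for 15 minutes without interruption before
    # the alert moves from pending to firing (previously 1 minute).
    for: 15m
    labels:
      severity: critical          # assumed, per the reviewer's "critical severity" remark
```

The trade-off raised in the review still applies: an OSD disk that stays unresponsive is now reported up to 14 minutes later than before.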

@leseb
Member

leseb commented Oct 1, 2021

@Mergifyio rebase

@mergify

mergify bot commented Oct 1, 2021

Command rebase: success

Branch has been successfully rebased

@leseb leseb merged commit 5f0cfb6 into rook:master Oct 1, 2021
travisn added a commit that referenced this pull request Oct 4, 2021
ceph: increasing the auto-resolvable alerts' delay to 15m (backport #8896)
Labels
ceph main ceph tag

3 participants