monitoring: Create prometheus rules with helm chart #9837

travisn · 2022-03-04T01:27:43Z

Description of your changes:
Previously, the prometheus rules were created when the cephcluster CR setting monitoring.enabled was set to true. Those rules could not be customized, which was not flexible enough. Now the rules are installed by the helm chart. To customize the rules, a post-processor can be applied to the helm chart.

This is proposed to replace the approach in #9503.

The rules can then be customized with a helm post-processor using tools such as kustomize. For example:

Render the helm chart to a YAML file:

helm template -f values.yaml rook-release/rook-ceph-cluster > cluster-chart.yaml

Run kustomize to update the desired prometheus rules, then create the resources:

kustomize build . > updated-chart.yaml
kubectl create -f updated-chart.yaml
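
The two manual steps above could also be chained into a single Helm invocation. The sketch below is not part of this PR; it assumes Helm 3, whose `--post-renderer` flag pipes the rendered manifests to an executable on stdin and reads the result from stdout, and it assumes a kustomization file in the working directory that lists cluster-chart.yaml as a resource. The function name `post_render` and its parameters are hypothetical.

```python
import subprocess
from pathlib import Path


def post_render(manifests, workdir=".", build_cmd=("kustomize", "build", ".")):
    """Write the rendered chart where kustomize expects it, then build.

    manifests: the rendered chart as text (what Helm would pass on stdin).
    workdir:   directory holding the kustomization file and its patches.
    build_cmd: command whose stdout is the customized manifests
               (assumption: kustomize is installed and on PATH).
    """
    # The kustomization file lists cluster-chart.yaml under "resources",
    # so the rendered chart has to be saved under that name first.
    Path(workdir, "cluster-chart.yaml").write_text(manifests)
    result = subprocess.run(
        list(build_cmd), cwd=workdir, check=True, capture_output=True, text=True
    )
    return result.stdout
```

A small wrapper executable would call `sys.stdout.write(post_render(sys.stdin.read()))` and be passed to helm with `--post-renderer`. The build command is a parameter so the function can be exercised without kustomize installed.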

There are many possible configurations for kustomize, but here is one example of the YAML files that would need to exist in the current directory:

kustomization.yaml

patches:
- path: severity.yaml
  target:
    group: monitoring.coreos.com
    kind: PrometheusRule
    name: prometheus-ceph-rules
    version: v1
resources:
- cluster-chart.yaml

severity.yaml: This example updates the labels and the "for" clause of the first rule in the first group:

- op: add
  path: /spec/groups/0/rules/0/labels
  value:
    my-label: foo
    severity: none
- op: add
  path: /spec/groups/0/rules/0/for
  value: 15m
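
To make the effect of the patch concrete, here is a minimal sketch of the "add" operation as RFC 6902 defines it, applied to a hypothetical stand-in for the first rule (the alert name, expression, and original labels below are invented for illustration and are not taken from the chart). One detail worth noticing: "add" on an object member that already exists replaces its value, so the whole labels map is swapped out rather than merged.

```python
import copy


def apply_add(doc, path, value):
    """Apply one RFC 6902 "add" op to nested dicts/lists; returns a patched copy.

    Simplified: does not handle the "~0"/"~1" escapes of JSON Pointer.
    """
    doc = copy.deepcopy(doc)
    parts = path.lstrip("/").split("/")
    node = doc
    # Walk to the parent of the target location.
    for key in parts[:-1]:
        node = node[int(key)] if isinstance(node, list) else node[key]
    last = parts[-1]
    if isinstance(node, list):
        # "-" appends; a numeric index inserts before that position.
        node.insert(len(node) if last == "-" else int(last), value)
    else:
        # Existing members are replaced, missing ones are created.
        node[last] = value
    return doc


# Hypothetical stand-in for the first rule of prometheus-ceph-rules.
rule_doc = {"spec": {"groups": [{"name": "example", "rules": [
    {"alert": "ExampleAlert", "expr": "up == 0", "for": "5m",
     "labels": {"severity": "critical", "type": "example"}},
]}]}}

patched = apply_add(rule_doc, "/spec/groups/0/rules/0/labels",
                    {"my-label": "foo", "severity": "none"})
patched = apply_add(patched, "/spec/groups/0/rules/0/for", "15m")
print(patched["spec"]["groups"][0]["rules"][0])
```

In this sketch the invented "type" label disappears after patching, because the labels map is replaced as a whole. To keep existing labels, a patch would instead target individual keys such as /spec/groups/0/rules/0/labels/severity.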

Which issue is resolved by this Pull Request:
Resolves #9082, #9005

Checklist:

Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
Skip Tests for Docs: If this is only a documentation change, add the label skip-ci on the PR.
Reviewed the developer guide on Submitting a Pull Request
Pending release notes updated with breaking and/or notable changes for the next minor release.
Documentation has been updated, if necessary.
Unit tests have been added, if necessary.
Integration tests have been added, if necessary.

travisn · 2022-03-04T01:28:38Z

deploy/charts/rook-ceph-cluster/prometheus/localrules.yaml

@@ -0,0 +1,890 @@
+groups:


This yaml is directly copied from the ceph source repo. This way, we (hopefully) don't have to maintain any customizations to what is picked up from the ceph repo.

yuvalman · 2022-03-07T07:38:49Z

lgtm!

mergify · 2022-03-21T19:59:03Z

This pull request has merge conflicts that must be resolved before it can be merged. @travisn please rebase it. https://rook.io/docs/rook/latest/development-flow.html#updating-your-fork

The prometheus rules had been previously created if the cephcluster CR setting monitoring.enabled was set to true. The rules were not customizable and therefore not flexible enough. Now the rules are installed by the helm chart. To customize the rules, a post-processor can be applied to the helm chart. Signed-off-by: Travis Nielsen <tnielsen@redhat.com>

Pick up the latest ceph prometheus rules from the ceph repo found at https://github.com/ceph/ceph/blob/master/monitoring/ceph-mixin/prometheus_alerts.yml. The updates include many new rules for monitoring of ceph. Signed-off-by: Travis Nielsen <tnielsen@redhat.com>

Rook has stopped creating the prometheus rules with the cephcluster monitoring.enabled setting. Now the rules must be created separately from the cluster CR as described in the rook PR rook/rook#9837. The rules are fully owned downstream by the ocs operator now since upstream they are only installed by the helm chart. This also gives full flexibility downstream to update the rules only when QE determines we are ready for testing all the new rules. Signed-off-by: Travis Nielsen <tnielsen@redhat.com>

This was referenced Mar 4, 2022

monitoring: customize prometheus rule alerts #9503

Closed

prometheus: Spell check the alert descriptions ceph/ceph#45254

Merged

travisn force-pushed the helm-prometheus-rules branch 2 times, most recently from f68bfb3 to 9469a77 Compare March 9, 2022 00:39

travisn mentioned this pull request Mar 10, 2022

Allow disabling of PersistentVolumeUsageNearFull/PersistentVolumeUsageCritical alerts on workloads that are expected to be fully utilized #9568

Closed

travisn force-pushed the helm-prometheus-rules branch from 9469a77 to f285bdf Compare March 10, 2022 22:57

travisn mentioned this pull request Mar 14, 2022

Customize Ceph PrometheusRule CRD #9082

Closed

travisn force-pushed the helm-prometheus-rules branch from f285bdf to b14a7c0 Compare March 16, 2022 21:40

BlaineEXE approved these changes Mar 17, 2022

travisn added 2 commits March 21, 2022 14:47

travisn force-pushed the helm-prometheus-rules branch from b14a7c0 to 8dd4b77 Compare March 21, 2022 20:56

travisn merged commit 8e3350a into rook:master Mar 21, 2022

travisn deleted the helm-prometheus-rules branch March 21, 2022 22:59

travisn mentioned this pull request Mar 21, 2022

Custom namespaces should not be included in the prometheus alerts #9005

Closed

This was referenced Mar 28, 2022

Enable customization of prometheus metrics labels created by rook operator #9618

Closed

CephPoolQuotaBytesNearExhaustion and CephPoolQuotaBytesCriticallyExhausted rules should use ceph_pool_stored #8735

Closed

travisn mentioned this pull request Apr 4, 2022

Ceph prometheus rules created by OCS operator instead of Rook red-hat-storage/ocs-operator#1615

Merged

BlaineEXE mentioned this pull request Apr 5, 2022

Discussion: Rook still creates resources that could be monitoring-implementation specific #9996

Closed

travisn mentioned this pull request May 9, 2023

docs: update ruleNamesapce to rulesNamespaceOverride #12190

Merged
