monitoring: customize prometheus rule alerts
Signed-off-by: Yuval Manor <yuvalman958@gmail.com>
yuvalman committed Jan 17, 2022
1 parent f324622 commit 4c8b793
Showing 25 changed files with 977 additions and 408 deletions.
16 changes: 16 additions & 0 deletions Documentation/ceph-cluster-crd.md
@@ -186,6 +186,7 @@ If this value is empty, each pod will get an ephemeral directory to store their
* `ssl`: Whether to serve the dashboard via SSL, ignored on Ceph versions older than `13.2.2`
* `monitoring`: Settings for monitoring Ceph using Prometheus. To enable monitoring on your cluster see the [monitoring guide](ceph-monitoring.md#prometheus-alerts).
* `enabled`: Whether to enable Prometheus-based monitoring for this cluster
* `alertRuleOverrides`: Custom Prometheus alert rule values that override the defaults; an uncommented example follows the snippet below
* `externalMgrEndpoints`: external cluster manager endpoints
* `externalMgrPrometheusPort`: external prometheus manager module port. See [external cluster configuration](#external-cluster) for more details.
* `rulesNamespace`: Namespace in which to deploy the prometheusRule. If empty, the namespace of the cluster will be used.
@@ -1390,6 +1391,21 @@ spec:
#externalMgrEndpoints:
#- ip: 192.168.39.182
#externalMgrPrometheusPort: 9283
# Prometheus alert rule values that override the default rule values
# optional: a specific alert can be disabled by setting its disabled field to true
#alertRuleOverrides:
# CephNodeDown:
# disabled: true
# CephMgrIsAbsent:
# for: 1m
# severityLevel: <custom-severityLevel> # must be one of the following levels: warning, critical, error
# severity: custom-severity
# CephOSDNearFull:
# limit: 80
# for: 2m
# CephOSDFlapping:
# osdUpRate: 10m
# severity: custom-severity-2
```
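
For reference, an active (uncommented) form of these overrides might look like the following sketch. The values are illustrative assumptions, not recommendations; any alert not listed keeps its default rule:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  monitoring:
    enabled: true
    alertRuleOverrides:
      CephNodeDown:
        disabled: true          # suppress this alert entirely
      CephMgrIsAbsent:
        for: 1m                 # fire after 1m pending instead of the default
        severityLevel: warning  # must be one of: warning, critical, error
      CephOSDNearFull:
        limit: 80               # threshold (percent) for the near-full check
```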

Choose the namespace carefully; if you have an existing cluster managed by Rook, you have likely already injected `common.yaml`.
15 changes: 15 additions & 0 deletions Documentation/ceph-monitoring.md
@@ -117,6 +117,21 @@ spec:
monitoring:
enabled: true
rulesNamespace: "rook-ceph"
# Prometheus alert rule values that override the default rule values
# optional: a specific alert can be disabled by setting its disabled field to true
#alertRuleOverrides:
# CephNodeDown:
# disabled: true
# CephMgrIsAbsent:
# for: 1m
# severityLevel: <custom-severityLevel> # must be one of the following levels: warning, critical, error
# severity: custom-severity
# CephOSDNearFull:
# limit: 80
# for: 2m
# CephOSDFlapping:
# osdUpRate: 10m
# severity: custom-severity-2
[...]
```
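
To illustrate the effect, assuming the operator folds these overrides into the PrometheusRule object it generates, the overridden CephMgrIsAbsent rule might render roughly like the sketch below. The group name and expression are abbreviated placeholders for Rook's defaults, not the exact generated output:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-ceph-rules    # illustrative name
  namespace: rook-ceph
spec:
  groups:
    - name: mgr.rules            # placeholder group name
      rules:
        - alert: CephMgrIsAbsent
          expr: absent(up{job="rook-ceph-mgr"} == 1)  # abbreviated stand-in expression
          for: 1m                      # taken from alertRuleOverrides
          labels:
            severity: custom-severity  # taken from alertRuleOverrides
```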

3 changes: 3 additions & 0 deletions PendingReleaseNotes.md
@@ -4,3 +4,6 @@


## Features

### Ceph
- Prometheus rule alerts can now be customized to match user preferences.
16 changes: 16 additions & 0 deletions deploy/charts/rook-ceph-cluster/values.yaml
@@ -30,6 +30,22 @@ monitoring:
# enabling will also create RBAC rules to allow Operator to create ServiceMonitors
enabled: false
rulesNamespaceOverride:
# Prometheus alert rule values that override the default rule values
# optional: a specific alert can be disabled by setting its disabled field to true
#alertRuleOverrides:
# CephNodeDown:
# disabled: true
# CephMgrIsAbsent:
# for: 1m
# severityLevel: <custom-severityLevel> # must be one of the following levels: warning, critical, error
# severity: custom-severity
# CephOSDNearFull:
# limit: 80
# for: 2m
# CephOSDFlapping:
# osdUpRate: 10m
# severity: custom-severity-2


# If true, create & use PSP resources. Set this to the same value as the rook-ceph chart.
pspEnable: true
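
Under the assumption that the cluster chart passes `monitoring.alertRuleOverrides` through to the generated CephCluster spec, a hypothetical values file enabling one override might look like:

```yaml
# my-cluster-values.yaml — hypothetical values file for the rook-ceph-cluster chart
monitoring:
  enabled: true
  alertRuleOverrides:
    CephOSDNearFull:
      limit: 80   # illustrative threshold (percent)
      for: 2m     # pending duration before firing
```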
26 changes: 26 additions & 0 deletions deploy/charts/rook-ceph/templates/resources.yaml
@@ -1593,6 +1593,32 @@ spec:
description: Prometheus based Monitoring settings
nullable: true
properties:
alertRuleOverrides:
additionalProperties:
description: CephAlert represents a basic customized alert
properties:
disabled:
type: boolean
for:
type: string
limit:
type: integer
namespace:
type: string
osdUpRate:
type: string
severity:
type: string
severityLevel:
enum:
- warning
- critical
- error
type: string
type: object
description: AlertRuleOverrides points to customized Ceph Prometheus alerts
nullable: true
type: object
enabled:
description: Enabled determines whether to create the prometheus rules for the ceph cluster. If true, the prometheus types must exist or the creation will fail.
type: boolean
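
Concretely, a single override entry exercising every field accepted by this schema could look like the hypothetical sketch below; only `severityLevel` is constrained by an enum, and all values here are illustrative:

```yaml
alertRuleOverrides:
  CephOSDFlapping:
    disabled: false            # boolean: set true to suppress the alert
    for: 5m                    # string: pending duration before firing
    limit: 80                  # integer: threshold for limit-based alerts
    namespace: rook-ceph       # string: namespace value used by the rule
    osdUpRate: 10m             # string: rate window for the OSD up/down check
    severity: custom-severity  # string: free-form severity label value
    severityLevel: warning     # enum: warning | critical | error
```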
15 changes: 15 additions & 0 deletions deploy/charts/rook-ceph/values.yaml
@@ -358,3 +358,18 @@ monitoring:
# requires Prometheus to be pre-installed
# enabling will also create RBAC rules to allow Operator to create ServiceMonitors
enabled: false
# Prometheus alert rule values that override the default rule values
# optional: a specific alert can be disabled by setting its disabled field to true
#alertRuleOverrides:
# CephNodeDown:
# disabled: true
# CephMgrIsAbsent:
# for: 1m
# severityLevel: <custom-severityLevel> # must be one of the following levels: warning, critical, error
# severity: custom-severity
# CephOSDNearFull:
# limit: 80
# for: 2m
# CephOSDFlapping:
# osdUpRate: 10m
# severity: custom-severity-2
8 changes: 8 additions & 0 deletions deploy/examples/cluster-external.yaml
@@ -31,3 +31,11 @@ spec:
# externalMgrEndpoints:
#- ip: ip
# externalMgrPrometheusPort: 9283
# Prometheus alert rule values that override the default rule values
# optional: a specific alert can be disabled by setting its disabled field to true
#alertRuleOverrides:
# PersistentVolumeUsageNearFull:
# limit: 80
# for: 1m
# severityLevel: <custom-severityLevel> # must be one of the following levels: warning, critical, error
# severity: custom-severity
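
An active form of this external-cluster override, combined with the manager endpoint settings above, might look like the following sketch (all values illustrative):

```yaml
spec:
  monitoring:
    enabled: true
    externalMgrEndpoints:
      - ip: 192.168.39.182      # illustrative external mgr endpoint
    externalMgrPrometheusPort: 9283
    alertRuleOverrides:
      PersistentVolumeUsageNearFull:
        limit: 80               # threshold (percent) for near-full PVs
        for: 1m                 # pending duration before firing
        severityLevel: warning  # one of: warning, critical, error
```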
15 changes: 15 additions & 0 deletions deploy/examples/cluster.yaml
@@ -79,6 +79,21 @@ spec:
# If you have multiple rook-ceph clusters in the same k8s cluster, choose the same namespace (ideally, namespace with prometheus
# deployed) to set rulesNamespace for all the clusters. Otherwise, you will get duplicate alerts with multiple alert definitions.
rulesNamespace: rook-ceph
# Prometheus alert rule values that override the default rule values
# optional: a specific alert can be disabled by setting its disabled field to true
#alertRuleOverrides:
# CephNodeDown:
# disabled: true
# CephMgrIsAbsent:
# for: 1m
# severityLevel: <custom-severityLevel> # must be one of the following levels: warning, critical, error
# severity: custom-severity
# CephOSDNearFull:
# limit: 80
# for: 2m
# CephOSDFlapping:
# osdUpRate: 10m
# severity: custom-severity-2
network:
# enable host networking
#provider: host
26 changes: 26 additions & 0 deletions deploy/examples/crds.yaml
@@ -1592,6 +1592,32 @@ spec:
description: Prometheus based Monitoring settings
nullable: true
properties:
alertRuleOverrides:
additionalProperties:
description: CephAlert represents a basic customized alert
properties:
disabled:
type: boolean
for:
type: string
limit:
type: integer
namespace:
type: string
osdUpRate:
type: string
severity:
type: string
severityLevel:
enum:
- warning
- critical
- error
type: string
type: object
description: AlertRuleOverrides points to customized Ceph Prometheus alerts
nullable: true
type: object
enabled:
description: Enabled determines whether to create the prometheus rules for the ceph cluster. If true, the prometheus types must exist or the creation will fail.
type: boolean
