monitoring: customize prometheus rule alerts #9503

Closed

Conversation

@yuvalman (Contributor) commented Dec 28, 2021

Description of your changes:

Make PrometheusRule alerts customizable by user preference.

Signed-off-by: Yuval Manor yuvalman958@gmail.com

Which issue is resolved by this Pull Request:
Resolves #9082, #9005

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: Add the flag for skipping the build if this is only a documentation change. See here for the flag.
  • Skip Unrelated Tests: Add a flag to run tests for a specific storage provider. See test options.
  • Reviewed the developer guide on Submitting a Pull Request
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.

@yuvalman force-pushed the customise-ceph-alerts branch 25 times, most recently from 73bf466 to 035be1b, on December 30, 2021
@yuvalman (Contributor, Author) commented Dec 30, 2021

With this PR we can easily add a "labels" section, giving users the option to override the PrometheusRule labels as described in the resolved issue #8502:

prometheusRule:
  labels:
    prometheus: custom-prometheus
    role: custom-alert-rules
  alerts:
    cephMgrIsAbsent:
      for: 1m
      namespace: custom-namespace
      severityLevel: rook-severityLevel
      severity: rook-severity
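
For illustration, here is a minimal sketch of roughly what a PrometheusRule generated from the values above could look like. The resource name, group name, label keys, and expression are assumptions for the sake of the example, not taken from this PR:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-ceph-rules        # assumed resource name
  namespace: custom-namespace        # assumed to come from the alert's namespace override
  labels:
    prometheus: custom-prometheus    # from prometheusRule.labels above
    role: custom-alert-rules
spec:
  groups:
    - name: ceph-mgr-status          # assumed group name
      rules:
        - alert: CephMgrIsAbsent
          expr: absent(up{job="rook-ceph-mgr"} == 1)   # simplified placeholder expression
          for: 1m                    # from alerts.cephMgrIsAbsent.for above
          labels:
            severity_level: rook-severityLevel          # assumed label keys
            severity: rook-severity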

@yuvalman changed the title from "ceph: customize prometheus rule alerts" to "monitoring: customize prometheus rule alerts" on Dec 31, 2021
@yuvalman force-pushed the customise-ceph-alerts branch 2 times, most recently from 5c8ed6c to 5ae4fcb, on January 2, 2022
Review thread on deploy/charts/rook-ceph/values.yaml (outdated):
@@ -0,0 +1,118 @@
alerts:
Member commented:
What do we need this for?

@yuvalman (Contributor, Author) replied:
We need this section if we want to add new sections for overriding PrometheusRule default values in the future, for example adding a "labels" section as described in the comment above. Do you want to add that section in this PR?

If we are adding the "labels" section, we will add the default values in this file:

labels:
  prometheus: rook-prometheus
  role: alert-rules
alerts:
.
.
.

Resolved review threads on pkg/operator/ceph/cluster/mgr/mgr.go.
@@ -265,6 +265,10 @@ spec:
- name: CSI_CEPHFS_PLUGIN_RESOURCE
value: {{ .Values.csi.csiCephFSPluginResource | quote }}
{{- end }}
{{- if .Values.monitoring.prometheusRule }}
- name: ROOK_CEPH_MONITORING_PROMETHEUS_RULE
Member commented:
Where is this set when not using Helm? operator.yaml needs some changes too.
Also, these rules will apply to all the CephClusters deployed, right?

@yuvalman (Contributor, Author) replied on Jan 3, 2022:
I added changes to operator.yaml. Should we also add changes to operator-openshift.yaml?

Yes, these rules will apply to all the CephClusters deployed. If we would like to apply different rules to different CephClusters, we should add a new field to the CephCluster CRD instead of using an env var in the operator (as suggested in #9082). Do you think that's a better solution for our case?
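
For context, a hypothetical sketch of how this could look in operator.yaml (the non-Helm manifest). The env var name comes from this PR's Helm template, but the value format consumed by the operator is only guessed here:

# Hypothetical sketch: the operator is assumed to accept the settings as a
# YAML string mirroring the chart values; verify against the PR's Go code.
- name: ROOK_CEPH_MONITORING_PROMETHEUS_RULE
  value: |
    labels:
      prometheus: rook-prometheus
      role: alert-rules
    alerts:
      cephMgrIsAbsent:
        for: 1m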

Resolved review threads on pkg/operator/ceph/cluster/mgr/spec.go (outdated) and go.mod.
@travisn (Member) left a comment:

Just a couple more minor points, then I think it's looking good to merge soon.
@leseb other feedback?

Resolved review thread on pkg/apis/ceph.rook.io/v1/types.go.
# disabled: true
# CephMgrIsAbsent:
# for: 1m
# severityLevel: <custom-severityLevel> #<custom-severityLevel> must be one of the next levels: warning, critical, error
Member commented:
How about we just use one of the example values here?

Suggested change
# severityLevel: <custom-severityLevel> #<custom-severityLevel> must be one of the next levels: warning, critical, error
# severityLevel: warning # must be warning, critical, or error

@yuvalman (Contributor, Author) replied:
Sure, makes sense.

@yuvalman yuvalman requested a review from travisn January 17, 2022 23:06
@yuvalman force-pushed the customise-ceph-alerts branch 2 times, most recently from d8e2216 to 1a8e84f, on January 17, 2022
@BlaineEXE (Member) commented Jan 19, 2022

I'm really grateful that yuvalman is getting involved with the community here, and I think the quality of work and dedication has been spectacular. I don't want my disagreement about the Rook operator's role in deploying Prometheus rules to be a reflection on the good quality I see here.

Fundamentally, I don't think Rook should have ever been deploying Prometheus rules. It creates a tight integration with Prometheus and a precedent that means users may want Rook to be tightly integrated with other logging systems like Graylog, etc. This was also born of a Red Hat downstream need as evidenced by the hard-coded "openshift-storage" namespace on the rules, which I think was poor upstream citizenship at the time.

I looked at some high profile Kubernetes projects, and I didn't find documented evidence that they deploy Prometheus rules like Rook does. I link my findings below. I think it would be better for Rook to stop deploying Prometheus rules.


If we want the Rook project to be more helpful to users with respect to Prometheus and Ceph integration (which I think is a good benefit for users), then I think adding the benefit at a more business-logic level is more appropriate. I don't think that adding complexity to override/modify Prometheus rules in the Rook operator's application logic (Go code) via our main CRD is the best approach. The CRD-override pattern forces the Rook operator to handle the case where users wish to make modifications to the resources, adding complexity to the application logic (Go code) and also introducing a domain-specific method of overriding values which creates additional burden on users to understand how to configure the overrides.

I think it would be a better user experience to either (1) include recommended Prometheus rules in the rook-ceph-cluster Helm chart or (2) create a new helm chart with Rook-Ceph recommended Prometheus rules. If users wish to override or modify any of the rules, they can do so much more easily by using kustomize as a post-renderer, or they can render the helm charts into files to modify themselves (which they could also version with gitops). This still benefits users by having recommended Rook-Ceph Prometheus rules, but it does not experience the downsides to developers and users I mentioned above. Helm (with the option for a kustomize post-renderer) uses pre-existing methods that are well understood in the Kubernetes community, and the feature is provided in pure business logic.
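
As a rough illustration of the kustomize post-renderer flow described above (not part of this PR), a kustomization.yaml could patch a chart-provided PrometheusRule after rendering; the rule name prometheus-ceph-rules and the file name rendered.yaml are assumptions:

# kustomization.yaml -- minimal sketch. A small script passed to Helm via
# --post-renderer would write the rendered chart output to rendered.yaml and
# then run `kustomize build .` so that the patch below is applied.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - rendered.yaml
patches:
  - target:
      kind: PrometheusRule
      name: prometheus-ceph-rules   # assumed rule name
    patch: |-
      - op: replace
        path: /spec/groups/0/rules/0/for
        value: 1m

Alternatively, the chart can be rendered to files and the modified manifests versioned with gitops, as also suggested above.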

I believe the Helm chart is the best place to provide this benefit to users if we wish to provide it. It creates the least burden on developers by implementing the feature without having to maintain application logic, and it provides the most transparent means of customization for users by utilizing well-understood Kubernetes community patterns.

@leseb (Member) commented Jan 21, 2022

I second what @BlaineEXE said about @yuvalman's dedication to this PR. However, before jumping right into the code/implementation, it would have been great to start by discussing this in a design doc, as we almost always do (as explained here). Anyway, what is done is done, I guess. I also share part of the responsibility, since I didn't mention this once the PR was opened, so sorry about that.

That said, on to the main subject. I believe PrometheusRule creation shouldn't have been in Rook from the get-go. It's not because we have this precedent that we have to keep building on top of it (especially if we agree it's not the right approach). Every company running on Kubernetes probably has a set of internal tools that handle rule injection, so having the injection native to an operator is not really needed. One reason we believe it's useful may be that we have too narrow a vision and a bias. People running Rook on Kubernetes most likely do not only run Rook but other software that may have its own rules. So I think users have their own automation, and if they don't, this should be bundled via Helm.
The operator's responsibility is to deploy a Ceph cluster; that's what CephCluster gives you, and enlarging the reconcile loop for monitoring is not a good idea. We recently pushed back on adding Jaeger tracing and decided to document it instead, and we did the same for the Grafana and Kibana CRs, IIRC.

As an action item, I'd like us to try removing the existing rule injection we have and manage this via Helm.

@travisn (Member) commented Jan 21, 2022

@BlaineEXE @leseb Is it fair to say that your perspective is that Prometheus integration should not be the Rook operator's responsibility, and that Prometheus should not be treated as a first-class integration point? If we go with that philosophy, it has the following implications for removing all Prometheus integration from the CephCluster CR:

  • Remove the implementation for the CephCluster CR's monitoring settings:
    • Remove creation of the PrometheusRule CR with all the Ceph rules. Create the rules instead with the helm chart and implement a different solution downstream where helm isn't used.
    • Remove creation of the ServiceMonitor. The service monitor needs to target the active rook-ceph-mgr service, which may lose some flexibility despite the recent change in "mgr: Update services when active mgr changes" (#9467) for ceph mgr failover. On a related note, see "Enable customization of prometheus metrics labels created by rook operator" (#9618) for an improvement request that the operator could simplify.
    • Remove externalMgrEndpoints, and external clusters need to define their own external mgr configuration
    • Remove externalMgrPrometheusPort

Alternatively, perhaps you are saying that only the PrometheusRule CRs should not be configurable through the cluster CR? In that case, this seems like a slippery slope... Why would we treat some Prometheus artifacts such as the ServiceMonitor as first-class resources, but not the PrometheusRule? Or, if the goal is to ban any creation of Prometheus CRs including the ServiceMonitor, then why is the v1.Endpoints resource special for configuring the Prometheus endpoint? It's a Prometheus integration point even though it's a native k8s resource.

It is certainly an attractive idea for the helm chart to instead create the rules since it naturally has templating capabilities that would make it simple to insert custom values. But at the same time, it means that any non-helm deployments (e.g. downstream) now require a separate implementation. While helm would be very nice for this customization feature, requiring two separate implementations is generally a red flag.

Overall, I'd say my design perspectives are:

  • Rook should treat Prometheus as a first-class configuration point
  • Everything Ceph should be made available by Rook, including the Ceph prometheus rules.
  • If we can implement once and solve both upstream and downstream, why wouldn't we? It improves the developer experience to remove duplication, even if either approach works for upstream users.
  • IMO the changes in this PR are not difficult to maintain and the risk to the cluster controller is trivial.

Thanks for all the perspective and discussion, it's a lively topic!
“Time spent arguing is, oddly enough, almost never wasted.” - Christopher Hitchens

@yuvalman This has grown into a philosophical debate, but I certainly look forward to your perspective as well on the impact these different approaches would have on you.

@tareksha (Contributor) commented:
I tend to agree with @BlaineEXE's approach. It is more practical for customers to interact directly with the metrics that Ceph exposes rather than going through an opaque layer dictated by Rook CRs. Rook as a solution is excellent for provisioning Ceph clusters on Kubernetes, and it is far more useful and flexible to expose Ceph's capabilities directly and make them accessible.

The alerts themselves are usually very tailored to the specific business use case: thresholds, receivers, severity, on/off toggles... Such highly fluid declarations are better suited to templating via charts than to Rook's provisioning-focused CRs.

I do think that providing a ServiceMonitor via the CRs is somewhat useful, as it is the de facto common method for scraping metrics and can be thought of as part of provisioning the Ceph Manager and exposing its capabilities. Not really a critical point IMHO.

@yuvalman (Contributor, Author) commented Jan 24, 2022

I believe that the approach suggested by @BlaineEXE might improve the reconcile loop and make the templating easier for upstream users. It is also true that it is not common for high-profile Kubernetes projects to deploy the monitoring solution via their operator.

In the Rook-Ceph use case, I can see the benefit of using the operator to deploy all the monitoring resources related to the CephCluster; it makes for a stronger integration with Ceph.
In addition, at any point, upstream users can create their own PrometheusRule and ServiceMonitor instead of the Ceph monitoring solution provided by Rook.

I think that if only upstream users are taken into account, we should go with @BlaineEXE's approach.

When it comes to downstream users, as @travisn mentioned, I believe this PR can solve both cases.

mergify bot commented Jan 24, 2022

This pull request has merge conflicts that must be resolved before it can be merged. @yuvalman please rebase it. https://rook.io/docs/rook/latest/development-flow.html#updating-your-fork

@yuvalman force-pushed the customise-ceph-alerts branch 3 times, most recently from 6421eeb to f6aa14b, on January 24, 2022
@travisn (Member) commented Feb 10, 2022

@BlaineEXE @leseb and I discussed through the different pros and cons of the approaches. The proposal is the following:

  1. The customization feature is only needed for upstream.
  2. Helm post-processing tools will really simplify what is needed for upstream customization. Since only advanced users will need/want to customize it anyway, the helm approach really is better for upstream. Rook can provide examples/docs of the customization via helm.
  3. The duplication of effort between helm for upstream and the downstream implementation is really not much. Downstream will basically just need to pick up the existing functionality of Rook where the PrometheusRule CR is created without customization.
  4. Other prometheus configuration would remain as CephCluster CR settings since they are more about integration with the Ceph mgr, or other more dynamic settings that the operator needs to set.

This is proposed for v1.9.
@yuvalman Are you interested in pivoting the solution to use helm? Thanks for your understanding.

@yuvalman (Contributor, Author) commented Feb 11, 2022

In case we decide to keep the monitoring solution as part of the reconcile loop, I want to make sure that you considered the following points:

  1. Maintaining a helm-template customization solution for upstream users as part of the rook-ceph-cluster chart means that any new rules added in the future will have to be added to the helm template file anyway, which amounts to the same effort as maintaining the template file in this PR.
    So if the existing monitoring solution is kept in the reconcile loop (with all the consequences of creating the PrometheusRule CR), I don't see any reason not to use the solution provided by this PR, which will benefit users, won't duplicate the solution, and won't enlarge the reconcile loop.

  2. Maintaining only docs/examples would be much easier by simply adding the most up-to-date PrometheusRule file itself to the docs and letting upstream users handle the templating themselves with helm (which has the downside of a weaker Rook-Ceph integration).

  3. Maintaining two solutions (for upstream and downstream users) means that upstream users who want to customize the Prometheus alerts themselves, while still using all the benefits of the operator's existing monitoring solution, won't be able to use the monitoring.enabled flag, because it would also create the hard-coded PrometheusRule CR for them. So if they want to maintain the Prometheus rules themselves, they will have to maintain the whole monitoring solution themselves (which is not ideal), unless we add a flag that only controls PrometheusRule CR creation, which might help in that case.

  4. This PR also contains the fix for "Custom namespaces should not be included in the prometheus alerts" (#9005); should we keep the replacement of dynamic values? We can apply the same logic to the old prometheus-ceph-v14-rules.yaml file.

If we still decide to keep the monitoring solution as the Rook operator's responsibility (in which case I believe there is great value for users in the solution provided in this PR) but not use the customization feature proposed here, my design perspectives are:

  1. Add a recommended PrometheusRule file to the docs, rather than providing a helm solution as part of the rook-ceph-cluster chart.
  2. Add a specific flag for disabling PrometheusRule CR creation.
  3. Keep the fix for "Custom namespaces should not be included in the prometheus alerts" (#9005).

Please let me know what you think about the points I mentioned.

mergify bot commented Feb 23, 2022

This pull request has merge conflicts that must be resolved before it can be merged. @yuvalman please rebase it. https://rook.io/docs/rook/latest/development-flow.html#updating-your-fork

Signed-off-by: Yuval Manor <yuvalman958@gmail.com>
mergify bot commented Mar 1, 2022

This pull request has merge conflicts that must be resolved before it can be merged. @yuvalman please rebase it. https://rook.io/docs/rook/latest/development-flow.html#updating-your-fork

@travisn (Member) commented Mar 4, 2022

@yuvalman I've opened #9837 as a draft of the proposal, where the prometheus rules would be installed by the helm chart and customized with tools such as kustomize. The really nice outcome is that the rules were picked up directly from the ceph repo, and no changes were needed. I haven't reviewed the rules to verify that's completely true, but at least the rules install in a test cluster with those changes.

This way, with all the customization done in post-processing with kustomize or similar tools, we really don't have to maintain any special processing in Rook or even in the helm chart.

@yuvalman (Contributor, Author) commented Mar 7, 2022

@travisn I agree.
I believe the proposal in #9837 will do the trick for post-processing customization, and removing the prometheus rules from the reconcile loop is better than the solution provided in this PR.

Regarding the rules from the ceph repo, I can see that they changed the expression used to check whether an OSD is full. That means that if users would like to change the threshold of this alert, they will have to replace the whole expression with the old one in order to control the threshold from Rook.

I believe that "OSD_NEARFULL" will now be able to be configured from ceph tools pod (but I didn't test it).

@travisn (Member) commented Mar 7, 2022

> @travisn I agree. I believe the proposal in #9837 will do the trick for post-processing customization, and removing the prometheus rules from the reconcile loop is better than the solution provided in this PR.

Shall we close this PR then? Thanks for all the discussion and work on this PR, it really is appreciated!

> Regarding the rules from the ceph repo, I can see that they changed the expression used to check whether an OSD is full. That means that if users would like to change the threshold of this alert, they will have to replace the whole expression with the old one in order to control the threshold from Rook.
>
> I believe that "OSD_NEARFULL" can now be configured from the ceph toolbox pod (but I didn't test it).

Yes, I know the nearfull is customizable from the toolbox (or with the ceph.conf overrides), so it could be customized either way.

I do need to review the rule changes in more depth before getting this merged. I may end up separating the rule updates into a follow-up PR to keep these changes only for the redesign...

Successfully merging this pull request may close these issues.

Customize Ceph PrometheusRule CRD
5 participants