Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Customize Ceph PrometheusRule CRD #9082

Closed
yuvalman opened this issue Nov 2, 2021 · 10 comments · Fixed by #9837
Closed

Customize Ceph PrometheusRule CRD #9082

yuvalman opened this issue Nov 2, 2021 · 10 comments · Fixed by #9837
Assignees
Labels
Projects

Comments

@yuvalman
Copy link
Contributor

yuvalman commented Nov 2, 2021

Is this a bug report or feature request?

  • Feature Request

What should the feature do:

Give the option to customize ceph PrometheusRule CRD.

There few options for achieving this feature, I'm suggesting here one of my ideas:

  1. We can add a new field under monitoring section in the CephCluster CRD called prometheusRules. Under this section, we can add sections per alert name, and under this section, every field that will be mentioned in the CRD will overwrite the default values that we have now (using Go templates).

For example:

  .
  .
  .
 monitoring:
     enabled: true
     rulesNamespace: rook-ceph
    prometheusRules:
      CephOSDNearFull:
        for: 5m
        limit: 0.80
 .
 .
 .

In this case the CephOSDNearFull alert will be fired after the limit 0.80 instead of 0.75. Also, it will be fired after 5 minutes instead of the default value.

This feature can help users that use Ceph in Rook to modify the prometheus alerts as they want.

What is use case behind this feature:

Users cannot modify the prometheusRule CRD as they want.

Environment:

@leseb
Copy link
Member

leseb commented Nov 2, 2021

@leseb
Copy link
Member

leseb commented Nov 2, 2021

@aruniiird thoughts?

@yuvalman
Copy link
Contributor Author

yuvalman commented Nov 2, 2021

Hi @leseb, these yaml files are hardcoded.
In the example I mentioned, we can see that the alert CephOSDNearFull has hardcoded limit of 0.75.

My suggestion will give the option to modify the CRD as the user want.

I'm suggesting making them as templates, and giving the users to decide regarding the values of the alert

@leseb
Copy link
Member

leseb commented Nov 2, 2021

Rook does not manage the PrometheusRule. It merely helps with the ServiceMonitor resource and nothing more. If you want to modify the value of the rules, you will need to edit the YAML file instead.

@travisn
Copy link
Member

travisn commented Nov 2, 2021

Agreed that we need to find a way to make it customizable. Rook does actually install the rule here. The hard coded settings have also been a concern as described in #9005.

@travisn travisn added this to To do in v1.7 via automation Nov 2, 2021
@travisn travisn removed this from To do in v1.7 Nov 2, 2021
@travisn travisn added this to To do in v1.8 via automation Nov 2, 2021
yuvalman added a commit to yuvalman/rook that referenced this issue Dec 28, 2021
Closes: rook#9082
Signed-off-by: Yuval Manor <yuvalman958@gmail.com>
yuvalman added a commit to yuvalman/rook that referenced this issue Dec 28, 2021
Closes: rook#9082
Signed-off-by: Yuval Manor <yuvalman958@gmail.com>
@github-actions
Copy link

github-actions bot commented Jan 1, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@BlaineEXE
Copy link
Member

BlaineEXE commented Jan 4, 2022

I'm not so sure the CephCluster CRD is the best place to set custom prometheus metrics for Rook. Maybe it was a design flaw initially that Rook deployed Prometheus rules. I think it's good for Rook to have recommended rules (as examples or docs) that go pair with Ceph, but I'm not sure it's Rook's place to have such a tight coupling between Ceph and Prometheus by actually deploying and managing the rules itself. Can we separate responsibilities better? What would it look like to transition to a deployment method that had the users set prometheus rules with prometheus tooling?

E.g., #8891

@github-actions github-actions bot removed the wontfix label Jan 4, 2022
@travisn travisn moved this from To do to In progress in v1.8 Jan 20, 2022
@travisn travisn removed this from In progress in v1.8 Feb 22, 2022
@travisn travisn added this to To do in v1.9 via automation Feb 22, 2022
@github-actions
Copy link

github-actions bot commented Mar 6, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@travisn travisn moved this from To do to In progress in v1.9 Mar 7, 2022
@travisn travisn self-assigned this Mar 7, 2022
@github-actions
Copy link

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

@travisn
Copy link
Member

travisn commented Mar 14, 2022

Expected to be customizable after #9837.

@travisn travisn reopened this Mar 14, 2022
@github-actions github-actions bot removed the wontfix label Mar 15, 2022
@leseb leseb moved this from In progress to Done in v1.9 Mar 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
No open projects
v1.9
Done
4 participants