Add rule_files configuration to the Prometheus CRD so that Prometheus can watch directories other than those created by the Prometheus Operator #6561

Open
wotkddl21 opened this issue May 3, 2024 · 5 comments

Comments

@wotkddl21

Component(s)

Prometheus, PrometheusRule

What is missing? Please describe.

If the total size of the PrometheusRules exceeds the Kubernetes ConfigMap size limit (1 MiB), the operator creates an additional ConfigMap and adds the corresponding volumes and volumeMounts to the Prometheus pod.

This causes the Prometheus pod to be restarted (any change under .spec restarts the pod).
As you know, when the Prometheus pod restarts, the alerts are reset.
So adding a few Prometheus rules could reset all the alerts that are currently firing.

My first workaround was to mount an additional volume on the Prometheus pod containing rule files that Prometheus can read (e.g. /etc/prometheus/rules/custom-additional-rules).
However, Prometheus only watches the directory that the Prometheus Operator creates (/etc/prometheus/rules/-rulefiles-).
The generated Prometheus configuration looks like this:

global:  
  evaluation_interval: 15s  
  ...
rule_files:  
- /etc/prometheus/rules/<prometheus pod>-rulefiles-0/*.yaml  

Given the configuration above, I would have to mount my rule files at /etc/prometheus/rules/-rulefiles-0, but Kubernetes does not allow this because each volume mountPath must be unique.

My second workaround was to copy the rule files into /etc/prometheus/rules/-rulefiles-0.
However, that directory is mounted read-only, so I can't copy anything into it.

Describe alternatives you've considered.

My suggestion: how about adding a rule_files setting to the Prometheus resource so that custom rule file directories can be configured?

prometheus.yaml

...
rule_files:
- /etc/prometheus/rules/db/*.yaml
- /etc/prometheus/rules/frontend/*.yaml
- /etc/prometheus/rules/custom/*.yaml

The generated prometheus.env.yaml would then be rendered as follows:

global:  
  evaluation_interval: 15s  
  ...
rule_files:
- /etc/prometheus/rules/prometheus-app-prometheus-rulefiles-0/*.yaml   # default
- /etc/prometheus/rules/db/*.yaml   # custom
- /etc/prometheus/rules/frontend/*.yaml   # custom
- /etc/prometheus/rules/custom/*.yaml   # custom
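A minimal sketch of the rendering this proposal implies, assuming a hypothetical rule_files list on the Prometheus resource: the operator-managed globs come first and the custom globs are appended, with exact duplicates dropped. The helpers render_rule_files and to_yaml_list, the "<prefix>-rulefiles-<N>" naming for additional ConfigMaps, and the de-duplication are illustrative assumptions, not prometheus-operator code.

```python
from typing import List

RULES_DIR = "/etc/prometheus/rules"  # mount root used in the examples in this issue


def render_rule_files(prefix: str, configmap_count: int, custom_globs: List[str]) -> List[str]:
    """Hypothetical helper: build the rule_files list for prometheus.env.yaml.

    Operator-managed globs come first (assumed naming: <prefix>-rulefiles-<N>),
    followed by user-supplied globs; exact duplicates are dropped.
    """
    paths = [f"{RULES_DIR}/{prefix}-rulefiles-{i}/*.yaml" for i in range(configmap_count)]
    for pattern in custom_globs:
        if pattern not in paths:
            paths.append(pattern)
    return paths


def to_yaml_list(key: str, items: List[str]) -> str:
    """Render a YAML block sequence with one '- ' item per path."""
    return "\n".join([f"{key}:"] + [f"- {item}" for item in items])


if __name__ == "__main__":
    rule_files = render_rule_files(
        "prometheus-app-prometheus",
        1,
        [
            "/etc/prometheus/rules/db/*.yaml",
            "/etc/prometheus/rules/frontend/*.yaml",
            "/etc/prometheus/rules/custom/*.yaml",
        ],
    )
    print(to_yaml_list("rule_files", rule_files))
```

Run as a script, this prints a rule_files block whose first item is the operator's -rulefiles-0 glob, followed by the three custom globs. Each path is emitted as its own "- " list item. That matters because continuation lines without a dash would be folded into a single YAML string rather than read as separate entries.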

Environment Information.

Environment

Kubernetes Version: v1.20.7
Prometheus-Operator Version: v0.53.1

@wotkddl21 wotkddl21 added kind/feature needs-triage Issues that haven't been triaged yet labels May 3, 2024
@simonpasquier
Contributor

As you know, when the Prometheus pod restarts, the alerts are reset.
So adding a few Prometheus rules could reset all the alerts that are currently firing.

It shouldn't be the case unless it takes a long time for Prometheus to restart.

Is your request related to #5085? I'd rather fix the annoyance of restarting the pods when a new rule ConfigMap is mounted than tweak the pod volumes manually.

@wotkddl21
Author

wotkddl21 commented May 6, 2024

@simonpasquier I tried an optional ConfigMap. I configured my Prometheus resource as follows:
prometheus.yaml

volumes:
- configMap:
    defaultMode: 420
    name: prometheus-app-prometheus-rulefiles-1
    optional: true
  name: prometheus-app-prometheus-rulefiles-1-prevision
- configMap:
    defaultMode: 420
    name: prometheus-app-prometheus-rulefiles-2
    optional: true
  name: prometheus-app-prometheus-rulefiles-2-prevision

Then I added PrometheusRules, and the operator edited the StatefulSet volumes as follows:

volumes:
- configMap:
    defaultMode: 420
    name: prometheus-app-prometheus-rulefiles-0
  name: prometheus-app-prometheus-rulefiles-0
- configMap:
    defaultMode: 420
    name: prometheus-app-prometheus-rulefiles-1
  name: prometheus-app-prometheus-rulefiles-1
- configMap:
    defaultMode: 420
    name: prometheus-app-prometheus-rulefiles-1
    optional: true
  name: prometheus-app-prometheus-rulefiles-1-prevision
- configMap:
    defaultMode: 420
    name: prometheus-app-prometheus-rulefiles-2
    optional: true
  name: prometheus-app-prometheus-rulefiles-2-prevision

The Prometheus pod restarted because the volumes changed.

To keep the volumes from changing, I then set the Prometheus volumes as follows:
prometheus.yaml

volumes:
- configMap:
    defaultMode: 420
    name: prometheus-app-prometheus-rulefiles-1
    optional: true
  name: prometheus-app-prometheus-rulefiles-1
- configMap:
    defaultMode: 420
    name: prometheus-app-prometheus-rulefiles-2
    optional: true
  name: prometheus-app-prometheus-rulefiles-2

The Kubernetes events show:
StatefulSet prometheus-app-prometheus failed error: Pod "prometheus-app-prometheus-0" is invalid: spec.volumes[7].name: Duplicate value: "prometheus-app-prometheus-rulefiles-1

@wotkddl21
Author

I think the operator should check the existing volumes (volumes.items[*]).
Following your advice, if the volumes already contain an entry that I mounted as an optional ConfigMap, the operator should skip adding that rulefiles volume (i.e., merge the list of rulefiles volumes into the existing volumes).
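A minimal Python sketch of the merge described in this comment, under stated assumptions: volumes are modeled as plain dicts mirroring the pod spec fields used in this thread, and merge_rule_volumes and cm_volume are hypothetical helpers, not prometheus-operator code. It de-duplicates by name, since that is the field rejected by the "Duplicate value" event reported earlier in this thread. It only shows how a duplicate name could be avoided. It does not show whether the pod would still be restarted.

```python
from typing import Any, Dict, List

Volume = Dict[str, Any]  # mirrors a pod spec volume entry: {"name": ..., "configMap": {...}}


def cm_volume(name: str, optional: bool = False) -> Volume:
    """Hypothetical helper: a ConfigMap volume whose volume name equals the ConfigMap name."""
    cm: Dict[str, Any] = {"defaultMode": 420, "name": name}
    if optional:
        cm["optional"] = True
    return {"name": name, "configMap": cm}


def merge_rule_volumes(generated: List[Volume], user: List[Volume]) -> List[Volume]:
    """Hypothetical merge: keep one entry per volume name.

    If the user already declares a volume with the same name as an
    operator-generated one, the user's definition (e.g. optional: true)
    replaces the generated entry instead of being added a second time.
    """
    user_by_name = {v["name"]: v for v in user}
    merged: List[Volume] = []
    for vol in generated:
        # pop() removes matched names so they are not appended again below.
        merged.append(user_by_name.pop(vol["name"], vol))
    # User volumes with no generated counterpart keep their original order.
    merged.extend(v for v in user if v["name"] in user_by_name)
    return merged


if __name__ == "__main__":
    generated = [
        cm_volume("prometheus-app-prometheus-rulefiles-0"),
        cm_volume("prometheus-app-prometheus-rulefiles-1"),
    ]
    user = [
        cm_volume("prometheus-app-prometheus-rulefiles-1", optional=True),
        cm_volume("prometheus-app-prometheus-rulefiles-2", optional=True),
    ]
    print([v["name"] for v in merge_rule_volumes(generated, user)])
```

With the names from this thread, the script prints rulefiles-0, -1 and -2 once each, and the -1 entry keeps the user's optional: true. Letting the user's entry win is a design choice for this sketch. An operator could equally reject the conflict or prefer its own definition.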

@simonpasquier
Contributor

I don't agree; it would overcomplicate the operator logic.

You also didn't answer this:

Is your request related to #5085?

If yes then I'd rather fix #5085.

@wotkddl21
Author

#5085 is about not deleting the ConfigMaps that are provisioned by the operator.
My issue is about adding a rule_files configuration so that users can manage their own rule files.
Since the Prometheus Operator can manage most of the Prometheus configuration through CRDs, I think the rule_files field should be handled by the operator as well.

That said, if you fix #5085, it would solve my issue too.

@simonpasquier simonpasquier removed the needs-triage Issues that haven't been triaged yet label May 23, 2024