
Allow disabling of PersistentVolumeUsageNearFull/PersistentVolumeUsageCritical alerts on workloads that are expected to be fully utilized #9568

Closed
akalenyu opened this issue Jan 12, 2022 · 10 comments

Comments

@akalenyu

akalenyu commented Jan 12, 2022

Is this a bug report or feature request?

  • Feature Request

What should the feature do:
Allow components to add a label to a PVC that prevents PersistentVolumeUsageNearFull/PersistentVolumeUsageCritical alerts from firing.
(Similar to openshift/cluster-monitoring-operator#1493, can/should use the same key/value pair?)
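For illustration, a PVC created by such a component might carry an opt-out label along these lines. The label key below is made up for this example; the actual key/value pair would need to match whatever the alert rules end up recognizing:

```yaml
# Sketch only: "monitoring.example.com/exclude-usage-alerts" is a hypothetical
# label key, not an existing convention in Rook or OCS.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: imported-disk-image
  labels:
    monitoring.example.com/exclude-usage-alerts: "true"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```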

What is use case behind this feature:
Some workloads (kubevirt/CDI) request a PV that is, by default, exactly the size of the file (disk image) it holds. This causes the alerts to fire, when in reality the size of the data will never grow and the alert is not actionable.

Environment:

Clusters running kubevirt, but other use cases where a PVC is full by design may exist.

@akalenyu akalenyu changed the title Allow disabling of PersistentVolumeUsageNearFull/PersistentVolumeUsageCritical on workloads that are expected to be fully utilized Allow disabling of PersistentVolumeUsageNearFull/PersistentVolumeUsageCritical alerts on workloads that are expected to be fully utilized Jan 12, 2022
@parth-gr
Member

I didn't understand the point. If we need to grow the cluster in the future, we would add more disk PVCs, which would allow the data to balance out and the alert would go away.
Or are you saying you won't need to add any more data?

@travisn
Member

travisn commented Mar 10, 2022

Work is in progress in #9837 that will allow the Prometheus rules to be customized with a Helm post processor. @akalenyu Please take a look to confirm this will be covered.

@akalenyu
Author

I didn't understand the point. If we need to grow the cluster in the future, we would add more disk PVCs, which would allow the data to balance out and the alert would go away. Or are you saying you won't need to add any more data?

The idea is that we give an escape hatch for workloads that are expected to take up the entire PVC by design,
so a critical alert won't pop up for them.

We would then label our PVCs in https://github.com/kubevirt/containerized-data-importer to exclude them from triggering the alert.
Is something similar to openshift/cluster-monitoring-operator#1493 not a reasonable way to go about this?

@BlaineEXE
Member

BlaineEXE commented Mar 15, 2022

@akalenyu did you miss Travis's comment here? #9568 (comment)

I believe this may alleviate the issue. (You could edit the rules to have them ignore PVCs containing a label of your choosing.)
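As a rough sketch of what such an edit could look like once the rules are user-managed (per #9837), the alert expression could be joined against kube-state-metrics' PVC label metric so that labeled PVCs are skipped. The expression, threshold, and label key below are illustrative, not the actual Rook rule, and kube-state-metrics would need to be configured to export the PVC label (e.g. via --metric-labels-allowlist) for the join to work:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-ceph-rules-override
  namespace: rook-ceph
spec:
  groups:
    - name: persistent-volume-alert.rules
      rules:
        - alert: PersistentVolumeUsageNearFull
          expr: |
            # Fire when a PVC is more than 75% used...
            (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 0.75
            # ...unless the PVC carries the (hypothetical) opt-out label.
            unless on (namespace, persistentvolumeclaim)
              kube_persistentvolumeclaim_labels{label_monitoring_example_com_exclude_usage_alerts="true"}
          for: 5m
          labels:
            severity: warning
          annotations:
            message: PVC {{ $labels.persistentvolumeclaim }} is nearing full. Data deletion or PVC expansion is required.
```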

@akalenyu
Author

akalenyu commented Mar 16, 2022

@akalenyu did you miss Travis's comment here? #9568 (comment)

I believe this may alleviate the issue. (You could edit the rules to have them ignore PVCs containing a label of your choosing.)

Sorry, I should have been clearer about this: we can't really edit the Ceph rules from our project (containerized-data-importer). We're looking to handle this just by labeling the objects we manage (PVCs). That is why the OpenShift monitoring approach worked for us.

@BlaineEXE
Member

we can't really edit the Ceph rules from our project

Do I take this to mean that you are not the admin of your Kubernetes cluster?

How is Rook being installed in your clusters? With Travis's PR #9837, Rook will no longer deploy the Ceph Prometheus rules itself; after that, users will have to deploy the rules manually or via Helm.

@akalenyu
Author

akalenyu commented Mar 17, 2022

we can't really edit the Ceph rules from our project

Do I take this to mean that you are not the admin of your Kubernetes cluster?

How is Rook being installed in your clusters? With Travis's PR #9837, Rook will no longer deploy the Ceph Prometheus rules itself; after that, users will have to deploy the rules manually or via Helm.

I am not an admin of a particular cluster, no.
I am working on a project that is basically a Kubernetes controller offering an abstraction over PVCs, so the only way for us to silence this programmatically for our PVCs (which are expected to take up all the space) is by labeling them up front.

We don't install Rook ourselves as part of the project; I noticed this alert on one of the clusters I was debugging (which had OCS and our project installed on it).

#9837 might solve the issue, thank you - but I have a feeling that at some point OCS will decide to deploy this alerting rule automatically, bringing us back to the alert firing even though we expect the workloads to be nearly 100% utilized.

@travisn
Member

travisn commented Mar 17, 2022

OpenShift has a proposal for alert customization that would benefit OCS. Until then, if you don't have control of the PrometheusRule CRs created by OCS/Rook, I'm not sure how you can suppress these alerts.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@github-actions

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.
