
CephPoolQuotaBytesNearExhaustion and CephPoolQuotaBytesCriticallyExhausted rules should use ceph_pool_stored #8735

Closed
zerkms opened this issue Sep 16, 2021 · 11 comments

zerkms (Contributor) commented Sep 16, 2021

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:

Currently the alert expression is:

(ceph_pool_stored_raw * on (pool_id) group_left(name)ceph_pool_metadata) / ((ceph_pool_quota_bytes * on (pool_id) group_left(name)ceph_pool_metadata) > 0) > 0.70

The problem is that the pool quota alert expression uses ceph_pool_stored_raw, which includes the replication factor, while Ceph quotas are enforced against the logical (non-replicated) data size.
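For illustration (assumed numbers matching the reproduction below, not actual exporter output): with a replication factor of 3, a 100 MB quota, and 50 MB of logical data, ceph_pool_stored is about 50 MB while ceph_pool_stored_raw is about 150 MB, so the current expression evaluates to roughly 150 MB / 100 MB = 1.5 > 0.70 and the alert fires even though Ceph considers the pool only half full.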

Expected behavior:

It should use ceph_pool_stored, because that is what Ceph takes into account when enforcing the quota: only the logical size, excluding the replication factor. A corrected expression might look like the sketch below.
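A minimal sketch of the corrected rule, assuming only the metric swap (threshold and label matching left exactly as in the current rule):

(ceph_pool_stored * on (pool_id) group_left(name)ceph_pool_metadata) / ((ceph_pool_quota_bytes * on (pool_id) group_left(name)ceph_pool_metadata) > 0) > 0.70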

How to reproduce it (minimal and precise):

Create a pool with a 100 MB quota.

Set the replication factor to 3.

Fill it with 50 MB of data.

Observe that the alert rule fires, even though Ceph reports the cluster as healthy.

Now lower the quota to 60 MB and note that Ceph is still healthy.

Now lower it to 40 MB and note that only then does Ceph report the pool as being over quota (toolbox commands for these steps are sketched below).
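A rough reproduction from the Rook Ceph toolbox (a sketch: the pool name quota-test and the object size are placeholders, and the exact health warning text may vary by Ceph release):

ceph osd pool create quota-test 32
ceph osd pool set quota-test size 3                      # replication factor 3
ceph osd pool set-quota quota-test max_bytes 104857600   # 100 MB quota
dd if=/dev/urandom of=/tmp/blob bs=1M count=50
rados -p quota-test put blob /tmp/blob                   # ~50 MB logical, ~150 MB raw
ceph df detail                                           # STORED ~50 MB, quota not exceeded
ceph health detail                                       # still healthy, yet the alert fires
ceph osd pool set-quota quota-test max_bytes 62914560    # 60 MB quota: Ceph still healthy
ceph osd pool set-quota quota-test max_bytes 41943040    # 40 MB quota: Ceph now reports the pool over quota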

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary
  • Operator's logs, if necessary
  • Crashing pod(s) logs, if necessary

To get logs, use kubectl -n <namespace> logs <pod name>
When pasting logs, always surround them with backticks or use the insert code button in the GitHub UI.
Read the GitHub documentation if you need help.

Environment:

  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Cloud provider or hardware configuration:
  • Rook version (use rook version inside of a Rook Pod):
  • Storage backend version (e.g. for ceph do ceph -v):
  • Kubernetes version (use kubectl version):
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift):
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):
leseb (Member) commented Sep 17, 2021

@anmolsachan PTAL

anmolsachan (Contributor) commented

CC @aruniiird @umangachapagain

anmolsachan (Contributor) commented

@leseb Can you please assign this to @aruniiird? I will coordinate with him to fix it.

github-actions bot commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

zerkms (Contributor, Author) commented Nov 22, 2021

Uhm, can somebody please remove the wontfix label?

@travisn travisn removed the wontfix label Nov 22, 2021
github-actions bot commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

zerkms (Contributor, Author) commented Jan 24, 2022

Not stale.

@github-actions github-actions bot removed the wontfix label Jan 24, 2022
github-actions bot commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

zerkms (Contributor, Author) commented Mar 27, 2022

Not stale.

@github-actions github-actions bot removed the wontfix label Mar 27, 2022
BlaineEXE (Member) commented

I believe this is fixed/made irrelevant with #9837.

zerkms (Contributor, Author) commented Mar 28, 2022

It indeed made it irrelevant, sorry.
