
Kubernetes Pod not healthy rules is not precise #217

Open
yiyu0x opened this issue May 6, 2021 · 1 comment

yiyu0x commented May 6, 2021

In section 5.1.17 (Kubernetes Pod not healthy) of this page, the description reads "Pod has been in a non-ready state for longer than 15 minutes.", and the rule below it is:

expr: min_over_time(sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"})[15m:1m]) > 0

However, I think the correct rule is:

expr: sum_over_time(sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"})[15m:1m]) == 15
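For context, the proposed expression might be dropped into a Prometheus alerting rule roughly as below. This is only a sketch: the group name, alert name, `severity` label and annotation wording are assumptions for illustration, not copied from the page. It also assumes the `[15m:1m]` subquery yields exactly 15 one-minute steps per evaluation; if your setup produces a different step count, the `== 15` threshold would need to change accordingly.

```yaml
groups:
  - name: kubernetes-pods                 # hypothetical group name
    rules:
      - alert: KubernetesPodNotHealthy    # assumed alert name
        # The inner sum is 1 while the pod is Pending, Unknown or Failed, else 0.
        # Summing the 15 one-minute steps to exactly 15 requires every step in the
        # window to be unhealthy, so a pod that appeared a few minutes ago cannot match.
        expr: sum_over_time(sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"})[15m:1m]) == 15
        labels:
          severity: critical              # assumed severity
        annotations:
          summary: "Kubernetes Pod not healthy ({{ $labels.namespace }}/{{ $labels.pod }})"
          description: "Pod has been in a non-ready state for longer than 15 minutes."
```

The sketch deliberately has no `for:` clause: the 15-minute requirement is already encoded in the subquery window, so adding `for: 15m` on top would roughly double the delay before the alert fires.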

Iskaldr commented Feb 14, 2022

I agree.
The current rule unfortunately also fires when a freshly (re-)deployed pod takes longer than 1 minute to become ready, because the subquery [15m:1m] then contains only one step (for that single minute), with value 1, which is enough for min_over_time to return 1 and trigger the alert.

The proposed rule ensures that the pod has existed, and been unhealthy, for the full 15 minutes, which prevents the rule from firing prematurely.
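One way to check this explanation against a live cluster is to count how many 1-minute subquery steps actually exist for each pod. A minimal sketch as a recording rule, assuming `kube_pod_status_phase` comes from kube-state-metrics as in the rules above; the group and recorded metric names are hypothetical, and the same `expr` can just as well be evaluated ad hoc in the Prometheus expression browser.

```yaml
groups:
  - name: kubernetes-pods-debug           # hypothetical group name
    rules:
      # Number of 1-minute steps in the last 15 minutes for which the pod has data.
      # A long-running pod should report 15 here; a pod created a couple of minutes
      # ago reports only 1 or 2, and if those steps are all unhealthy,
      # min_over_time(...) > 0 already fires for it.
      - record: namespace_pod:unhealthy_phase_steps:count15m   # hypothetical name
        expr: count_over_time(sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"})[15m:1m])
```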
