Karpenter fails to schedule a pending pod with a preferred affinity #1204
Comments
It's not just an unsatisfiable preferred node affinity that is causing this. It's due to the label being in a restricted domain (
It may not be necessary to validate preferred terms, however, since they can be ignored.
Would you mind linking me to the code for this special handling? I'm searching for any mention of a "node-role" or "control-plane" label and not finding anything. I don't think it's safe to just ignore any
/assign @jmdeal
@wmgroot I believe what is happening is that the affinity causes Karpenter to compute a nodeclaim with the restricted domain as one of its labels. Karpenter then validates the nodeclaim, detects the restricted label, and determines that the nodeclaim is unsatisfiable and cannot be created. So Karpenter does not scale up a node.
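To make the hypothesis above concrete, here is a minimal Go sketch of that kind of validation. It is not Karpenter's actual code: the function names, the list of restricted domains, and the `node-role.kubernetes.io/control-plane` label key are all assumptions chosen for illustration, and the sketch ignores any exceptions a real implementation might make for well-known labels.

```go
package main

import (
	"fmt"
	"strings"
)

// restrictedDomains is an illustrative list (assumption), not copied from Karpenter's source.
var restrictedDomains = []string{"kubernetes.io", "k8s.io", "karpenter.sh"}

// labelDomain returns the part of a label key before the "/", or "" if there is none.
func labelDomain(key string) string {
	if i := strings.Index(key, "/"); i >= 0 {
		return key[:i]
	}
	return ""
}

// isRestricted reports whether the key's domain is a restricted domain or a subdomain of one.
func isRestricted(key string) bool {
	d := labelDomain(key)
	for _, r := range restrictedDomains {
		if d == r || strings.HasSuffix(d, "."+r) {
			return true
		}
	}
	return false
}

// validateNodeClaimLabels (hypothetical) rejects the whole label set if any key is restricted.
// It does not distinguish labels that came from required terms from those that came from
// preferred terms, which is the behavior suspected in this issue.
func validateNodeClaimLabels(labels map[string]string) error {
	for k := range labels {
		if isRestricted(k) {
			return fmt.Errorf("label %q is in a restricted domain", k)
		}
	}
	return nil
}

func main() {
	// Labels derived from the pod, including one that came from a preferred affinity term.
	labels := map[string]string{"node-role.kubernetes.io/control-plane": ""}
	if err := validateNodeClaimLabels(labels); err != nil {
		fmt.Println("nodeclaim rejected:", err)
	}
}
```

In this sketch a single restricted label, even one that originated from a soft preference, is enough to reject the nodeclaim, so no node would be launched.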
Description
Observed Behavior:
We have a pod stuck in pending indefinitely and Karpenter does not take action to add a new node to allow the pod to schedule.
The pod has a soft affinity that prefers control-plane nodes. Because this is an EKS cluster, the pod can never schedule on a control-plane node.
Expected Behavior:
Karpenter creates a node to allow the pod to schedule even though the pod has a soft affinity preference that cannot be satisfied. Not scheduling the pod can result in prolonged outages, blocked PDBs and other undesirable behavior that requires manual intervention and is worse than an unsatisfied soft affinity.
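One way to picture the expected behavior is the following Go sketch, in which required terms must be satisfiable but unsatisfiable preferred terms are simply dropped. This is only an illustration of the request, not a proposed patch: the `term` type, `satisfiable`, `planLabels`, the restricted-domain list, and the example label keys are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// term is a simplified key/value node requirement (hypothetical type).
type term struct{ Key, Value string }

// satisfiable stands in for the provisioner's validation. Assumption: a term is
// unsatisfiable only when its label key falls in a restricted domain.
func satisfiable(t term) bool {
	domain, _, hasDomain := strings.Cut(t.Key, "/")
	if !hasDomain {
		return true
	}
	for _, r := range []string{"kubernetes.io", "k8s.io", "karpenter.sh"} {
		if domain == r || strings.HasSuffix(domain, "."+r) {
			return false
		}
	}
	return true
}

// planLabels returns the labels for a new node. Every required term must be
// satisfiable; preferred terms that are not satisfiable are dropped instead of
// failing the whole plan.
func planLabels(required, preferred []term) (map[string]string, error) {
	labels := map[string]string{}
	for _, t := range required {
		if !satisfiable(t) {
			return nil, fmt.Errorf("required term %q cannot be satisfied", t.Key)
		}
		labels[t.Key] = t.Value
	}
	for _, t := range preferred {
		if satisfiable(t) {
			labels[t.Key] = t.Value
		}
	}
	return labels, nil
}

func main() {
	required := []term{{Key: "example.com/pool", Value: "isolated"}}
	preferred := []term{{Key: "node-role.kubernetes.io/control-plane", Value: ""}}
	labels, err := planLabels(required, preferred)
	fmt.Println(labels, err)
}
```

With this approach the soft preference is ignored when it cannot be met, and the pod still gets a node.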
Reproduction Steps (Please include YAML):
I believe this should be reproducible with a pod that uses a nodeSelector and toleration targeting an isolated NodePool, which makes testing easier.
Upon removing the affinity spec from the example above, Karpenter added a node immediately to allow the pod to schedule.
Versions:
kubectl version