Karpenter fails to schedule a pending pod with a preferred affinity #1204

Open · wmgroot opened this issue Apr 23, 2024 · 4 comments

Labels
kind/bug Categorizes issue or PR as related to a bug.

wmgroot commented Apr 23, 2024

Description

Observed Behavior:
We have a pod stuck in pending indefinitely and Karpenter does not take action to add a new node to allow the pod to schedule.

$ kubectl get pod -n capa-system
NAME                                      READY   STATUS    RESTARTS   AGE
capa-controller-manager-7c6f4fbf6-2wxxr   0/1     Pending   0          4d5h
capa-controller-manager-7c6f4fbf6-9lsd6   1/1     Running   0          4d5h

The pod has a soft node affinity that prefers control plane nodes. Since this is an EKS cluster, the pod can never actually schedule on a control plane node.

    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
              - key: node-role.kubernetes.io/control-plane
                operator: Exists
            weight: 10

Expected Behavior:
Karpenter creates a node so the pod can schedule, even though the pod has a soft affinity preference that cannot be satisfied. Leaving the pod unscheduled can result in prolonged outages, blocked PDBs, and other undesirable behavior that requires manual intervention, all of which is worse than an unsatisfied soft affinity.

Reproduction Steps (Please include YAML):
I believe this should be reproducible with a pod that uses a nodeSelector/toleration for an isolated NodePool for easier testing; a sketch manifest follows the list below.

  • Any unsatisfiable preferred constraint in an affinity should trigger the observed behavior (such as a label that will never exist on nodes in the NodePool).
  • Do note that the pod must not have room to schedule without Karpenter taking action; otherwise Kubernetes will schedule it successfully without satisfying the soft affinity constraint. Using a NodePool that has to scale up from 0 is an effective way to test this.
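A minimal pod along these lines should reproduce it (a sketch only; the NodePool name, toleration key, and image are hypothetical and need to be adapted to your cluster):

    apiVersion: v1
    kind: Pod
    metadata:
      name: soft-affinity-repro
    spec:
      # Pin the pod to an isolated NodePool that has to scale up from 0.
      # The NodePool name and toleration below are hypothetical.
      nodeSelector:
        karpenter.sh/nodepool: isolated-test
      tolerations:
      - key: isolated-test
        operator: Exists
        effect: NoSchedule
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
              # Restricted-domain label that will never exist on nodes in the NodePool.
              - key: node-role.kubernetes.io/control-plane
                operator: Exists
            weight: 10
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9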

Upon removing the affinity spec from the example above, Karpenter added a node immediately to allow the pod to schedule.

$ kubectl get pod -n capa-system
NAME                                      READY   STATUS    RESTARTS   AGE
capa-controller-manager-7c6f4fbf6-4wc6b   1/1     Running   0          12m
capa-controller-manager-7c6f4fbf6-hqtjb   1/1     Running   0          12m

Versions:

  • Chart Version: 0.35.0
  • Kubernetes Version (kubectl version):
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.1", GitCommit:"e4d4e1ab7cf1bf15273ef97303551b279f0920a9", GitTreeState:"clean", BuildDate:"2022-09-14T19:49:27Z", GoVersion:"go1.19.1", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26+", GitVersion:"v1.26.14-eks-b9c9ed7", GitCommit:"7c3f2be51edd9fa5727b6ecc2c3fc3c578aa02ca", GitTreeState:"clean", BuildDate:"2024-03-02T03:46:35Z", GoVersion:"go1.21.7", Compiler:"gc", Platform:"linux/amd64"}

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@wmgroot wmgroot added kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 23, 2024
tzneal (Contributor) commented Apr 23, 2024

It's not just an unsatisfiable preferred node affinity that causes this. It's due to the label belonging to a restricted domain (node-role.kubernetes.io/control-plane). If you modify the label to be something else, Karpenter will launch capacity for the pod.

karpenter-5bb56f6d9b-l8v4x controller {"level":"DEBUG","time":"2024-04-23T20:03:55.814Z","logger":"controller.disruption","message":"ignoring pod, label node-role.kubernetes.io/control-plane is restricted; specify a well known label: [karpenter.k8s.aws/instance-accelerator-count karpenter.k8s.aws/instance-accelerator-manufacturer karpenter.k8s.aws/instance-accelerator-name karpenter.k8s.aws/instance-category karpenter.k8s.aws/instance-cpu karpenter.k8s.aws/instance-encryption-in-transit-supported karpenter.k8s.aws/instance-family karpenter.k8s.aws/instance-generation karpenter.k8s.aws/instance-gpu-count karpenter.k8s.aws/instance-gpu-manufacturer karpenter.k8s.aws/instance-gpu-memory karpenter.k8s.aws/instance-gpu-name karpenter.k8s.aws/instance-hypervisor karpenter.k8s.aws/instance-local-nvme karpenter.k8s.aws/instance-memory karpenter.k8s.aws/instance-network-bandwidth karpenter.k8s.aws/instance-size karpenter.sh/capacity-type karpenter.sh/nodepool kubernetes.io/arch kubernetes.io/os node.kubernetes.io/instance-type node.kubernetes.io/windows-build topology.kubernetes.io/region topology.kubernetes.io/zone], or a custom label that does not use a restricted domain: [k8s.io karpenter.k8s.aws karpenter.sh kubernetes.io]","commit":"8b2d1d7","pod":"default/test-pod"}

It may not be necessary to validate preferred terms, however, since they can be ignored.
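For example, the same preference keyed on a custom label outside the restricted domains (the example.com key below is hypothetical) is accepted, and Karpenter launches capacity for the pod:

    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - preference:
            matchExpressions:
            # Hypothetical custom label; not under k8s.io, kubernetes.io,
            # karpenter.sh, or karpenter.k8s.aws, so it is not restricted.
            - key: example.com/node-role
              operator: Exists
          weight: 10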

@tzneal tzneal removed the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 23, 2024
wmgroot (Author) commented Apr 23, 2024

Would you mind linking me to the code for this special handling? I'm searching for any mention of a "node-role" or "control-plane" label and not finding anything. I don't think it's safe to just ignore any kubernetes.io label, since some of those are reasonable to use for node selection.

billrayburn commented

/assign @jmdeal

cnmcavoy commented May 10, 2024

@wmgroot I believe what is happening is that the affinity causes Karpenter to compute a NodeClaim with the restricted domain as one of its labels. Karpenter then validates the NodeClaim, detects the restricted label, and determines the NodeClaim is unsatisfiable and cannot be created, so Karpenter does not scale up a node.
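If that reading is right, the computed NodeClaim would carry a requirement roughly like the one sketched below (illustrative only, not actual Karpenter output; the NodePool name is made up), and it fails validation because node-role.kubernetes.io falls under the restricted kubernetes.io domain listed in the log above:

    apiVersion: karpenter.sh/v1beta1
    kind: NodeClaim
    spec:
      requirements:
      - key: karpenter.sh/nodepool
        operator: In
        values: ["default"]
      # Requirement derived from the pod's preferred term; its label domain is restricted.
      - key: node-role.kubernetes.io/control-plane
        operator: Exists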
