Karpenter does not disrupt #6086

Open
afreyermuth98 opened this issue Apr 22, 2024 · 14 comments
Labels: question (Further information is requested)

@afreyermuth98 (Contributor)

Description

Observed Behavior:
I have an under-utilized node provisioned by Karpenter, and it never gets disrupted.
Note: I have spotToSpotConsolidation enabled.


Expected Behavior:
I expect this node to be disrupted.

Reproduction Steps (Please include YAML):
Provision a node to about 60% utilization, for example. Then scale the pods down so the node sits at about 10%, and watch whether the node gets disrupted.
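
For example (the deployment name here is illustrative, not from my setup):

$ kubectl scale deployment sample-workload --replicas=10   # fill the node to ~60%
$ kubectl scale deployment sample-workload --replicas=2    # drop utilization to ~10%
$ kubectl get events -A --watch | grep -i -e Disruption -e Unconsolidatable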

Versions:

  • Chart Version: 0.36.0
  • Kubernetes Version (kubectl version): 1.27.11

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@afreyermuth98 added the bug (Something isn't working) and needs-triage (Issues that need to be triaged) labels on Apr 22, 2024

tzneal commented Apr 23, 2024

Can you show the events from kubectl describe node <node-name>?


haim-bp commented Apr 24, 2024

I have the same issue, and I see this event:
Normal Unconsolidatable 58m (x79 over 3d2h) karpenter Can't remove without creating 4 candidates


tzneal commented Apr 24, 2024

Can you check the pods on the node? Our scheduling simulation thinks it would need more nodes if it were removed, so there is likely a preferred topology spread or anti-affinity on the pods on the node.
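
For example, to dump the scheduling constraints of the pods on that node (node name is a placeholder; the jq filter is just a sketch):

$ kubectl get pods -A --field-selector spec.nodeName=<node-name> -o json \
    | jq '.items[] | {pod: .metadata.name, affinity: .spec.affinity, topologySpread: .spec.topologySpreadConstraints}'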

@engedaam

@haim-bp any updates here?

@engedaam engedaam added question Further information is requested and removed bug Something isn't working needs-triage Issues that need to be triaged labels Apr 30, 2024

e-koma commented May 17, 2024

I encountered a similar issue.

Karpenter Disruption Issue

I have been testing Karpenter and hit an issue where disruption suddenly stops working.
The consolidationPolicy is set to WhenEmpty.

Situation

  • We are running k8s Jobs, and the "karpenter.sh/do-not-disrupt": "true" annotation is used on all pods.
  • At first, Karpenter scales out and disrupts nodes without any issues.
  • However, after some time, disruption stops working:
    • After nodes are marked as DisruptionBlocked, some nodes seem to be permanently excluded from disruption.
    • Even with LOG_LEVEL set to DEBUG, no Karpenter logs or Kubernetes events indicate any disruption activity.
    • All pods have stopped by this point, of course (see the annotation check below).
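
A sketch of how I verify that no do-not-disrupt pods remain (the jq filter is illustrative):

$ kubectl get pods -A -o json \
    | jq -r '.items[] | select(.metadata.annotations["karpenter.sh/do-not-disrupt"] == "true") | "\(.metadata.namespace)/\(.metadata.name) \(.status.phase)"'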

The nodepool is as follows:

spec:
  template:
    spec:
      requirements:
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["m6a.4xlarge"]
          minValues: 1
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand"]
  disruption:
    budgets:
      - nodes: 20%
      - schedule: "0,30 * * * *"
        duration: 5m
        nodes: "0"
  limits:
    cpu: "80000"
    memory: 320Gi
  • Disruption budgets are set to 20%.
  • Disruption is fully blocked for 5 minutes starting at minutes 0 and 30 of every hour (blocked decisions can be watched via events; see below).
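
To watch when a budget or some other condition blocks disruption, the DisruptionBlocked events can be filtered directly, e.g.:

$ kubectl get events -A --field-selector reason=DisruptionBlocked --sort-by=.lastTimestamp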

For instance, after disrupting from 5 to 4 to 3 to 2 nodes, the remaining 2 nodes never get disrupted.
After the message Cannot disrupt NodeClaim: Nominated for a pending pod is logged, disruption of those 2 nodes stops.
Subsequent pod scheduling causes scale-out, but the 2 previously undisrupted nodes remain undisrupted indefinitely.
Once a node is marked as DisruptionBlocked, it seems to be permanently excluded from disruption.
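
To see which nodes are being held for pending pods, the nomination events can be listed the same way (assuming they carry the Nominated reason, as in the pod events shown further down):

$ kubectl get events -A --field-selector reason=Nominated --sort-by=.lastTimestamp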


e-koma commented May 17, 2024

Events

The disruption events are as follows.
In this case, node-zl8z4 is no longer disrupted.

$ kubectl get events --watch | grep Disruption

24m         Normal    DisruptionTerminating     nodeclaim/node-596dw                                                  Disrupting NodeClaim: Emptiness/Delete
18m         Normal    DisruptionBlocked         nodeclaim/node-7584b                                                  Cannot disrupt NodeClaim: Nominated for a pending pod
12m         Normal    DisruptionTerminating     nodeclaim/node-7584b                                                  Disrupting NodeClaim: Emptiness/Delete
33m         Normal    DisruptionBlocked         nodeclaim/node-8wkmq                                                  Cannot disrupt NodeClaim: Nominated for a pending pod
32m         Normal    DisruptionTerminating     nodeclaim/node-8wkmq                                                  Disrupting NodeClaim: Emptiness/Delete
32m         Normal    DisruptionTerminating     nodeclaim/node-c5pww                                                  Disrupting NodeClaim: Emptiness/Delete
18m         Normal    DisruptionBlocked         nodeclaim/node-h4x7v                                                  Cannot disrupt NodeClaim: Nominated for a pending pod
12m         Normal    DisruptionTerminating     nodeclaim/node-h4x7v                                                  Disrupting NodeClaim: Emptiness/Delete
18m         Normal    DisruptionBlocked         nodeclaim/node-kcs69                                                  Cannot disrupt NodeClaim: Nominated for a pending pod
12m         Normal    DisruptionTerminating     nodeclaim/node-kcs69                                                  Disrupting NodeClaim: Emptiness/Delete
32m         Normal    DisruptionTerminating     nodeclaim/node-kp4s5                                                  Disrupting NodeClaim: Emptiness/Delete
25m         Normal    DisruptionTerminating     nodeclaim/node-nmhcw                                                  Disrupting NodeClaim: Emptiness/Delete
33m         Normal    DisruptionBlocked         nodeclaim/node-splwm                                                  Cannot disrupt NodeClaim: Pod "default/**********-zfnjx" has "karpenter.sh/do-not-disrupt" annotation
32m         Normal    DisruptionTerminating     nodeclaim/node-splwm                                                  Disrupting NodeClaim: Emptiness/Delete
18m         Normal    DisruptionBlocked         nodeclaim/node-zl8z4                                                  Cannot disrupt NodeClaim: Nominated for a pending pod
12m         Normal    DisruptionBlocked         nodepool/node

NodeClaim

The NodeClaim and Node that are no longer being disrupted are as follows.


apiVersion: karpenter.sh/v1beta1
kind: NodeClaim
metadata:
  annotations:
    karpenter.k8s.aws/ec2nodeclass-hash: "4618831760887766303"
    karpenter.k8s.aws/ec2nodeclass-hash-version: v2
    karpenter.k8s.aws/tagged: "true"
    karpenter.sh/nodepool-hash: "16961008295110681836"
    karpenter.sh/nodepool-hash-version: v2
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"karpenter.k8s.aws/v1beta1","kind":"EC2NodeClass","metadata":{"annotations":{"kubernetes.io/description":"EC2NodeClass for worker node with custom userdata"},"name":"node"},"spec":{"amiFamily":"AL2","amiSelectorTerms":[{"id":"ami-0594c768bd89780c7"}],"associatePublicIPAddress":false,"blockDeviceMappings":[{"deviceName":"/dev/xvda","ebs":{"deleteOnTermination":true,"iops":3000,"throughput":125,"volumeSize":"100Gi","volumeType":"gp3"}},{"deviceName":"/dev/xvdba","ebs":{"deleteOnTermination":true,"volumeSize":"2000Gi","volumeType":"st1"}}],"instanceProfile":"台-eks-cluster-staging","metadataOptions":{"httpEndpoint":"enabled","httpPutResponseHopLimit":2,"httpTokens":"required"},"securityGroupSelectorTerms":[{"tags":{"karpenter.sh/discovery":"my-project-eks-cluster-staging-node"}}],"subnetSelectorTerms":[{"tags":{"karpenter.sh/discovery":"my-project-eks-cluster-staging"}}],"tags":{"Name":"my-project-eks-cluster-staging-1_28"},"userData":"export AWS_MAX_ATTEMPTS=6\nNODE_TYPE=node\nS3_BUCKET_NAME=my-project-eks-configs-staging\nSLACK_WEBHOOK_URL_SSM_KEY=\"/my-project_ops/SLACK_WEBHOOK_URL/my-project_alerts\"\n\npost_slack () {\n  local message=\"$1\"\n  SLACK_WEBHOOK_URL=$(aws ssm get-parameter --name \"$SLACK_WEBHOOK_URL_SSM_KEY\" --with-decryption | jq -r '.Parameter.Value')\n  TOKEN=$(curl -s -X PUT -H \"X-aws-ec2-metadata-token-ttl-seconds: 300\" \"http://169.254.169.254/latest/api/token\")\n  INSTANCE_ID=$(curl -s -H \"X-aws-ec2-metadata-token: $TOKEN\
" \"http://169.254.169.254/latest/meta-data/instance-id\")\n  curl -X POST -H 'Content-type: application/json' -d \"{\\\"text\\\":\\\"
$INSTANCE_ID: $message\\\"}\" ${SLACK_WEBHOOK_URL}\n  echo \"$message\"\n}\n\naws s3 cp s3://\"$S3_BUCKET_NAME\"/userdata/\"$NODE_TYPE
\"-enc.sh /var/lib/cloud/\nif [ $? -ne 0 ]; then\n  post_slack \"Error: Failed to download user_data (s3://${S3_BUCKET_NAME}/userdata/
${NODE_TYPE}-enc.sh)\"\n  exit 1\nfi\nbase64 -d /var/lib/cloud/\"$NODE_TYPE\"-enc.sh \u003e /var/lib/cloud/userdata-\"$NODE_TYPE\".sh\
nchmod 755 /var/lib/cloud/userdata-\"$NODE_TYPE\".sh\n/var/lib/cloud/userdata-\"$NODE_TYPE\".sh\n"}}
    kubernetes.io/description: EC2NodeClass for worker node with custom userdata
  creationTimestamp: "2024-05-17T04:25:28Z"
  finalizers:
  - karpenter.sh/termination
  generateName: node-
  generation: 1
  labels:
    karpenter.k8s.aws/instance-cpu: "16"
    karpenter.k8s.aws/instance-cpu-manufacturer: amd
    karpenter.k8s.aws/instance-encryption-in-transit-supported: "true"
    karpenter.k8s.aws/instance-family: m6a
    karpenter.k8s.aws/instance-generation: "6"
    karpenter.k8s.aws/instance-hypervisor: nitro
    karpenter.k8s.aws/instance-memory: "65536"
    karpenter.k8s.aws/instance-network-bandwidth: "6250"
    karpenter.k8s.aws/instance-size: 4xlarge
    karpenter.sh/capacity-type: on-demand
    karpenter.sh/nodepool: node
    kubernetes.io/arch: amd64
    kubernetes.io/os: linux
    node.kubernetes.io/instance-type: m6a.4xlarge
    topology.kubernetes.io/region: ap-northeast-1
    topology.kubernetes.io/zone: ap-northeast-1a
    my-project.io/node: "true"
  name: node-zl8z4
  ownerReferences:
  - apiVersion: karpenter.sh/v1beta1
    blockOwnerDeletion: true
    kind: NodePool
    name: node
  resourceVersion: "110875561"
  uid: b618e73b-4f6d-45a0-89a4-4061eeae1131
spec:
  nodeClassRef:
    apiVersion: karpenter.k8s.aws/v1beta1
    kind: EC2NodeClass
    name: node
  requirements:
  - key: karpenter.sh/nodepool
    operator: In
    values:
    - node
  - key: node.kubernetes.io/instance-type
    minValues: 1
    operator: In
    values:
    - m6a.4xlarge
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - on-demand
  - key: my-project.io/node
    operator: In
    values:
    - "true"
  resources:
    requests:
      cpu: 14935m
      memory: "15799001856"
      pods: "16"
status:
  allocatable:
    cpu: 15890m
    ephemeral-storage: 89Gi
    memory: 57691Mi
    pods: "234"
    vpc.amazonaws.com/pod-eni: "54"
  capacity:
    cpu: "16"
    ephemeral-storage: 100Gi
    memory: 60620Mi
    pods: "234"
    vpc.amazonaws.com/pod-eni: "54"
  conditions:
  - lastTransitionTime: "2024-05-17T04:28:46Z"
    status: "True"
    type: Initialized
  - lastTransitionTime: "2024-05-17T04:25:31Z"
    status: "True"
    type: Launched
  - lastTransitionTime: "2024-05-17T04:28:46Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-05-17T04:27:53Z"
    status: "True"
    type: Registered
  imageID: ami-0594c768bd89780c7
  nodeName: ip-10-80-87-240.ap-northeast-1.compute.internal
  providerID: aws:///ap-northeast-1a/i-0011a40e85bd3e97a

Node


apiVersion: v1
kind: Node
metadata:
  annotations:
    alpha.kubernetes.io/provided-node-ip: 10.80.87.240
    csi.volume.kubernetes.io/nodeid: '{"csi.tigera.io":"ip-10-80-87-240.ap-northeast-1.compute.internal","ebs.csi.aws.com":"i-0011a40e85bd3e97a"}'
    karpenter.k8s.aws/ec2nodeclass-hash: "4618831760887766303"
    karpenter.k8s.aws/ec2nodeclass-hash-version: v2
    karpenter.sh/nodepool-hash: "16961008295110681836"
    karpenter.sh/nodepool-hash-version: v2
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"karpenter.k8s.aws/v1beta1","kind":"EC2NodeClass","metadata":{"annotations":{"kubernetes.io/description":"EC2NodeClass for worker node with custom userdata"},"name":"node"},"spec":{"amiFamily":"AL2","amiSelectorTerms":[{"id":"ami-0594c768bd89780c7"}],"associatePublicIPAddress":false,"blockDeviceMappings":[{"deviceName":"/dev/xvda","ebs":{"deleteOnTermination":true,"iops":3000,"throughput":125,"volumeSize":"100Gi","volumeType":"gp3"}},{"deviceName":"/dev/xvdba","ebs":{"deleteOnTermination":true,"volumeSize":"2000Gi","volumeType":"st1"}}],"instanceProfile":"my-project-eks-cluster-staging","metadataOptions":{"httpEndpoint":"enabled","httpPutResponseHopLimit":2,"httpTokens":"required"},"securityGroupSelectorTerms":[{"tags":{"karpenter.sh/discovery":"my-project-eks-cluster-staging-node"}}],"subnetSelectorTerms":[{"tags":{"karpenter.sh/discovery":"my-project-eks-cluster-staging"}}],"tags":{"Name":"my-project-eks-cluster-staging-1_28"},"userData":"export AWS_MAX_ATTEMPTS=6\nNODE_TYPE=node\nS3_BUCKET_NAME=my-project-eks-configs-staging\nSLACK_WEBHOOK_URL_SSM_KEY=\"/my-project_ops/SLACK_WEBHOOK_URL/my-project_alerts\"\n\npost_slack () {\n  local message=\"$1\"\n  SLACK_WEBHOOK_URL=$(aws ssm get-parameter --name \"$SLACK_WEBHOOK_URL_SSM_KEY\" --with-decryption | jq -r '.Parameter.Value')\n  TOKEN=$(curl -s -X PUT -H \"X-aws-ec2-metadata-token-ttl-seconds: 300\" \"http://169.254.169.254/latest/api/token\")\n  INSTANCE_ID=$(curl -s -H \"X-aws-ec2-metadata-token: $TOKEN\" \"http://169.254.169.254/latest/meta-data/instance-id\")\n  curl -X POST -H 'Content-type: application/json' -d \"{\\\"text\\\":\\\"$INSTANCE_ID: $message\\\"}\" ${SLACK_WEBHOOK_URL}\n  echo \"$message\"\n}\n\naws s3 cp s3://\"$S3_BUCKET_NAME\"/userdata/\"$NODE_TYPE\"-enc.sh /var/lib/cloud/\nif [ $? -ne 0 ]; then\n  post_slack \"Error: Failed to download user_data (s3://${S3_BUCKET_NAME}/userdata/${NODE_TYPE}-enc.sh)\"\n  exit 1\nfi\nbase64 -d /var/lib/cloud/\"$NODE_TYPE\"-enc.sh \u003e /var/lib/cloud/userdata-\"$NODE_TYPE\".sh\nchmod 755 /var/lib/cloud/userdata-\"$NODE_TYPE\".sh\n/var/lib/cloud/userdata-\"$NODE_TYPE\".sh\n"}}
    kubernetes.io/description: EC2NodeClass for worker node with custom userdata
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2024-05-17T04:27:52Z"
  finalizers:
  - karpenter.sh/termination
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: m6a.4xlarge
    beta.kubernetes.io/os: linux
    failure-domain.beta.kubernetes.io/region: ap-northeast-1
    failure-domain.beta.kubernetes.io/zone: ap-northeast-1a
    k8s.io/cloud-provider-aws: 0554ca41e8c22b1fa65ef916112e8e20
    karpenter.k8s.aws/instance-category: m
    karpenter.k8s.aws/instance-cpu: "16"
    karpenter.k8s.aws/instance-cpu-manufacturer: amd
    karpenter.k8s.aws/instance-encryption-in-transit-supported: "true"
    karpenter.k8s.aws/instance-family: m6a
    karpenter.k8s.aws/instance-generation: "6"
    karpenter.k8s.aws/instance-hypervisor: nitro
    karpenter.k8s.aws/instance-memory: "65536"
    karpenter.k8s.aws/instance-network-bandwidth: "6250"
    karpenter.k8s.aws/instance-size: 4xlarge
    karpenter.sh/capacity-type: on-demand
    karpenter.sh/initialized: "true"
    karpenter.sh/nodepool: node
    karpenter.sh/registered: "true"
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: ip-10-80-87-240.ap-northeast-1.compute.internal
    kubernetes.io/os: linux
    node.kubernetes.io/instance-type: m6a.4xlarge
    topology.ebs.csi.aws.com/zone: ap-northeast-1a
    topology.kubernetes.io/region: ap-northeast-1
    topology.kubernetes.io/zone: ap-northeast-1a
    my-project.io/node: "true"
  name: ip-10-80-87-240.ap-northeast-1.compute.internal
  ownerReferences:
  - apiVersion: karpenter.sh/v1beta1
    blockOwnerDeletion: true
  - names:
    - 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/pause@sha256:529cf6b1b6e5b76e901abc43aee825badbd93f9c5ee5f1e316d46a83abbce5a2
    - 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/pause:3.5
    sizeBytes: 298689
  nodeInfo:
    architecture: amd64
    bootID: a8d24e9e-953f-4624-aecd-158eea384646
    containerRuntimeVersion: containerd://1.7.11
    kernelVersion: 5.10.210-201.852.amzn2.x86_64
    kubeProxyVersion: v1.28.5-eks-5e0fdde
    kubeletVersion: v1.28.5-eks-5e0fdde
    machineID: ec2a843502ebf3f28b33297833ab212b
    operatingSystem: linux
    osImage: Amazon Linux 2
    systemUUID: ec2a8435-02eb-f3f2-8b33-297833ab212b

@engedaam

@e-koma can you share the pods that are running on the nodes that you believe should be consolidated?


e-koma commented May 17, 2024

@engedaam Thank you for the reply!
Here is the pod, running as a Kubernetes Job. (The AWS account ID is masked.)

kubectl get pod -o yaml


apiVersion: v1
kind: Pod
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    karpenter.sh/do-not-disrupt: "true"
  creationTimestamp: "2024-05-17T07:02:37Z"
  finalizers:
  - batch.kubernetes.io/job-tracking
  generateName: transfer-job-6276425-1715929357522-
  labels:
    app.kubernetes.io/component: transfer-job
    app.kubernetes.io/instance: transfer-job-6276425-1715929357522
    app.kubernetes.io/managed-by: my-project-manager
    app.kubernetes.io/name: transfer-job-6276425
    app.kubernetes.io/part-of: transfer
    batch.kubernetes.io/controller-uid: ae784a5b-dc5b-4ec0-a3b2-b4dbc370b632
    batch.kubernetes.io/job-name: transfer-job-6276425-1715929357522
    controller-uid: ae784a5b-dc5b-4ec0-a3b2-b4dbc370b632
    job-name: transfer-job-6276425-1715929357522
    my-project.io/etl_valid: "false"
  name: transfer-job-6276425-1715929357522-c5vdf
  namespace: default
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: transfer-job-6276425-1715929357522
    uid: ae784a5b-dc5b-4ec0-a3b2-b4dbc370b632
  resourceVersion: "111027613"
  uid: 5fdd88b1-635a-4bc1-8af0-a63142a241a7
spec:
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: job-name
              operator: Exists
          topologyKey: kubernetes.io/hostname
        weight: 10
  containers:
  - args:
    - transfer:run[6276425,false]
    envFrom:
    - configMapRef:
        name: my-project-config
    - secretRef:
        name: my-project-secret
    image: ****.dkr.ecr.ap-northeast-1.amazonaws.com/worker.my-project.io:06b18dd453233106abf4e4690efda2aabc067f84
    imagePullPolicy: IfNotPresent
    lifecycle:
      preStop:
        exec:
          command:
          - sh
          - /work/scripts/terminate-worker.sh
    name: worker
    resources:
      limits:
        cpu: 3500m
        memory: 15Gi
      requests:
        cpu: 3500m
        memory: 15Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /tmp
      name: tmp-volume
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-tfx7l
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://87f3903744c817d80103abde3e82012a7aacbc672fcbc21c40aff0af886f68ef
        exitCode: 0
        finishedAt: "2024-05-17T07:16:57Z"
        reason: Completed
        startedAt: "2024-05-17T07:16:40Z"
  hostIP: 10.80.96.71
  phase: Running
  podIP: 10.80.106.135
  podIPs:
  - ip: 10.80.106.135
  qosClass: Guaranteed
  startTime: "2024-05-17T07:16:39Z"

kubectl describe pod


Name:             transfer-job-6276425-1715929357522-c5vdf
Namespace:        default
Priority:         0
Service Account:  my-project-transfer-job
Node:             ip-10-80-96-71.ap-northeast-1.compute.internal/10.80.96.71
Start Time:       Fri, 17 May 2024 07:16:39 +0000
Labels:           app.kubernetes.io/component=transfer-job
                  app.kubernetes.io/instance=transfer-job-6276425-1715929357522
                  app.kubernetes.io/managed-by=my-project-manager
                  app.kubernetes.io/name=transfer-job-6276425
                  app.kubernetes.io/part-of=transfer
                  batch.kubernetes.io/controller-uid=ae784a5b-dc5b-4ec0-a3b2-b4dbc370b632
                  batch.kubernetes.io/job-name=transfer-job-6276425-1715929357522
                  controller-uid=ae784a5b-dc5b-4ec0-a3b2-b4dbc370b632
                  job-name=transfer-job-6276425-1715929357522
                  my-project.io/etl_valid=false
Annotations:      cluster-autoscaler.kubernetes.io/safe-to-evict: false
                  karpenter.sh/do-not-disrupt: true
Status:           Succeeded
IP:
IPs:              <none>
Controlled By:    Job/transfer-job-6276425-1715929357522
Containers:
  worker:
    Container ID:  containerd://87f3903744c817d80103abde3e82012a7aacbc672fcbc21c40aff0af886f68ef
    Image:         ****.dkr.ecr.ap-northeast-1.amazonaws.com/worker.my-project.io:06b18dd453233106abf4e4690efda2aabc067f84
    Image ID:      sha256:de357122189ee012f710a36a4f864ad45b292dff53a6af61da63d003b86238aa
    Port:          <none>
    Host Port:     <none>
    Args:
      transfer:run[6276425,false]
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 17 May 2024 07:16:40 +0000
      Finished:     Fri, 17 May 2024 07:16:57 +0000
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     3500m
      memory:  15Gi
    Requests:
      cpu:     3500m
      memory:  15Gi
    Environment Variables from:
      my-project-config  ConfigMap  Optional: false
      my-project-secret  Secret     Optional: false
    Environment:     <none>
    Mounts:
      /tmp from tmp-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-tfx7l (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  tmp-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  200Gi
  kube-api-access-tfx7l:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason             Age                     From                Message
  ----     ------             ----                    ----                -------
  Warning  FailedScheduling   14m                     default-scheduler   0/3 nodes are available: 1 Insufficient cpu, 1 Insufficient memory, 1 node(s) had untolerated taint {ToBeDeletedByClusterAutoscaler: 1715929322}, 1 node(s) had untolerated taint {my-project.io/control-plane: true}. preemption: 0/3 nodes are available: 1 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling..
  Warning  FailedScheduling   13m (x7 over 14m)       default-scheduler   0/2 nodes are available: 1 Insufficient cpu, 1 node(s) had untolerated taint {my-project.io/control-plane: true}. preemption: 0/2 nodes are available: 1 No preemption victims found for incoming pod, 1 Preemption is not helpful for scheduling..
  Warning  FailedScheduling   12m                     default-scheduler   0/4 nodes are available: 1 Insufficient cpu, 1 node(s) had untolerated taint {my-project.io/control-plane: true}, 2 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/4 nodes are available: 1 No preemption victims found for incoming pod, 3 Preemption is not helpful for scheduling..
  Warning  FailedScheduling   11m (x2 over 12m)       default-scheduler   0/5 nodes are available: 1 Insufficient cpu, 1 node(s) had untolerated taint {my-project.io/control-plane: true}, 3 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/5 nodes are available: 1 No preemption victims found for incoming pod, 4 Preemption is not helpful for scheduling..
  Warning  FailedScheduling   11m                     default-scheduler   0/5 nodes are available: 1 node(s) had untolerated taint {my-project.io/control-plane: true}, 2 Insufficient cpu, 2 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/5 nodes are available: 2 No preemption victims found for incoming pod, 3 Preemption is not helpful for scheduling..
  Warning  FailedScheduling   10m (x3 over 11m)       default-scheduler   0/5 nodes are available: 1 Insufficient memory, 1 node(s) had untolerated taint {my-project.io/control-plane: true}, 4 Insufficient cpu. preemption: 0/5 nodes are available: 1 Preemption is not helpful for scheduling, 4 No preemption victims found for incoming pod..
  Warning  FailedScheduling   10m (x6 over 11m)       default-scheduler   0/5 nodes are available: 1 node(s) had untolerated taint {my-project.io/control-plane: true}, 4 Insufficient cpu. preemption: 0/5 nodes are available: 1 Preemption is not helpful for scheduling, 4 No preemption victims found for incoming pod..
  Warning  FailedScheduling   4m35s (x11 over 9m49s)  default-scheduler   0/5 nodes are available: 1 node(s) had untolerated taint {my-project.io/control-plane: true}, 2 Insufficient memory, 4 Insufficient cpu. preemption: 0/5 nodes are available: 1 Preemption is not helpful for scheduling, 4 No preemption victims found for incoming pod..
  Normal   NotTriggerScaleUp  4m9s (x55 over 14m)     cluster-autoscaler  pod didn't trigger scale-up: 1 node(s) had untolerated taint {my-project.io/control-plane: true}, 1 max node group size reached
  Normal   Nominated          114s                    karpenter           Pod should schedule on: nodeclaim/node-95dlj
  Normal   Pulled             36s                     kubelet             Container image "****.dkr.ecr.ap-northeast-1.amazonaws.com/worker.my-project.io:06b18dd453233106abf4e4690efda2aabc067f84" already present on machine
  Normal   Created            36s                     kubelet             Created container worker
  Normal   Started            36s                     kubelet             Started container worker


e-koma commented May 17, 2024

After these pods stop, some nodes disrupt as expected, while others do not.


engedaam commented May 20, 2024

@e-koma From the looks of it, the pod has a podAffinity. This is most likely the reason why Karpenter decided not to consolidate the node. This is called out in our docs here: https://karpenter.sh/docs/concepts/disruption/#consolidation

affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: job-name
            operator: Exists
        topologyKey: kubernetes.io/hostname
      weight: 10
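
To find every pod carrying such a preference, something like this sketch works (the jq filter is illustrative):

$ kubectl get pods -A -o json \
    | jq -r '.items[] | select(.spec.affinity.podAffinity != null or .spec.affinity.podAntiAffinity != null) | "\(.metadata.namespace)/\(.metadata.name)"'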


e-koma commented May 22, 2024

@engedaam
Thank you for the confirmation.
To provide additional detail: the pod does have podAffinity, but since it runs as a k8s Job, it terminates a few seconds after execution. (Apologies for the lack of information earlier.)

Details

  • When running many pods as k8s Jobs, nodes scale out.
  • After a while, all Jobs finish executing.
  • The expected behavior is that once all pods have terminated, the nodes become disruption candidates under the WhenEmpty policy, and all nodes scale in.
  • Initially, all nodes that had scaled out were disrupted.
    However, after repeatedly starting and stopping Job pods, two of the nodes no longer underwent disruption.
  • Subsequently, after repeatedly starting and stopping Jobs in the same manner, those two nodes continued to persist without disruption.

I would appreciate your help investigating what is happening with these two nodes (an emptiness check is sketched below).
Thank you.
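
For reference, here is how a stuck node can be confirmed empty of running pods (node name is a placeholder):

$ kubectl get pods -A --field-selector spec.nodeName=<node-name>,status.phase=Running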

@engedaam

@e-koma Can you share the details of the nodes that were not disrupted?


e-koma commented May 29, 2024

Sure, I'll try to recreate it around this Friday!


e-koma commented May 31, 2024

@engedaam
Hi, here is the node information from reproducing the issue.
There are many events related to the do-not-disrupt annotation,
but please ignore them, as those pods have already stopped.

Node Info

kubectl describe nodeclaim node-68kbk

Name:         node-68kbk
Namespace:
Labels:       karpenter.k8s.aws/instance-category=m
              karpenter.k8s.aws/instance-cpu=16
              karpenter.k8s.aws/instance-cpu-manufacturer=amd
              karpenter.k8s.aws/instance-encryption-in-transit-supported=true
              karpenter.k8s.aws/instance-family=m6a
              karpenter.k8s.aws/instance-generation=6
              karpenter.k8s.aws/instance-hypervisor=nitro
              karpenter.k8s.aws/instance-memory=65536
              karpenter.k8s.aws/instance-network-bandwidth=6250
              karpenter.k8s.aws/instance-size=4xlarge
              karpenter.sh/capacity-type=on-demand
              karpenter.sh/nodepool=node
              kubernetes.io/arch=amd64
              kubernetes.io/os=linux
              node.kubernetes.io/instance-type=m6a.4xlarge
              topology.kubernetes.io/region=ap-northeast-1
              topology.kubernetes.io/zone=ap-northeast-1c
              trocco.io/node=true
Annotations:  karpenter.k8s.aws/ec2nodeclass-hash: 4618831760887766303
              karpenter.k8s.aws/ec2nodeclass-hash-version: v2
              karpenter.k8s.aws/tagged: true
              karpenter.sh/nodepool-hash: 16961008295110681836
              karpenter.sh/nodepool-hash-version: v2
              kubernetes.io/description: EC2NodeClass for worker node with custom userdata
API Version:  karpenter.sh/v1beta1
Kind:         NodeClaim
Metadata:
  Creation Timestamp:  2024-05-31T07:58:41Z
  Finalizers:
    karpenter.sh/termination
  Generate Name:  node-
  Generation:     1
  Owner References:
    API Version:           karpenter.sh/v1beta1
    Block Owner Deletion:  true
    Kind:                  NodePool
    Name:                  node
    UID:                   40d85c98-f73b-4051-872b-f637211ce57c
  Resource Version:        122214778
  UID:                     be2649b4-a54b-4b32-a4fd-9f3a68f3cf1c
Spec:
  Node Class Ref:
    API Version:  karpenter.k8s.aws/v1beta1
    Kind:         EC2NodeClass
    Name:         node
  Requirements:
    Key:       karpenter.sh/capacity-type
    Operator:  In
    Values:
      on-demand
    Key:       trocco.io/node
    Operator:  In
    Values:
      true
    Key:       karpenter.sh/nodepool
    Operator:  In
    Values:
    Last Transition Time:  2024-05-31T08:02:24Z
    Status:                True
    Type:                  Ready
    Last Transition Time:  2024-05-31T08:01:42Z
    Status:                True
    Type:                  Registered
  Image ID:                ami-0594c768bd89780c7
  Node Name:               ip-10-80-104-183.ap-northeast-1.compute.internal
  Provider ID:             aws:///ap-northeast-1c/i-08b29a38021310a9c
Events:
  Type    Reason             Age                From       Message
  ----    ------             ----               ----       -------
  Normal  DisruptionBlocked  47m                karpenter  Cannot disrupt NodeClaim: PDB "kube-system/ebs-csi-controller" prevents pod evictions
  Normal  DisruptionBlocked  43m                karpenter  Cannot disrupt NodeClaim: Pod "default/****-15276-rev-154430-1717143051506-5cq59" has "karpenter.sh/do-not-disrupt" annotation
  Normal  DisruptionBlocked  41m                karpenter  Cannot disrupt NodeClaim: Pod "default/****-6332787-1717143081626-97sst" has "karpenter.sh/do-not-disrupt" annotation
  Normal  DisruptionBlocked  39m                karpenter  Cannot disrupt NodeClaim: Pod "default/****-6332793-1717143085297-cw2fg" has "karpenter.sh/do-not-disrupt" annotation
  Normal  DisruptionBlocked  37m                karpenter  Cannot disrupt NodeClaim: Pod "default/****-5085940-v22rz" has "karpenter.sh/do-not-disrupt" annotation
  Normal  DisruptionBlocked  35m                karpenter  Cannot disrupt NodeClaim: Pod "default/****-5086029-x9zrg" has "karpenter.sh/do-not-disrupt" annotation
  Normal  DisruptionBlocked  31m                karpenter  Cannot disrupt NodeClaim: Pod "default/****-6333084-1717143706935-nhz4l" has "karpenter.sh/do-not-disrupt" annotation
  Normal  DisruptionBlocked  29m                karpenter  Cannot disrupt NodeClaim: Pod "default/****-5086279-v4nmj" has "karpenter.sh/do-not-disrupt" annotation
  Normal  DisruptionBlocked  27m                karpenter  Cannot disrupt NodeClaim: Pod "default/****-6333112-1717143802992-4r4lv" has "karpenter.sh/do-not-disrupt" annotation
  Normal  DisruptionBlocked  25m                karpenter  Cannot disrupt NodeClaim: Pod "default/****-5086520-7tnln" has "karpenter.sh/do-not-disrupt" annotation
  Normal  DisruptionBlocked  23m                karpenter  Cannot disrupt NodeClaim: Pod "default/****-5086624-4nxhf" has "karpenter.sh/do-not-disrupt" annotation
  Normal  DisruptionBlocked  21m (x2 over 33m)  karpenter  Cannot disrupt NodeClaim: Nominated for a pending pod
  Normal  DisruptionBlocked  13m (x4 over 19m)  karpenter  (combined from similar events): Cannot disrupt NodeClaim: Pod "default/****-5087074-ljnxj" has "karpenter.sh/do-not-disrupt" annotation

kubectl describe node

Name:               ip-10-80-104-183.ap-northeast-1.compute.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m6a.4xlarge
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=ap-northeast-1
                    failure-domain.beta.kubernetes.io/zone=ap-northeast-1c
                    k8s.io/cloud-provider-aws=0554ca41e8c22b1fa65ef916112e8e20
                    karpenter.k8s.aws/instance-category=m
                    karpenter.k8s.aws/instance-cpu=16
                    karpenter.k8s.aws/instance-cpu-manufacturer=amd
                    karpenter.k8s.aws/instance-encryption-in-transit-supported=true
                    karpenter.k8s.aws/instance-family=m6a
                    karpenter.k8s.aws/instance-generation=6
                    karpenter.k8s.aws/instance-hypervisor=nitro
                    karpenter.k8s.aws/instance-memory=65536
                    karpenter.k8s.aws/instance-network-bandwidth=6250
                    karpenter.k8s.aws/instance-size=4xlarge
                    karpenter.sh/capacity-type=on-demand
                    karpenter.sh/initialized=true
                    karpenter.sh/nodepool=node
                    karpenter.sh/registered=true
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-80-104-183.ap-northeast-1.compute.internal
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=m6a.4xlarge
                    topology.ebs.csi.aws.com/zone=ap-northeast-1c
                    topology.kubernetes.io/region=ap-northeast-1
                    topology.kubernetes.io/zone=ap-northeast-1c
                    trocco.io/node=true
Annotations:        alpha.kubernetes.io/provided-node-ip: 10.80.104.183
                    csi.volume.kubernetes.io/nodeid:
                      {"csi.tigera.io":"ip-10-80-104-183.ap-northeast-1.compute.internal","ebs.csi.aws.com":"i-08b29a38021310a9c"}
                    karpenter.k8s.aws/ec2nodeclass-hash: 4618831760887766303
                    karpenter.k8s.aws/ec2nodeclass-hash-version: v2
                    karpenter.sh/nodepool-hash: 16961008295110681836
                    karpenter.sh/nodepool-hash-version: v2
                    kubernetes.io/description: EC2NodeClass for worker node with custom userdata
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 31 May 2024 08:01:41 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  ip-10-80-104-183.ap-northeast-1.compute.internal
  AcquireTime:     <unset>
  RenewTime:       Fri, 31 May 2024 08:55:30 +0000
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Fri, 31 May 2024 08:52:02 +0000   Fri, 31 May 2024 08:01:34 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 31 May 2024 08:52:02 +0000   Fri, 31 May 2024 08:01:34 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Fri, 31 May 2024 08:52:02 +0000   Fri, 31 May 2024 08:01:34 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Fri, 31 May 2024 08:52:02 +0000   Fri, 31 May 2024 08:02:22 +0000   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   10.80.104.183
  InternalDNS:  ip-10-80-104-183.ap-northeast-1.compute.internal
  Hostname:     ip-10-80-104-183.ap-northeast-1.compute.internal
Capacity:
  cpu:                16
  ephemeral-storage:  104845292Ki
  hugepages-1Gi:      0
  Type     Reason                   Age                From                   Message
  ----     ------                   ----               ----                   -------
  Normal   Starting                 53m                kube-proxy
  Normal   NodeHasSufficientPID     54m (x2 over 54m)  kubelet                Node ip-10-80-104-183.ap-northeast-1.compute.internal status is now: NodeHasSufficientPID
  Warning  InvalidDiskCapacity      54m                kubelet                invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory  54m (x2 over 54m)  kubelet                Node ip-10-80-104-183.ap-northeast-1.compute.internal status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    54m (x2 over 54m)  kubelet                Node ip-10-80-104-183.ap-northeast-1.compute.internal status is now: NodeHasNoDiskPressure
  Normal   Starting                 54m                kubelet                Starting kubelet.
  Normal   NodeAllocatableEnforced  54m                kubelet                Updated Node Allocatable limit across pods
  Normal   Synced                   53m                cloud-node-controller  Node synced successfully
  Normal   RegisteredNode           53m                node-controller        Node ip-10-80-104-183.ap-northeast-1.compute.internal event: Registered Node ip-10-80-104-183.ap-northeast-1.compute.internal in Controller
  Normal   NodeReady                53m                kubelet                Node ip-10-80-104-183.ap-northeast-1.compute.internal status is now: NodeReady
  Normal   DisruptionBlocked        49m                karpenter              Cannot disrupt Node: PDB "kube-system/ebs-csi-controller" prevents pod evictions
  Normal   DisruptionBlocked        44m                karpenter              Cannot disrupt Node: Pod "default/****-15276-rev-154430-1717143051506-5cq59" has "karpenter.sh/do-not-disrupt" annotation
  Normal   DisruptionBlocked        42m                karpenter              Cannot disrupt Node: Pod "default/****-6332787-1717143081626-97sst" has "karpenter.sh/do-not-disrupt" annotation
  Normal   DisruptionBlocked        40m                karpenter              Cannot disrupt Node: Pod "default/****-6332793-1717143085297-cw2fg" has "karpenter.sh/do-not-disrupt" annotation
  Normal   DisruptionBlocked        38m                karpenter              Cannot disrupt Node: Pod "default/****-5085940-v22rz" has "karpenter.sh/do-not-disrupt" annotation
  Normal   DisruptionBlocked        36m                karpenter              Cannot disrupt Node: Pod "default/****-5086029-x9zrg" has "karpenter.sh/do-not-disrupt" annotation
  Normal   DisruptionBlocked        32m                karpenter              Cannot disrupt Node: Pod "default/****-6333084-1717143706935-nhz4l" has "karpenter.sh/do-not-disrupt" annotation
  Normal   DisruptionBlocked        30m                karpenter              Cannot disrupt Node: Pod "default/****-5086279-v4nmj" has "karpenter.sh/do-not-disrupt" annotation
  Normal   DisruptionBlocked        28m                karpenter              Cannot disrupt Node: Pod "default/****-6333112-1717143802992-4r4lv" has "karpenter.sh/do-not-disrupt" annotation
  Normal   DisruptionBlocked        26m                karpenter              Cannot disrupt Node: Pod "default/****-5086520-7tnln" has "karpenter.sh/do-not-disrupt" annotation
  Normal   DisruptionBlocked        24m                karpenter              Cannot disrupt Node: Pod "default/****-5086624-4nxhf" has "karpenter.sh/do-not-disrupt" annotation
  Normal   DisruptionBlocked        22m (x2 over 34m)  karpenter              Cannot disrupt Node: Nominated for a pending pod
  Normal   DisruptionBlocked        14m (x4 over 20m)  karpenter              (combined from similar events): Cannot disrupt Node: Pod "default/****-5087074-ljnxj" has "karpenter.sh/do-not-disrupt" annotation

Now, from the results of reproducing the issue, I have found the problematic part.
The following events are emitted only for the nodes that are not disrupted:

50m         Normal    DisruptionBlocked         node/ip-10-80-104-183.ap-northeast-1.compute.internal                     Cannot disrupt Node: PDB "kube-system/ebs-csi-controller" prevents pod evictions
50m         Normal    DisruptionBlocked         nodeclaim/node-68kbk                                                      Cannot disrupt NodeClaim: PDB "kube-system/ebs-csi-controller" prevents pod evictions

It seems that the ebs-csi-controller PDB was the cause.

However, if this is the reason, then everyone using the ebs-csi-driver addon should encounter the same issue.
Why aren't others facing this problem?
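
To dig into it, the PDB and the pods it covers can be inspected like this (the app=ebs-csi-controller label is an assumption based on the upstream aws-ebs-csi-driver chart; verify it in your cluster):

$ kubectl get pdb -n kube-system ebs-csi-controller
$ kubectl get pods -n kube-system -l app=ebs-csi-controller -o wide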
