eksctl anywhere upgrade cluster doesn't use the extra hardware from hardware.csv #7818

ygao-armada · 2024-03-10T22:36:40Z

What happened:
I created an EKS anywhere cluster with 1 CP node, kubernetes 1.26

Then I try to upgrade it to 1.27, with 2 new CP nodes plus the existing CP node in the new hardware file.

cluster config file change is:

diff eksa-mgmt02-md-cluster.yaml eksa-mgmt02-md-cluster-sc.yaml
25c25
<   kubernetesVersion: "1.26"
---
>   kubernetesVersion: "1.27"
36c36
<   osImageURL: "https://<image store>.blob.core.windows.net/ubuntu-2004-efi/ubuntu-2004-efi-eksa-kube-v1.26.7.gz"
---
>   osImageURL: "https://<image store>.blob.core.windows.net/ubuntu-2004-efi/ubuntu-2004-efi-eksa-kube-v1.27.11.gz"
70c70
<           IMG_URL: https://<image store>.blob.core.windows.net/ubuntu-2004-efi/ubuntu-2004-efi-eksa-kube-v1.26.7.gz
---
>           IMG_URL: https://<image store>.blob.core.windows.net/ubuntu-2004-efi/ubuntu-2004-efi-eksa-kube-v1.27.11.gz

hardware change is:

diff hardware-mgmt02.csv hardware-mgmt02-cp3.csv 
2a3,4
> eksa-control-02,...,type=cp,/dev/sda
> eksa-control-03,...,type=cp,/dev/sda

The command I use:
eksctl anywhere upgrade cluster -f eksa-mgmt02-md-cluster-sc.yaml --hardware-csv hardware-mgmt02-cp3.csv --no-timeouts -v 9

I see upgrade stuck with following errors:

...
2024-04-15T07:16:38.920Z	V6	Executing command	{"cmd": "/usr/bin/docker exec -i eksa_1713165386699246825 kubectl get --ignore-not-found -o json --kubeconfig /home/armada/eksa/mgmt02/mgmt02/mgmt02-eks-a-cluster.kubeconfig Cluster.v1alpha1.anywhere.eks.amazonaws.com --namespace default mgmt02"}
2024-04-15T07:16:39.030Z	V9	Cluster generation and observedGeneration	{"Generation": 2, "ObservedGeneration": 2}
2024-04-15T07:16:39.031Z	V5	Error happened during retry	{"error": "cluster has an error: hardware validation failure: for rolling upgrade, minimum hardware count not met for selector '{\"type\":\"cp\"}': have 0, require 1", "retries": 1}
2024-04-15T07:16:39.031Z	V5	Sleeping before next retry	{"time": "1s"}
...
2024-04-15T07:32:06.179Z	V6	Executing command	{"cmd": "/usr/bin/docker exec -i eksa_1713165386699246825 kubectl get --ignore-not-found -o json --kubeconfig /home/armada/eksa/mgmt02/mgmt02/mgmt02-eks-a-cluster.kubeconfig Cluster.v1alpha1.anywhere.eks.amazonaws.com --namespace default mgmt02"}
2024-04-15T07:32:06.285Z	V9	Cluster generation and observedGeneration	{"Generation": 2, "ObservedGeneration": 2}
2024-04-15T07:32:06.285Z	V5	Error happened during retry	{"error": "cluster has an error: hardware validation failure: for rolling upgrade, minimum hardware count not met for selector '{\"type\":\"cp\"}': have 0, require 1", "retries": 832}
2024-04-15T07:32:06.286Z	V5	Sleeping before next retry	{"time": "1s"}
...

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

EKS Anywhere Release: v0.18.7
EKS Distro Release: 1.26/1.27

The text was updated successfully, but these errors were encountered:

mitalipaygude · 2024-04-11T23:00:47Z

From the initial look what I understand is this was a rolling and scaling upgrade request. This means that 2 extra hardwares were expected for the upgrade - 1 for the scale up, 1 spare for the rolling upgrade of 1.26 -> 1.27. My guess is that the extra hardware you added to the csv was used for the new CP node and eks anywhere didn't have enough for the rolling upgrade.

ygao-armada · 2024-04-14T09:09:27Z

@mitalipaygude I just try to use 1 more machine (1 existing host + 2 idle hosts), same issue shows:

2024-04-14T08:59:52.814Z	V6	Executing command	{"cmd": "/usr/bin/docker exec -i eksa_1713084293614893292 kubectl get --ignore-not-found -o json --kubeconfig /home/armada/eksa/mgmt02/mgmt02/mgmt02-eks-a-cluster.kubeconfig Cluster.v1alpha1.anywhere.eks.amazonaws.com --namespace default mgmt02"}
2024-04-14T08:59:52.934Z	V9	Cluster generation and observedGeneration	{"Generation": 3, "ObservedGeneration": 3}
2024-04-14T08:59:52.934Z	V5	Error happened during retry	{"error": "cluster has an error: hardware validation failure: for rolling upgrade, minimum hardware count not met for selector '{\"type\":\"cp\"}': have 0, require 1", "retries": 794}
2024-04-14T08:59:52.934Z	V5	Sleeping before next retry	{"time": "1s"}

At the same time, according to https://anywhere.eks.amazonaws.com/docs/clustermgmt/cluster-upgrades/baremetal-upgrades/:

EKS Anywhere upgrades on Bare Metal require at least one spare hardware server for control plane upgrade and one for each worker node group upgrade.

So 1 spare machine is good enough.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

eksctl anywhere upgrade cluster doesn't use the extra hardware from hardware.csv #7818

eksctl anywhere upgrade cluster doesn't use the extra hardware from hardware.csv #7818

ygao-armada commented Mar 10, 2024 •

edited

mitalipaygude commented Apr 11, 2024

ygao-armada commented Apr 14, 2024

eksctl anywhere upgrade cluster doesn't use the extra hardware from hardware.csv #7818

eksctl anywhere upgrade cluster doesn't use the extra hardware from hardware.csv #7818

Comments

ygao-armada commented Mar 10, 2024 • edited

mitalipaygude commented Apr 11, 2024

ygao-armada commented Apr 14, 2024

ygao-armada commented Mar 10, 2024 •

edited