Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Helm upgrade for charts that contain statefulsets with .spec.updateStrategy.rollingUpdate.partition is incorrect. #11773

Closed
amannijhawan opened this issue Jan 29, 2023 · 8 comments · Fixed by #11774
Labels
bug Categorizes issue or PR as related to a bug. in progress

Comments

@amannijhawan
Copy link
Contributor

amannijhawan commented Jan 29, 2023

Output of helm version:

[centos@dev-server-centos helm]$ helm version
version.BuildInfo{Version:"v3.10.3", GitCommit:"835b7334cfe2e5e27870ab3ed4135f136eecc704", GitTreeState:"clean", GoVersion:"go1.18.9"}

Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.14-dispatcher-dirty", GitCommit:"009f3ac5f1d09b1ec5d4dc811b6032271fe61f75", GitTreeState:"dirty", BuildDate:"2022-09-19T01:31:36Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.15-gke.2500", GitCommit:"fbf2f97a417bed01195ce67ab3c222d68f06f8b7", GitTreeState:"clean", BuildDate:"2022-10-26T09:24:45Z", GoVersion:"go1.16.15b7", Compiler:"gc", Platform:"linux/amd64"}

Cloud Provider/Platform (AKS, GKE, Minikube etc.): GKE

Hi Helm Maintainers.

In our testing we found that helm upgrade for charts that contain statefulsets with .spec.updateStrategy.rollingUpdate.partition does not function as expected. When using with a --wait and --timeout parameter the following command always hangs
helm upgrade helm-test ./ --debug --timeout=900s --wait -n test-helm

Here is a simple chart that recreates this issue.
https://github.com/amannijhawan/helm-test-case-anijhawan/blob/gh-pages/example-0.1.0.tgz

Steps to reproduce:

  1. Create a local scratch dir
    mkdir repro
  2. Download the helm package
    cd repro && wget https://github.com/amannijhawan/helm-test-case-anijhawan/raw/gh-pages/example-0.1.0.tgz
  3. Unarchive Package
    tar -xvzf example-0.1.0.tgz
  4. Create an isolated namesapce
    kubectl create ns repro
  5. Helm install the chart.
    helm install helm-test . -n repro
  6. Attempt Helm Upgrade
    helm upgrade helm-test ./ --debug --timeout=100s --wait --values=upgrade.yaml -n repro
    We see that the helm upgrade fails with the following logs.

Edited to included the full output that shows the correct logs

[centos@dev-server-centos helm-test-case-anijhawan]$ helm upgrade helm-test  ./ --debug --timeout=100s --wait   -n repro
upgrade.go:142: [debug] preparing upgrade for helm-test
upgrade.go:150: [debug] performing update for helm-test
upgrade.go:322: [debug] creating upgraded release for helm-test
client.go:229: [debug] checking 2 resources for changes
client.go:512: [debug] Looks like there are no changes for Service "nginx"
client.go:521: [debug] Patch StatefulSet "web" in namespace repro
upgrade.go:394: [debug] waiting for release helm-test resources (created: 0 updated: 2  deleted: 0)
wait.go:66: [debug] beginning wait for 2 resources with timeout of 1m40s
ready.go:362: [debug] StatefulSet is not ready: repro/web. update has not yet been observed
ready.go:393: [debug] StatefulSet is not ready: repro/web. 2 out of 3 expected pods are ready
ready.go:393: [debug] StatefulSet is not ready: repro/web. 2 out of 3 expected pods are ready
ready.go:393: [debug] StatefulSet is not ready: repro/web. 2 out of 3 expected pods are ready
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
upgrade.go:434: [debug] warning: Upgrade "helm-test" failed: timed out waiting for the condition
Error: UPGRADE FAILED: timed out waiting for the condition
helm.go:84: [debug] timed out waiting for the condition
UPGRADE FAILED
main.newUpgradeCmd.func2
	helm.sh/helm/v3/cmd/helm/upgrade.go:201
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/cobra@v1.5.0/command.go:872
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/cobra@v1.5.0/command.go:990
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/cobra@v1.5.0/command.go:918
main.main
	helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
	runtime/proc.go:250
runtime.goexit
	runtime/asm_amd64.s:1571
[centos@dev-server-centos helm-test-case-anijhawan]$ ^C

However at this point we can check that the upgrade has been successfully applied to one pod matching the current value of partition.

[centos@dev-server-centos helm-test-case-anijhawan]$ kubectl get pod web-2 -o yaml   -nrepro | grep -a5 resource
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: web
    uid: 416718f4-f8bd-4f3e-b193-439571269248
  resourceVersion: "816068910"
  uid: dcd81ea4-376e-419f-b437-087a58f34819
spec:
  containers:
  - image: k8s.gcr.io/nginx-slim:0.8
    imagePullPolicy: IfNotPresent
    name: nginx
    ports:
    - containerPort: 80
      name: web
      protocol: TCP
    resources:
      requests:
        memory: 256Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
[centos@dev-server-centos helm-test-case-anijhawan]$ kubectl get pod web-1 -o yaml   -nrepro | grep -a5 resource
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: web
    uid: 416718f4-f8bd-4f3e-b193-439571269248
  resourceVersion: "816067417"
  uid: 7522e2ae-1578-40b7-8bcc-aba3a56b685c
spec:
  containers:
  - image: k8s.gcr.io/nginx-slim:0.8
    imagePullPolicy: IfNotPresent
    name: nginx
    ports:
    - containerPort: 80
      name: web
      protocol: TCP
    resources:
      requests:
        memory: 500Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:

The issue seems to be this check in ready.go thats assuming the upgrade hasn't been applied yet.

if sts.Status.CurrentRevision != sts.Status.UpdateRevision {

amannijhawan added a commit to amannijhawan/helm that referenced this issue Jan 30, 2023
Fixes helm#11773

This change updates readiness check in ready.go to correctly
account for statefulsets that are utilizing a partitioned upgrade.
These statefulsets only upgrade a subset of the managed pods with each call
to helm upgrade. This causes the upgrade to legitimately hit the condition where
sts.status.CurrentRevision != sts.Status.UpdateRevision which causes helm to mark
the upgrade has failed when in fact it is successful.

This change fixes that behavior to only check when partition is unspecified or 0.
@amannijhawan
Copy link
Contributor Author

Proposed fix:
#11774

@gjenkins8
Copy link
Contributor

gjenkins8 commented Jan 30, 2023

Kubernetes docs on partitions: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#partitions

@gjenkins8
Copy link
Contributor

From a quick look, the error you report:

ready.go:393: [debug] StatefulSet is not ready: repro/web. 2 out of 3 expected pods are ready

is not the one your PR proposes to fix (which come after, on line 797)

The logic #L384 purports to account for the update partition. Perhaps this is broken somehow? Or, there was an actual error during the deployment? (your output doesn't show the web-1/web-2 pods are actually ready)

@amannijhawan
Copy link
Contributor Author

amannijhawan commented Jan 30, 2023

Sorry I believe the full output wasn't captured correctly in the previous iteration.
Here is the full output.

[centos@dev-server-centos helm-test-case-anijhawan]$ helm upgrade helm-test  ./ --debug --timeout=100s --wait   -n repro
upgrade.go:142: [debug] preparing upgrade for helm-test
upgrade.go:150: [debug] performing update for helm-test
upgrade.go:322: [debug] creating upgraded release for helm-test
client.go:229: [debug] checking 2 resources for changes
client.go:512: [debug] Looks like there are no changes for Service "nginx"
client.go:521: [debug] Patch StatefulSet "web" in namespace repro
upgrade.go:394: [debug] waiting for release helm-test resources (created: 0 updated: 2  deleted: 0)
wait.go:66: [debug] beginning wait for 2 resources with timeout of 1m40s
ready.go:362: [debug] StatefulSet is not ready: repro/web. update has not yet been observed
ready.go:393: [debug] StatefulSet is not ready: repro/web. 2 out of 3 expected pods are ready
ready.go:393: [debug] StatefulSet is not ready: repro/web. 2 out of 3 expected pods are ready
ready.go:393: [debug] StatefulSet is not ready: repro/web. 2 out of 3 expected pods are ready
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
ready.go:398: [debug] StatefulSet is not ready: repro/web. currentRevision web-6b4fd5c9f6 does not yet match updateRevision web-545c9585b7
upgrade.go:434: [debug] warning: Upgrade "helm-test" failed: timed out waiting for the condition
Error: UPGRADE FAILED: timed out waiting for the condition
helm.go:84: [debug] timed out waiting for the condition
UPGRADE FAILED
main.newUpgradeCmd.func2
	helm.sh/helm/v3/cmd/helm/upgrade.go:201
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/cobra@v1.5.0/command.go:872
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/cobra@v1.5.0/command.go:990
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/cobra@v1.5.0/command.go:918
main.main
	helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
	runtime/proc.go:250
runtime.goexit
	runtime/asm_amd64.s:1571

Also here is the status of the pods that shows that web-2 is ready.

[centos@dev-server-centos helm-test-case-anijhawan]$ kubectl get pods -n repro
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          5m10s
web-1   1/1     Running   0          5m23s
web-2   1/1     Running   0          3m30s

@bhavin192
Copy link
Contributor

bhavin192 commented Jan 30, 2023

I was able to reproduce this. When we do a partitioned upgrade of a StatefulSet, the status has following content after StatefulSet is successfully upgraded. (partition was set to 2, replicas is 3).

{
  "availableReplicas": 3,
  "collisionCount": 0,
  "observedGeneration": 2,
  "readyReplicas": 3,
  "replicas": 3,
  "currentReplicas": 2,
  "currentRevision": "web-6b4fd5c9f6",
  "updatedReplicas": 1,
  "updateRevision": "web-5fbd8d85bc"
}

This basically says there are 2 replicas with currentRevision (old), and there is 1 replica with updateRevision (new). When the partition number changes this will move and eventually we will have same value of currentReplicas and updatedReplicas i.e. 3, and same value of currentRevision and updateRevision i.e. web-5fbd8d85bc

This is with partition 0, and replicas 3.

{
  "availableReplicas": 3,
  "collisionCount": 0,
  "observedGeneration": 3,
  "readyReplicas": 3,
  "replicas": 3,
  "currentReplicas": 3,
  "currentRevision": "web-5fbd8d85bc",
  "updatedReplicas": 3
  "updateRevision": "web-5fbd8d85bc",
}

If we look at kubectl rollout code (which is similar to what we are trying to achieve), they don't check the revision for partitioned rollouts, they just return with success or failure in case it is a partitioned rollout.
https://github.com/kubernetes/kubectl/blob/3ec401449e5821ad954942c7ecec9d2c90ecaaa1/pkg/polymorphichelpers/rollout_status.go#L136-L145
And after this they have check for revision which will get executed only if the partition value is 0.
I think the proposed fix is doing something similar for Helm?

PS: the original change which is causing this was added as part of #10622

@amannijhawan
Copy link
Contributor Author

And after this they have check for revision which will get executed only if the partition value is 0.
I think the proposed fix is doing something similar for Helm?

Yes thats what the the proposed fix is doing, only execute the check if partition value is unspecified or 0.
(when we are upgrading all partitions)

@joejulian joejulian added bug Categorizes issue or PR as related to a bug. in progress labels Jan 31, 2023
@gjenkins8
Copy link
Contributor

Sorry I believe the full output wasn't captured correctly in the previous iteration.
Here is the full output.

Thanks, all makes more sense now. And thanks @bhavin192 for the additional details.

I'll put a comment on the PR too; I think it is worthwhile having a test for this functionality (successful waiting for a partitioned rolling update of a statefulset). If there was test coverage, I suspect we wouldn't be here now :)

amannijhawan added a commit to amannijhawan/helm that referenced this issue Feb 2, 2023
Fixes helm#11773

This change updates readiness check in ready.go to correctly
account for statefulsets that are utilizing a partitioned upgrade.
These statefulsets only upgrade a subset of the managed pods with each call
to helm upgrade. This causes the upgrade to legitimately hit the condition where
sts.status.CurrentRevision != sts.Status.UpdateRevision which causes helm to mark
the upgrade has failed when in fact it is successful.

This change fixes that behavior to only check when partition is unspecified or 0.

Signed-off-by: Aman Nijhawan <anijhawan@yugabyte.com>
@amannijhawan
Copy link
Contributor Author

example-0.1.0.tgz

Updating the archive for the helm chart as the earlier archive link is unusable now.

technosophos pushed a commit that referenced this issue Mar 30, 2023
…te. (#11774)

* Fixes Readiness Check for statefulsets using partitioned rolling update.
Fixes #11773

This change updates readiness check in ready.go to correctly
account for statefulsets that are utilizing a partitioned upgrade.
These statefulsets only upgrade a subset of the managed pods with each call
to helm upgrade. This causes the upgrade to legitimately hit the condition where
sts.status.CurrentRevision != sts.Status.UpdateRevision which causes helm to mark
the upgrade has failed when in fact it is successful.

This change fixes that behavior to only check when partition is unspecified or 0.

Signed-off-by: Aman Nijhawan <anijhawan@yugabyte.com>

* Adding a unit test to verify that partitioned rolling upgrade for a statefulset works.

Signed-off-by: Aman Nijhawan <anijhawan@yugabyte.com>

---------

Signed-off-by: Aman Nijhawan <anijhawan@yugabyte.com>
Co-authored-by: Aman Nijhawan <anijhawan@yugabyte.com>
drewgonzales360 pushed a commit to drewgonzales360/helm that referenced this issue Apr 6, 2023
…te. (helm#11774)

* Fixes Readiness Check for statefulsets using partitioned rolling update.
Fixes helm#11773

This change updates readiness check in ready.go to correctly
account for statefulsets that are utilizing a partitioned upgrade.
These statefulsets only upgrade a subset of the managed pods with each call
to helm upgrade. This causes the upgrade to legitimately hit the condition where
sts.status.CurrentRevision != sts.Status.UpdateRevision which causes helm to mark
the upgrade has failed when in fact it is successful.

This change fixes that behavior to only check when partition is unspecified or 0.

Signed-off-by: Aman Nijhawan <anijhawan@yugabyte.com>

* Adding a unit test to verify that partitioned rolling upgrade for a statefulset works.

Signed-off-by: Aman Nijhawan <anijhawan@yugabyte.com>

---------

Signed-off-by: Aman Nijhawan <anijhawan@yugabyte.com>
Co-authored-by: Aman Nijhawan <anijhawan@yugabyte.com>
Signed-off-by: Drew Gonzales <drew.gonzales@datadoghq.com>
mattfarina pushed a commit that referenced this issue Apr 6, 2023
…te. (#11774)

* Fixes Readiness Check for statefulsets using partitioned rolling update.
Fixes #11773

This change updates readiness check in ready.go to correctly
account for statefulsets that are utilizing a partitioned upgrade.
These statefulsets only upgrade a subset of the managed pods with each call
to helm upgrade. This causes the upgrade to legitimately hit the condition where
sts.status.CurrentRevision != sts.Status.UpdateRevision which causes helm to mark
the upgrade has failed when in fact it is successful.

This change fixes that behavior to only check when partition is unspecified or 0.

Signed-off-by: Aman Nijhawan <anijhawan@yugabyte.com>

* Adding a unit test to verify that partitioned rolling upgrade for a statefulset works.

Signed-off-by: Aman Nijhawan <anijhawan@yugabyte.com>

---------

Signed-off-by: Aman Nijhawan <anijhawan@yugabyte.com>
Co-authored-by: Aman Nijhawan <anijhawan@yugabyte.com>
(cherry picked from commit eea2f27)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Categorizes issue or PR as related to a bug. in progress
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants