
Add await logic for DaemonSets #609

Closed
Tracked by #2824
RichardWLaub opened this issue Jun 24, 2019 · 2 comments · Fixed by #2953
Labels
area/await-logic · customer/lighthouse (Lighthouse customer bugs and enhancements) · kind/enhancement (Improvements or new features) · resolution/fixed (This issue was fixed)

Comments

@RichardWLaub (Contributor)

We are deploying the sysdig Helm chart, which creates a DaemonSet with a rollingUpdate updateStrategy. Immediately afterward, we run tests to make sure the pods are running correctly. With Deployments and Services our tests always pass, because Pulumi waits to verify that resources of those kinds deployed successfully. Our DaemonSet tests sometimes fail because there is no await logic for DaemonSets.

Simple repro:

import * as k8s from '@pulumi/kubernetes';

new k8s.helm.v2.Chart('simple-nginx-local', {
  path: 'ds-chart',
  values: {
    image: {
      tag: '1.16'
    }
  }
});

With a chart containing the template ds-chart/templates/daemonset.yaml:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: test
spec:
  selector:
    matchLabels:
      name: nginx-ds
  template:
    metadata:
      labels:
        name: nginx-ds
      name: nginx-ds
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "nginx:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate

Toggle the tag above to 1.17 to see that the Pulumi program finishes before the new pods are ready.

@lukehoban lukehoban added the kind/enhancement Improvements or new features label Jun 25, 2019
@hausdorff hausdorff added this to backlog in Q3 Kubernetes Jul 22, 2019
@lukehoban lukehoban added this to the 0.26 milestone Jul 25, 2019
@lblackstone lblackstone modified the milestones: 0.26, 0.27 Aug 5, 2019
@casey-robertson

Raising awareness: we had a sysdig outage that could have been mitigated sooner if this had been in place.

@lblackstone lblackstone modified the milestones: 0.27, 0.28 Sep 18, 2019
@lblackstone lblackstone modified the milestones: 0.28, 0.29 Oct 7, 2019
@lblackstone lblackstone removed this from the 0.29 milestone Nov 22, 2019
@lblackstone lblackstone removed their assignment Jul 14, 2023
@rshade rshade linked a pull request Jan 25, 2024 that will close this issue
@rshade rshade added the customer/lighthouse Lighthouse customer bugs and enhancements label Jan 25, 2024
@blampe (Contributor) commented Apr 11, 2024

@RichardWLaub @casey-robertson @rquitales I wanted to share the current approach I'm taking with this in case you have any concerns or suggestions.

For DaemonSets with a RollingUpdate strategy, we will essentially follow the same behavior as kubectl rollout status: Pulumi will wait until the rollout is ready unless the DaemonSet has a skipAwait annotation.
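
For illustration, here is a minimal sketch of opting out via pulumi.com/skipAwait, the provider's existing opt-out annotation (the resource name and image tag are just placeholders):

import * as k8s from '@pulumi/kubernetes';

// Annotate the DaemonSet so `pulumi up` returns without waiting
// for the rollout to complete.
new k8s.apps.v1.DaemonSet('nginx-ds', {
  metadata: {
    annotations: { 'pulumi.com/skipAwait': 'true' },
  },
  spec: {
    selector: { matchLabels: { name: 'nginx-ds' } },
    updateStrategy: {
      type: 'RollingUpdate',
      rollingUpdate: { maxUnavailable: 1 },
    },
    template: {
      metadata: { labels: { name: 'nginx-ds' } },
      spec: {
        containers: [{ name: 'nginx', image: 'nginx:1.17' }],
      },
    },
  },
});

The same annotation would also opt an OnDelete resource out of the delete-based await behavior described below.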

It's less clear how to handle the OnDelete update strategy. For background, we currently handle StatefulSets with OnDelete update strategies by waiting for all pods to be manually removed (#2473); if the user doesn't want this behavior they must annotate the OnDelete StatefulSet with skipAwait.

I'm preserving this behavior for OnDelete DaemonSets for consistency, although I see an argument that Pulumi should not wait at all for this update strategy.

blampe added a commit that referenced this issue May 17, 2024
This adds await logic for DaemonSets with RollingUpdate or OnDelete
update strategies.

The implementation is largely based on the existing StatefulSet logic
with two high-level simplifications:

1. We use
[kstatus](https://pkg.go.dev/sigs.k8s.io/cli-utils/pkg/kstatus/status)
to decide when a DaemonSet is ready.
2. We use a `PodAggregator` to handle reporting pod statuses.

Importantly, unlike StatefulSet this means we do not currently inspect
pods to decide readiness -- we only use them for informational purposes.
I _think_ this is sufficient but I could easily be missing something. I
haven't been able to simulate situations where this logic doesn't fully
capture readiness and we would need to inspect pod statuses.

A failing e2e test was added in YAML under the awkwardly named
`tests/sdk/java` path.

Unit tests were added around the public `Creation`, `Update`, etc.
methods in order to more fully exercise timeouts and retries. To that
end I introduced a mock clock package, which might be controversial. IMO
Go doesn't have a great OSS mock clock, but something like this can be
very helpful for testing.
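
The idea, sketched here in TypeScript for brevity (the provider's actual
clock package is Go, and all names below are illustrative, not its real
API): the code under test registers timers with an injected clock, and
tests advance virtual time instead of sleeping.

// Minimal mock-clock sketch: tests control time explicitly.
interface Clock {
  now(): number;
  setTimeout(fn: () => void, ms: number): void;
}

class MockClock implements Clock {
  private current = 0;
  private timers: { due: number; fn: () => void }[] = [];

  now(): number {
    return this.current;
  }

  setTimeout(fn: () => void, ms: number): void {
    this.timers.push({ due: this.current + ms, fn });
  }

  // Advance virtual time and fire any timers that come due.
  advance(ms: number): void {
    this.current += ms;
    const due = this.timers.filter((t) => t.due <= this.current);
    this.timers = this.timers.filter((t) => t.due > this.current);
    due.forEach((t) => t.fn());
  }
}

// A five-minute await timeout can then be exercised instantly:
const clock = new MockClock();
let timedOut = false;
clock.setTimeout(() => { timedOut = true; }, 5 * 60 * 1000);
clock.advance(5 * 60 * 1000);
console.log(timedOut); // true, with no real waiting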

I'm still somewhat confused by the role of `await.Read` since it doesn't
actually await anything, but it's implemented similarly to StatefulSet as
a one-shot read plus readiness check.

Fixes #609
Refs #2800
Refs #2799
Refs #2798
@pulumi-bot pulumi-bot added the resolution/fixed This issue was fixed label May 17, 2024
Projects
Status: 🚀 Shipped
8 participants