
Add await logic for DaemonSets #609

Closed
Tracked by #2824
RichardWLaub opened this issue Jun 24, 2019 · 2 comments · Fixed by #2953
Labels
area/await-logic · customer/lighthouse (Lighthouse customer bugs and enhancements) · kind/enhancement (Improvements or new features) · resolution/fixed (This issue was fixed)

Comments

@RichardWLaub (Contributor)

We are deploying the sysdig Helm chart, which creates a DaemonSet with a rollingUpdate updateStrategy. Immediately afterward, we run tests to make sure the pods are running correctly. With Deployments and Services our tests always pass, because Pulumi waits to verify that resources of those kinds deployed successfully. Our DaemonSet tests sometimes fail because there is no await logic for DaemonSets.

Simple repro:

import * as k8s from '@pulumi/kubernetes';

new k8s.helm.v2.Chart('simple-nginx-local', {
  path: 'ds-chart',
  values: {
    image: {
      tag: '1.16'
    }
  }
});

With a chart containing the template ds-chart/templates/daemonset.yaml:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: test
spec:
  selector:
    matchLabels:
      name: nginx-ds
  template:
    metadata:
      labels:
        name: nginx-ds
      name: nginx-ds
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "nginx:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate

Toggle the tag above to 1.17 to see that the Pulumi program finishes before the new pods are ready.

@lukehoban lukehoban added the kind/enhancement Improvements or new features label Jun 25, 2019
@hausdorff hausdorff added this to backlog in Q3 Kubernetes Jul 22, 2019
@lukehoban lukehoban added this to the 0.26 milestone Jul 25, 2019
@lblackstone lblackstone modified the milestones: 0.26, 0.27 Aug 5, 2019
@casey-robertson

Raising awareness: we had a sysdig outage that could have been mitigated sooner if this had been in place.

@lblackstone lblackstone modified the milestones: 0.27, 0.28 Sep 18, 2019
@lblackstone lblackstone modified the milestones: 0.28, 0.29 Oct 7, 2019
@lblackstone lblackstone removed this from the 0.29 milestone Nov 22, 2019
@lblackstone lblackstone removed their assignment Jul 14, 2023
@rshade rshade linked a pull request Jan 25, 2024 that will close this issue
@rshade rshade added the customer/lighthouse Lighthouse customer bugs and enhancements label Jan 25, 2024
@blampe (Contributor) commented Apr 11, 2024

@RichardWLaub @casey-robertson @rquitales I wanted to share the current approach I'm taking with this in case you have any concerns or suggestions.

For DaemonSets with a RollingUpdate strategy, we will essentially follow the same behavior as kubectl rollout status: Pulumi will wait until the rollout is ready unless the DaemonSet has a skipAwait annotation.
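
For illustration, here is a minimal sketch of opting out via pulumi.com/skipAwait, the provider's existing opt-out annotation (the resource name and image tag are just placeholders):

import * as k8s from '@pulumi/kubernetes';

// Annotate the DaemonSet so `pulumi up` returns without waiting
// for the rollout to complete.
new k8s.apps.v1.DaemonSet('nginx-ds', {
  metadata: {
    annotations: { 'pulumi.com/skipAwait': 'true' },
  },
  spec: {
    selector: { matchLabels: { name: 'nginx-ds' } },
    updateStrategy: {
      type: 'RollingUpdate',
      rollingUpdate: { maxUnavailable: 1 },
    },
    template: {
      metadata: { labels: { name: 'nginx-ds' } },
      spec: {
        containers: [{ name: 'nginx', image: 'nginx:1.17' }],
      },
    },
  },
});

The same annotation would also opt an OnDelete resource out of the delete-based await behavior described below.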

It's less clear how to handle the OnDelete update strategy. For background, we currently handle StatefulSets with OnDelete update strategies by waiting for all pods to be manually removed (#2473); if the user doesn't want this behavior they must annotate the OnDelete StatefulSet with skipAwait.

I'm preserving this behavior for OnDelete DaemonSets for consistency, although I see an argument that Pulumi should not wait at all for this update strategy.

blampe added a commit that referenced this issue May 17, 2024
This adds await logic for DaemonSets with RollingUpdate or OnDelete
update strategies.

The implementation is largely based on the existing StatefulSet logic
with two high-level simplifications:

1. We use
[kstatus](https://pkg.go.dev/sigs.k8s.io/cli-utils/pkg/kstatus/status)
to decide when a DaemonSet is ready.
2. We use a `PodAggregator` to handle reporting pod statuses.

Importantly, unlike StatefulSet this means we do not currently inspect
pods to decide readiness -- we only use them for informational purposes.
I _think_ this is sufficient but I could easily be missing something. I
haven't been able to simulate situations where this logic doesn't fully
capture readiness and we would need to inspect pod statuses.

A failing e2e test was added in YAML under the awkwardly named
`tests/sdk/java` path.

Unit tests were added around the public `Creation`, `Update`, etc.
methods in order to more fully exercise timeouts and retries. To that
end I introduced a mock clock package, which might be controversial. IMO
Go doesn't have a great OSS mock clock, but something like this can be
very helpful for testing.
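
The idea, sketched here in TypeScript for brevity (the provider's actual
clock package is Go, and all names below are illustrative, not its real
API): the code under test registers timers with an injected clock, and
tests advance virtual time instead of sleeping.

// Minimal mock-clock sketch: tests control time explicitly.
interface Clock {
  now(): number;
  setTimeout(fn: () => void, ms: number): void;
}

class MockClock implements Clock {
  private current = 0;
  private timers: { due: number; fn: () => void }[] = [];

  now(): number {
    return this.current;
  }

  setTimeout(fn: () => void, ms: number): void {
    this.timers.push({ due: this.current + ms, fn });
  }

  // Advance virtual time and fire any timers that come due.
  advance(ms: number): void {
    this.current += ms;
    const due = this.timers.filter((t) => t.due <= this.current);
    this.timers = this.timers.filter((t) => t.due > this.current);
    due.forEach((t) => t.fn());
  }
}

// A five-minute await timeout can then be exercised instantly:
const clock = new MockClock();
let timedOut = false;
clock.setTimeout(() => { timedOut = true; }, 5 * 60 * 1000);
clock.advance(5 * 60 * 1000);
console.log(timedOut); // true, with no real waiting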

I'm still somewhat confused by the role of `await.Read` since it doesn't
actually await anything, but it's implemented similarly to StatefulSet as
a one-shot read plus readiness check.

Fixes #609
Refs #2800
Refs #2799
Refs #2798
@pulumi-bot pulumi-bot added the resolution/fixed This issue was fixed label May 17, 2024
Projects
Status: 🚀 Shipped
8 participants