
Add await logic for DaemonSets #2953

Merged
merged 13 commits into master on May 17, 2024
Conversation

@blampe (Contributor) commented Apr 15, 2024

Proposed changes

This adds await logic for DaemonSets with RollingUpdate or OnDelete update strategies.

The implementation is largely based on the existing StatefulSet logic with two high-level simplifications:

  1. We use kstatus to decide when a DaemonSet is ready.
  2. We use a PodAggregator to handle reporting pod statuses.

Importantly, unlike StatefulSet, this means we do not currently inspect pods to decide readiness -- we only use them for informational purposes. I think this is sufficient, but I could easily be missing something; I haven't been able to simulate a situation where this logic doesn't fully capture readiness and we would need to inspect pod statuses.
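For illustration, here is a minimal sketch of the kind of status-only readiness check kstatus performs for a DaemonSet. The `DaemonSetStatus` struct and `ready` function are hypothetical stand-ins; the provider itself calls `status.Compute` from `sigs.k8s.io/cli-utils/pkg/kstatus/status` on the unstructured object rather than anything like this.

```go
package main

import "fmt"

// DaemonSetStatus mirrors the status fields kstatus inspects on a DaemonSet.
// (Hypothetical struct for illustration only.)
type DaemonSetStatus struct {
	ObservedGeneration     int64
	DesiredNumberScheduled int32
	UpdatedNumberScheduled int32
	NumberAvailable        int32
	NumberReady            int32
}

// ready approximates the kstatus-style check: the controller has observed the
// latest spec, and every desired pod is updated, available, and ready.
func ready(generation int64, s DaemonSetStatus) bool {
	if s.ObservedGeneration < generation {
		return false // controller hasn't seen the latest spec yet
	}
	return s.UpdatedNumberScheduled == s.DesiredNumberScheduled &&
		s.NumberAvailable == s.DesiredNumberScheduled &&
		s.NumberReady == s.DesiredNumberScheduled
}

func main() {
	rolling := DaemonSetStatus{ObservedGeneration: 2, DesiredNumberScheduled: 3,
		UpdatedNumberScheduled: 2, NumberAvailable: 2, NumberReady: 2}
	fmt.Println(ready(2, rolling)) // false: one pod still rolling out

	done := DaemonSetStatus{ObservedGeneration: 2, DesiredNumberScheduled: 3,
		UpdatedNumberScheduled: 3, NumberAvailable: 3, NumberReady: 3}
	fmt.Println(ready(2, done)) // true
}
```

The point is that readiness is derived entirely from the DaemonSet's own status fields, which is why per-pod inspection isn't needed on the happy path.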

An e2e test (failing before this change) was added in YAML. I noticed we didn't have any YAML tests yet, and I've found YAML quick and easy to crank out for very simple test cases like this.

Unit tests were added around the public Creation, Update, etc. methods in order to more fully exercise timeouts and retries. To that end I introduced a mock clock package which might be controversial. IMO Go doesn't have a great OSS mock clock but something like this can be very helpful for testing.

I'm still somewhat confused by the role of await.Read, since it doesn't actually await anything, but it's implemented similarly to StatefulSet's as a one-shot read plus readiness check.
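A rough sketch of that one-shot pattern, with hypothetical types standing in for the real Kubernetes client and unstructured objects:

```go
package main

import (
	"errors"
	"fmt"
)

// liveObject stands in for the cluster state a real Read would fetch via the
// Kubernetes client. (Hypothetical type for illustration.)
type liveObject struct {
	name  string
	ready bool
}

// readOne models the one-shot pattern described above: a single GET plus a
// readiness check, with no watch loop and no retries.
func readOne(get func(name string) (liveObject, error), name string) (liveObject, error) {
	obj, err := get(name) // one-shot read, no polling
	if err != nil {
		return liveObject{}, err
	}
	if !obj.ready {
		// Report the object but flag that it wasn't ready at read time.
		return obj, errors.New("resource exists but is not ready")
	}
	return obj, nil
}

func main() {
	get := func(name string) (liveObject, error) {
		return liveObject{name: name, ready: false}, nil
	}
	_, err := readOne(get, "my-daemonset")
	fmt.Println(err) // resource exists but is not ready
}
```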

Related issues

Fixes #609
Refs #2800
Refs #2799
Refs #2798

Does the PR have any schema changes?

Looking good! No breaking changes found.
No new resources/functions.

codecov bot commented Apr 15, 2024

Codecov Report

Attention: Patch coverage is 75.00000%, with 52 lines in your changes missing coverage. Please review.

Project coverage is 35.97%. Comparing base (9d759ca) to head (d545af1).

| Files | Patch % | Lines |
|---|---|---|
| provider/pkg/await/daemonset.go | 70.58% | 31 Missing and 14 partials ⚠️ |
| provider/pkg/await/watchers.go | 66.66% | 1 Missing and 1 partial ⚠️ |
| provider/pkg/clients/fake/discovery.go | 66.66% | 1 Missing and 1 partial ⚠️ |
| provider/pkg/retry/retry.go | 60.00% | 1 Missing and 1 partial ⚠️ |
| provider/pkg/await/awaiters.go | 93.75% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2953      +/-   ##
==========================================
+ Coverage   32.34%   35.97%   +3.63%     
==========================================
  Files          69       70       +1     
  Lines        8954     9156     +202     
==========================================
+ Hits         2896     3294     +398     
+ Misses       5791     5530     -261     
- Partials      267      332      +65     

☔ View full report in Codecov by Sentry.

@rshade (Contributor) commented Apr 16, 2024

@blampe thanks for the await logic, this will be a big help. Did you happen to test the following two scenarios:

  1. Updating to a new image (working)
  2. Updating to a new image (broken)

I ask because scenario 2 specifically caused us problems before the waiter existed.

@blampe (Contributor, Author) commented Apr 18, 2024

> @blampe thanks for the await logic, this will be a big help. Did you happen to test the two following scenarios:
>
> 1. Updating to a new image(Working)
> 2. Updating to a new image(Broken).
>    I ask because 2 specifically has caused us problems before the waiter.

@rshade I added a failing e2e test here which should cover both of these scenarios.

This step in particular captures scenario (2) by updating to an invalid image. Previously the operation would succeed; after this PR it fails once it times out waiting for the broken pods to come up.

Let me know if there are any other test cases you have in mind!

@EronWright (Contributor) left a comment


I don't know the awaiters too well but everything seemed good to me.

Makefile (review thread resolved)
Comment on lines 285 to 286:

```go
// If done=true and err is non-nil then this is an OnDelete rollout.
msg, done, err = dsa.onDeleteRolloutStatus()
```
Contributor:
This is a pretty awkward handoff, and given how little code is in DaemonSetStatusViewer, maybe we shouldn't use it.

blampe (Author):

I switched this over to use kstatus, since it's essentially the same logic and doesn't distinguish between RollingUpdate and OnDelete rollouts.

tests/sdk/yaml/yaml_test.go (review thread resolved)
Comment on lines 3 to 6:

```yaml
plugins:
  providers:
    - name: kubernetes
      path: ../../../../bin
```
Contributor:

I thought the provider linking happened automatically but I frankly don't know how. And why don't we see it in the other steps?

blampe (Author):

Yeah it happens automatically in tests, although sometimes I will manually run examples in which case this helps. I took it out since I don't need to test it manually anymore.

tests/sdk/yaml/await-daemonset/step2/Pulumi.yaml (review thread resolved)
@blampe blampe requested a review from EronWright April 29, 2024 23:48
blampe added a commit that referenced this pull request May 9, 2024
I noticed as part of
#2953 that the
`tests/convert` and `tests/provider` directories aren't currently
getting exercised in CI, and as a result one of them is failing on
master.

This PR moves `tests/provider` and `tests/convert` under `sdk/java`,
which is currently executed in CI but doesn't contain any tests. We also
fix the convert test along the way.

"sdk/java" is admittedly an odd namespace to put these under, but our CI
targets don't give us much flexibility at the moment, and at the end of
the day it's just an arbitrary way of sharding things.
@blampe blampe requested a review from rquitales May 17, 2024 17:40
@blampe (Contributor, Author) commented May 17, 2024

Tests flaked due to pulumi/ci-mgmt#933 / pulumi/schema-tools#64.

@blampe blampe merged commit 04fb15c into master May 17, 2024
20 checks passed
@blampe blampe deleted the blampe/ds-await branch May 17, 2024 22:31
lumiere-bot bot added a commit to coolguy1771/home-ops that referenced this pull request May 24, 2024
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [@pulumi/kubernetes](https://pulumi.com) ([source](https://togithub.com/pulumi/pulumi-kubernetes)) | dependencies | minor | [`4.11.0` -> `4.12.0`](https://renovatebot.com/diffs/npm/@pulumi%2fkubernetes/4.11.0/4.12.0) |

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency Dashboard for more information.

---

### Release Notes

<details>
<summary>pulumi/pulumi-kubernetes (@&#8203;pulumi/kubernetes)</summary>

### [`v4.12.0`](https://togithub.com/pulumi/pulumi-kubernetes/blob/HEAD/CHANGELOG.md#4120-May-21-2024)

[Compare Source](https://togithub.com/pulumi/pulumi-kubernetes/compare/v4.11.0...v4.12.0)

##### Added

- Added a new Helm Chart v4 resource. (pulumi/pulumi-kubernetes#2947)
- Added support for deletion propagation policies (e.g. Orphan). (pulumi/pulumi-kubernetes#3011)
- Server-side apply conflict errors now include the original field manager's name. (pulumi/pulumi-kubernetes#2983)

##### Changed

- Pulumi will now wait for DaemonSets to become ready. (pulumi/pulumi-kubernetes#2953)
- The Release resource's merge behavior for `valueYamlFiles` now more closely matches Helm's behavior. (pulumi/pulumi-kubernetes#2963)

##### Fixed

- Helm Chart V3 previews no longer fail when the cluster is unreachable. (pulumi/pulumi-kubernetes#2992)
- Fixed a panic that could occur when a missing field became `null`. (pulumi/pulumi-kubernetes#1970)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate
Bot](https://togithub.com/renovatebot/renovate).


Co-authored-by: lumiere-bot[bot] <98047013+lumiere-bot[bot]@users.noreply.github.com>
Labels: none. Projects: none.

Successfully merging this pull request may close these issues:

- Add await logic for DaemonSets

4 participants