
Add await logic for DaemonSets #2953

Merged
merged 13 commits into master on May 17, 2024
Conversation

@blampe (Contributor) commented Apr 15, 2024

Proposed changes

This adds await logic for DaemonSets with RollingUpdate or OnDelete update strategies.

The implementation is largely based on the existing StatefulSet logic with two high-level simplifications:

  1. We use kstatus to decide when a DaemonSet is ready.
  2. We use a PodAggregator to handle reporting pod statuses.

Importantly, unlike StatefulSet, this means we do not currently inspect pods to decide readiness -- we only use them for informational purposes. I think this is sufficient, but I could easily be missing something; I haven't been able to simulate a situation where this logic doesn't fully capture readiness and we would need to inspect pod statuses.
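For illustration, here is a minimal sketch of the kind of status-only readiness check kstatus performs for a DaemonSet. The `DaemonSetStatus` struct and `ready` function are hypothetical stand-ins; the provider itself calls `status.Compute` from `sigs.k8s.io/cli-utils/pkg/kstatus/status` on the unstructured object rather than anything like this.

```go
package main

import "fmt"

// DaemonSetStatus mirrors the status fields kstatus inspects on a DaemonSet.
// (Hypothetical struct for illustration only.)
type DaemonSetStatus struct {
	ObservedGeneration     int64
	DesiredNumberScheduled int32
	UpdatedNumberScheduled int32
	NumberAvailable        int32
	NumberReady            int32
}

// ready approximates the kstatus-style check: the controller has observed the
// latest spec, and every desired pod is updated, available, and ready.
func ready(generation int64, s DaemonSetStatus) bool {
	if s.ObservedGeneration < generation {
		return false // controller hasn't seen the latest spec yet
	}
	return s.UpdatedNumberScheduled == s.DesiredNumberScheduled &&
		s.NumberAvailable == s.DesiredNumberScheduled &&
		s.NumberReady == s.DesiredNumberScheduled
}

func main() {
	rolling := DaemonSetStatus{ObservedGeneration: 2, DesiredNumberScheduled: 3,
		UpdatedNumberScheduled: 2, NumberAvailable: 2, NumberReady: 2}
	fmt.Println(ready(2, rolling)) // false: one pod still rolling out

	done := DaemonSetStatus{ObservedGeneration: 2, DesiredNumberScheduled: 3,
		UpdatedNumberScheduled: 3, NumberAvailable: 3, NumberReady: 3}
	fmt.Println(ready(2, done)) // true
}
```

The point is that readiness is derived entirely from the DaemonSet's own status fields, which is why per-pod inspection isn't needed on the happy path.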

An e2e test (failing before this change) was added in YAML. I noticed we didn't have any YAML tests yet, and I've found YAML quick and easy to crank out for very simple test cases like this.

Unit tests were added around the public Creation, Update, etc. methods in order to more fully exercise timeouts and retries. To that end I introduced a mock clock package which might be controversial. IMO Go doesn't have a great OSS mock clock but something like this can be very helpful for testing.

I'm still somewhat confused by the role of await.Read, since it doesn't actually await anything, but it's implemented similarly to StatefulSet's as a one-shot read plus readiness check.
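A rough sketch of that one-shot pattern, with hypothetical types standing in for the real Kubernetes client and unstructured objects:

```go
package main

import (
	"errors"
	"fmt"
)

// liveObject stands in for the cluster state a real Read would fetch via the
// Kubernetes client. (Hypothetical type for illustration.)
type liveObject struct {
	name  string
	ready bool
}

// readOne models the one-shot pattern described above: a single GET plus a
// readiness check, with no watch loop and no retries.
func readOne(get func(name string) (liveObject, error), name string) (liveObject, error) {
	obj, err := get(name) // one-shot read, no polling
	if err != nil {
		return liveObject{}, err
	}
	if !obj.ready {
		// Report the object but flag that it wasn't ready at read time.
		return obj, errors.New("resource exists but is not ready")
	}
	return obj, nil
}

func main() {
	get := func(name string) (liveObject, error) {
		return liveObject{name: name, ready: false}, nil
	}
	_, err := readOne(get, "my-daemonset")
	fmt.Println(err) // resource exists but is not ready
}
```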

Related issues

Fixes #609
Refs #2800
Refs #2799
Refs #2798

Does the PR have any schema changes?

Looking good! No breaking changes found.
No new resources/functions.

codecov bot commented Apr 15, 2024

Codecov Report

Attention: Patch coverage is 75.00000%, with 52 lines in your changes missing coverage. Please review.

Project coverage is 35.97%. Comparing base (9d759ca) to head (d545af1).

| Files | Patch % | Lines |
|---|---|---|
| provider/pkg/await/daemonset.go | 70.58% | 31 Missing and 14 partials ⚠️ |
| provider/pkg/await/watchers.go | 66.66% | 1 Missing and 1 partial ⚠️ |
| provider/pkg/clients/fake/discovery.go | 66.66% | 1 Missing and 1 partial ⚠️ |
| provider/pkg/retry/retry.go | 60.00% | 1 Missing and 1 partial ⚠️ |
| provider/pkg/await/awaiters.go | 93.75% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2953      +/-   ##
==========================================
+ Coverage   32.34%   35.97%   +3.63%     
==========================================
  Files          69       70       +1     
  Lines        8954     9156     +202     
==========================================
+ Hits         2896     3294     +398     
+ Misses       5791     5530     -261     
- Partials      267      332      +65     

☔ View full report in Codecov by Sentry.

@rshade (Contributor) commented Apr 16, 2024

@blampe thanks for the await logic, this will be a big help. Did you happen to test the following two scenarios:

  1. Updating to a new image (working)
  2. Updating to a new image (broken)

I ask because scenario 2 specifically caused us problems before the waiter existed.

@blampe (Contributor, Author) commented Apr 18, 2024

> @blampe thanks for the await logic, this will be a big help. Did you happen to test the two following scenarios:
>
> 1. Updating to a new image(Working)
> 2. Updating to a new image(Broken).
>    I ask because 2 specifically has caused us problems before the waiter.

@rshade I added a failing e2e test here which should cover both of these scenarios.

This step in particular captures scenario (2) by updating to an invalid image. Previously the operation would succeed; after this PR it fails once it times out waiting for the broken pods to come up.

Let me know if there are any other test cases you have in mind!

@EronWright (Contributor) left a comment


I don't know the awaiters too well but everything seemed good to me.

Makefile (review thread resolved)
Comment on lines 285 to 286:

```go
// If done=true and err is non-nil then this is an OnDelete rollout.
msg, done, err = dsa.onDeleteRolloutStatus()
```
Contributor:
This is a pretty awkward handoff, and given how little code is in DaemonSetStatusViewer, maybe we shouldn't use it.

blampe (Author):

I switched this over to use kstatus, since it's essentially the same logic and doesn't distinguish between RollingUpdate and OnDelete rollouts.

tests/sdk/yaml/yaml_test.go (review thread resolved)
Comment on lines 3 to 6:

```yaml
plugins:
  providers:
    - name: kubernetes
      path: ../../../../bin
```
Contributor:

I thought the provider linking happened automatically but I frankly don't know how. And why don't we see it in the other steps?

blampe (Author):

Yeah it happens automatically in tests, although sometimes I will manually run examples in which case this helps. I took it out since I don't need to test it manually anymore.

tests/sdk/yaml/await-daemonset/step2/Pulumi.yaml (review thread resolved)
@blampe blampe requested a review from EronWright April 29, 2024 23:48
blampe added a commit that referenced this pull request May 9, 2024
I noticed as part of
#2953 that the
`tests/convert` and `tests/provider` directories aren't currently
getting exercised in CI, and as a result one of them is failing on
master.

This PR moves `tests/provider` and `tests/convert` under `sdk/java`,
which is currently executed in CI but doesn't contain any tests. We also
fix the convert test along the way.

"sdk/java" is admittedly an odd namespace to put these under, but our CI
targets don't give us much flexibility at the moment, and at the end of
the day it's just an arbitrary way of sharding things.
@blampe blampe requested a review from rquitales May 17, 2024 17:40
@blampe (Contributor, Author) commented May 17, 2024

Tests flaked due to pulumi/ci-mgmt#933 / pulumi/schema-tools#64.

@blampe blampe merged commit 04fb15c into master May 17, 2024
20 checks passed
@blampe blampe deleted the blampe/ds-await branch May 17, 2024 22:31
lumiere-bot bot added a commit to coolguy1771/home-ops that referenced this pull request May 24, 2024
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [@pulumi/kubernetes](https://pulumi.com) ([source](https://togithub.com/pulumi/pulumi-kubernetes)) | dependencies | minor | [`4.11.0` -> `4.12.0`](https://renovatebot.com/diffs/npm/@pulumi%2fkubernetes/4.11.0/4.12.0) |

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency Dashboard for more information.

---

### Release Notes

<details>
<summary>pulumi/pulumi-kubernetes (@&#8203;pulumi/kubernetes)</summary>

### [`v4.12.0`](https://togithub.com/pulumi/pulumi-kubernetes/blob/HEAD/CHANGELOG.md#4120-May-21-2024)

[Compare Source](https://togithub.com/pulumi/pulumi-kubernetes/compare/v4.11.0...v4.12.0)

##### Added

- Added a new Helm Chart v4 resource. (pulumi/pulumi-kubernetes#2947)
- Added support for deletion propagation policies (e.g. Orphan). (pulumi/pulumi-kubernetes#3011)
- Server-side apply conflict errors now include the original field manager's name. (pulumi/pulumi-kubernetes#2983)

##### Changed

- Pulumi will now wait for DaemonSets to become ready. (pulumi/pulumi-kubernetes#2953)
- The Release resource's merge behavior for `valueYamlFiles` now more closely matches Helm's behavior. (pulumi/pulumi-kubernetes#2963)

##### Fixed

- Helm Chart V3 previews no longer fail when the cluster is unreachable. (pulumi/pulumi-kubernetes#2992)
- Fixed a panic that could occur when a missing field became `null`. (pulumi/pulumi-kubernetes#1970)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate
Bot](https://togithub.com/renovatebot/renovate).


Co-authored-by: lumiere-bot[bot] <98047013+lumiere-bot[bot]@users.noreply.github.com>
Labels: none. Projects: none.

Successfully merging this pull request may close these issues:

- Add await logic for DaemonSets

4 participants