Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Add MachinePool workers support in ClusterClass #9016

Merged
merged 1 commit into from
Aug 25, 2023

Conversation

willie-yao
Copy link
Contributor

@willie-yao willie-yao commented Jul 19, 2023

What this PR does / why we need it:
This PR adds MachinePool workers support in ClusterClass. This is currently a WIP with the following work items still in progress:

  • Determine fields for DockerMachinePoolTemplate and implement patch
  • Template cleanup
  • Unit tests
  • E2E tests

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Part of #5991

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 19, 2023
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jul 19, 2023
@willie-yao
Copy link
Contributor Author

/hold for squash

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 19, 2023
@sbueringer
Copy link
Member

@willie-yao Just ping me when I should do a first review

api/v1alpha4/conversion.go Outdated Show resolved Hide resolved
api/v1beta1/cluster_types.go Outdated Show resolved Hide resolved
api/v1beta1/clusterclass_types.go Outdated Show resolved Hide resolved
api/v1beta1/clusterclass_types.go Outdated Show resolved Hide resolved
api/v1beta1/condition_consts.go Show resolved Hide resolved
test/extension/handlers/topologymutation/handler.go Outdated Show resolved Hide resolved
test/framework/kubernetesversions/template.go Outdated Show resolved Hide resolved
@sbueringer
Copy link
Member

sbueringer commented Aug 10, 2023

@willie-yao Really really nice work!!!

Most of my comments come down to:

  • nits
  • re-sync because we changed parts of the MD implemention while you were working on this PR

I would suggest the following:

  1. Let's fix the open findings then do another round of review
  2. Then let's only adjust the existing unit tests so they are green again
  3. Let's then consider to get some quick & easy e2e test coverage (I would take a look how we can achieve this once we're done with 1. and 2.

We have to discuss this but I think I'm in favor of getting this PR with its current scope in place and then follow-up with e2e test, unit test coverage, ...

I think the PR as is is still reviewable (at least for folks familiar with ClusterClass) and the easiest way to make progress is to get this one in as is and then follow-up with more tests. The existing tests ensure that existing ClusterClass + MD functionality doesn't break. Splitting the implementation in multiple PRs including unit tests makes everything more complicated in my opinion and also harder to tell if everything works. With unit tests I expect that this PR would be at least 3-4x the size. Also this PR mostly comes down to replicating the same logic we have for MD. I would like to merge this relatively quickly so that we don't have to continuously rebase when changes are made to the ClusterClass implementation

@CecileRobertMichon @fabriziopandini ^^ Does this sound like an acceptable way forward for you?

@willie-yao willie-yao changed the title WIP: ✨ Add MachinePool workers support in ClusterClass ✨ Add MachinePool workers support in ClusterClass Aug 10, 2023
@willie-yao
Copy link
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Aug 10, 2023
Copy link
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really great stuff @willie-yao

Also this PR mostly comes down to replicating the same logic we have for MD

My overall comment is that there seems to be a lot of code that was adapted directly from the MachineDeployment implementation and it seems it's already outdated (from @sbueringer's earlier comment) so I'm wondering how we can reduce duplicate code so we don't have the same logic in two places, one for MachineDeployments and one for MachinePools

Does this sound like an acceptable way forward for you?

yes although without unit tests there is even greater risk of the two diverging in the meantime and MachinePool regressing because of it...

for e2e tests for now if we add a MachinePool worker to the clusterclass quickstart test template would that be enough to at least create one in tests?

Overall +1 to not expanding scope too much in this PR and getting it merged then iterating on tests, etc.

@fabriziopandini
Copy link
Member

@CecileRobertMichon @fabriziopandini ^^ Does this sound like an acceptable way forward for you?

+1 from me
Please, start immediately tracking the next steps and pending TODO in an issue assigned to @willie-yao (unfortunately in some cases in the past we lost track of those action items, and rebuilding those lists after some time is always both time-consuming and risky, and I would like to avoid this)

@willie-yao
Copy link
Contributor Author

Thanks for all the helpful comments and feedback! Before moving forward, I wanted to make sure that my current approach is the best path forward.

Addressing @CecileRobertMichon's comment

My overall comment is that there seems to be a lot of code that was adapted directly from the MachineDeployment implementation and it seems it's already outdated (from @sbueringer's earlier comment) so I'm wondering how we can reduce duplicate code so we don't have the same logic in two places, one for MachineDeployments and one for MachinePools

I agree that this PR does contain a lot of duplicated code from MachineDeployments. This could pose a problem if someone wants to make changes to one of the topologies, they will have to duplicate the efforts for the other. Would it be worth considering refactoring this PR to use generics or to reduce the amount of repeated code somehow? If our priority is to land this feature soon, we may also consider merging this as-is and re-factoring the approach later. @sbueringer @fabriziopandini wdyt?

@willie-yao willie-yao force-pushed the cc-mp-squashed branch 2 times, most recently from fbc594a to 74ff719 Compare August 11, 2023 19:49
@CecileRobertMichon
Copy link
Contributor

To be clear, I'm +1 on moving forward with this PR as-is and refactoring as a follow-up (perhaps before even spending time writing unit tests for the new code). A single PR with the refactor and the feature would be too large for review.

@sbueringer
Copy link
Member

sbueringer commented Aug 14, 2023

yes although without unit tests there is even greater risk of the two diverging in the meantime and MachinePool regressing because of it...

I think duplicating the unit test from MD to MP won't significantly reduce the risk of MD code changing in the meantime. Getting this PR merged as soon as possible will help though :)

for e2e tests for now if we add a MachinePool worker to the clusterclass quickstart test template would that be enough to at least create one in tests?

Yup that is pretty much what I was thinking

I think we have to be careful about de-duplication as it can introduce bugs. I would suggest we explore this topic once we have the unit test coverage in place for MPs as well

exp/api/v1alpha3/conversion.go Outdated Show resolved Hide resolved
internal/controllers/topology/cluster/patches/engine.go Outdated Show resolved Hide resolved
internal/controllers/topology/cluster/scope/blueprint.go Outdated Show resolved Hide resolved
internal/controllers/topology/cluster/scope/state.go Outdated Show resolved Hide resolved
internal/controllers/topology/cluster/desired_state.go Outdated Show resolved Hide resolved
internal/controllers/topology/cluster/desired_state.go Outdated Show resolved Hide resolved
internal/controllers/topology/cluster/desired_state.go Outdated Show resolved Hide resolved
internal/controllers/topology/cluster/reconcile_state.go Outdated Show resolved Hide resolved
@sbueringer
Copy link
Member

A quick round this time. Let's try to address the open comments + linter + verify + test. Then I would take another closer look

@willie-yao willie-yao force-pushed the cc-mp-squashed branch 2 times, most recently from c13efb9 to 6158492 Compare August 15, 2023 20:45
@willie-yao
Copy link
Contributor Author

/test all

@sbueringer
Copy link
Member

/test ?

@k8s-ci-robot

This comment was marked as resolved.

@sbueringer
Copy link
Member

/test pull-cluster-api-e2e-full-main
/test pull-cluster-api-e2e-workload-upgrade-1-28-latest-main

Copy link
Member

@chrischdi chrischdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more nit,

but as of that

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 24, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 2d8512a22504156e49ba30bc4395bd238ad0fad3

@sbueringer
Copy link
Member

sbueringer commented Aug 24, 2023

Delta looks fine. I'll try to do a last full review either on Friday or Monday. But at this stage I only expect nits or follow-up tasks.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 24, 2023
@sbueringer
Copy link
Member

/test pull-cluster-api-e2e-full-main

@sbueringer
Copy link
Member

Really really nice work!!

I'll open a small follow-up PR on Monday to fix some nits (just fyi)

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 25, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: ad08bc42e6ec991f2bea7b2ee1f3b2913d1f0621

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 25, 2023
@k8s-ci-robot k8s-ci-robot merged commit f28fc73 into kubernetes-sigs:main Aug 25, 2023
23 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.6 milestone Aug 25, 2023
@chrischdi
Copy link
Member

Awesome work @willie-yao ! 🎉

@sbueringer
Copy link
Member

Yup, absolute great work!

@willie-yao Now we have a lot of unit tests in front of us :)

I would suggest to go over the list I added to the umbrella issue and then open PRs for maybe around 2-3 files at a time? (so that we get the unit tests quickly merged but the PRs are not too big) WDYT?

@willie-yao
Copy link
Contributor Author

I would suggest to go over the list I added to the umbrella issue and then open PRs for maybe around 2-3 files at a time? (so that we get the unit tests quickly merged but the PRs are not too big) WDYT?

Perfect! I'll get get to work right away. Should we prioritize unit tests first or the E2E test?

@sbueringer
Copy link
Member

sbueringer commented Aug 29, 2023

Unit tests for now would be good. I want to take a look at our current e2e tests first

@sbueringer
Copy link
Member

@willie-yao btw I ordered the unit tests by priority on the issue

@sbueringer
Copy link
Member

@willie-yao Another note. Please let me know on the issue on which unit tests you're working before you do it. Just so that we can potentially can get help from others if someone has time and avoid duplicate work.

@g-gaston
Copy link
Contributor

/area machinepool

@k8s-ci-robot k8s-ci-robot added the area/machinepool Issues or PRs related to machinepools label Oct 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/machinepool Issues or PRs related to machinepools cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants