KEP-4212: Declarative Node Maintenance #4213

atiratree · 2023-09-15T10:46:22Z

One-line PR description: introduce a NodeMaintenance API to solve various node drain issues

Issue link: Declarative Node Maintenance #4212

Other comments:

TODO:

explore permission model for NodeMaintenance, Node, Pods (RBAC, ValidatingAdmissionPolicy) and all of the actors
explore whether NodeMaintenance can be used as a replacement or superset of the Graceful Node Shutdown feature
explore possible interactions with Kubelet
explore the descheduling aspects of this feature
explore the DaemonSets and StatefulSets use case
figure out the best format/implementation for the EvacuationRequest

k8s-ci-robot · 2023-09-15T10:46:31Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: atiratree
Once this PR has been reviewed and has the lgtm label, please assign janetkuo for approval. For more information see the Kubernetes Code Review Process.


Needs approval from an approver in each of these files:

keps/sig-apps/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sftim

Thank you very much for opening the issue and PR!

I know this is draft; I've got some feedback already, and I hope it's helpful.

keps/sig-apps/4212-declarative-node-maintenance/README.md

sftim

🤔 should this be SIG Node rather than SIG Apps?

atiratree · 2023-10-13T16:12:14Z

🤔 should this be SIG Node rather than SIG Apps?

Not sure yet. The API itself is SIG Node, I guess, but it has a lot of implications for SIG Apps. Let's delay this decision until after I redo the KEP and we have additional rounds of discussion.

kwilczynski · 2024-05-13T11:41:14Z

keps/sig-apps/4212-declarative-node-maintenance/README.md

+termination.NodeMaintenance could then be used even by spot instances.
+
+The NodeMaintenance object could survive kubelet restarts, and the kubelet would always know if the
+node is under shutdown (maintenance). The cluster admin would have to remove the NodeMaintenance


Would the shutdown cancellation work in the same way as it currently does under Graceful Node Shutdown? I assume it would.

This KEP also supports node maintenance for Windows nodes, where graceful shutdown handling is (AIUI) not implemented.

No, it's not; it's very much Linux-centric, not even other *nix. Would this be a blocker for this KEP to move forward?

IMO you can mark a node as needing maintenance even if it doesn't run systemd (either Linux without systemd, or Windows, which doesn't really have an init process in the POSIX sense). So this KEP can move forward whether or not your nodes gracefully handle a shutdown signal.

Cancelation by kubelet

Note:
There are cases when Node termination was cancelled by the system (or perhaps manually by an administrator). In either of those situations the Node will return to the Ready state. However, Pods which already started the process of termination will not be restored by kubelet and will need to be re-scheduled.

Is the node termination cancellation reliable? Can we use it for deletion of the NodeMaintenance?

Also, the behavior when the NodeMaintenance is cancelled/deleted is application specific (please see the Evacuation KEP). The application/evacuator can terminate the pod, or it can recover.

kwilczynski · 2024-05-13T11:47:39Z

keps/sig-apps/4212-declarative-node-maintenance/README.md

+
+If there is no connection to the apiserver (apiserver down, network issues, etc.) and the
+NodeMaintenance object cannot be created, we would fall back to the original behavior of Graceful
+Node Shutdown feature. If the connection is restored, we would stop the Graceful Node Shutdown and


It might be too late to fall back to something else. We should attempt to add the NodeMaintenance object should it be missing, but it might be too late to stop Graceful Node Shutdown, which might be too far gone, so to speak. This would be a one-way (hard?) fallback.

Alternatively, perhaps doing nothing or refusing to carry out a shutdown would be a safer option?

That said, it depends on how the shutdown was triggered: if it was due to external factors (the default modus operandi for which Graceful Node Shutdown was designed), then it would potentially be unsafe to interrupt the ongoing shutdown, or there might not be enough time, etc.

It would be too late for the pods that are already terminating. But it might be useful for the running non-terminated pods, right?

It seems to me that by default it would be good to carry out the Graceful Node Shutdown in the disconnected scenario. At least to have a softer landing in bad situations. I wonder if it would make sense to make this behavior configurable?

we would fall back to the original behavior of Graceful Node Shutdown feature

Does this imply that Graceful Node Shutdown will be used the way it is now? Or does it mean DNM will internally reproduce the original Graceful Node Shutdown behavior?

AIUI, DNM is about marking the node as needing maintenance and barely overlaps with node shutdown handling.

The only overlap is that both things trigger a node drain; but other things could also trigger that, such as node problem detector deciding the node is irreparably faulty.
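A minimal sketch of the fallback discussed in this thread, assuming a hypothetical kubelet hook and a hypothetical configuration knob (`ShutdownFallback`). None of these names come from the KEP or the kubelet codebase. On a shutdown signal, the kubelet first tries to record a NodeMaintenance object; only if that fails (for example, the apiserver is unreachable) does it apply the configured fallback, which either keeps today's Graceful Node Shutdown behavior or does nothing:

```go
package main

import (
	"errors"
	"fmt"
)

// ShutdownFallback is a hypothetical setting describing what to do when a
// NodeMaintenance object cannot be created before a node shutdown.
type ShutdownFallback int

const (
	// FallbackGracefulShutdown keeps the existing Graceful Node Shutdown behavior.
	FallbackGracefulShutdown ShutdownFallback = iota
	// FallbackNone takes no further action.
	FallbackNone
)

// Action is the hypothetical outcome of handling a shutdown signal.
type Action string

const (
	ActionDelegateToNodeMaintenance Action = "delegate-to-nodemaintenance"
	ActionGracefulNodeShutdown      Action = "graceful-node-shutdown"
	ActionNone                      Action = "none"
)

// errAPIServerUnavailable stands in for any error returned while creating the object.
var errAPIServerUnavailable = errors.New("apiserver unavailable")

// handleShutdown calls createNodeMaintenance (a hypothetical hook) and, only if
// it returns an error, applies the configured fallback.
func handleShutdown(createNodeMaintenance func() error, fallback ShutdownFallback) Action {
	if err := createNodeMaintenance(); err == nil {
		// The NodeMaintenance object exists, so the declarative flow takes over.
		return ActionDelegateToNodeMaintenance
	}
	if fallback == FallbackGracefulShutdown {
		return ActionGracefulNodeShutdown
	}
	return ActionNone
}

func main() {
	connected := func() error { return nil }
	disconnected := func() error { return errAPIServerUnavailable }

	fmt.Println(handleShutdown(connected, FallbackGracefulShutdown))
	fmt.Println(handleShutdown(disconnected, FallbackGracefulShutdown))
	fmt.Println(handleShutdown(disconnected, FallbackNone))
}
```

As noted above, this does not model the reconnect case: once the connection is restored, pods that have already started terminating cannot be handed back, so at most the still-running pods could be moved over to the NodeMaintenance flow.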

k8s-ci-robot added the cncf-cla: yes label Sep 15, 2023

k8s-ci-robot added kind/kep and sig/apps labels Sep 15, 2023

k8s-ci-robot requested review from kow3ns and soltysh September 15, 2023 10:46

k8s-ci-robot added the size/XXL label Sep 15, 2023

atiratree marked this pull request as draft September 15, 2023 10:46

k8s-ci-robot added the do-not-merge/work-in-progress label

atiratree mentioned this pull request Sep 15, 2023

Declarative Node Maintenance #4212

sftim reviewed Sep 20, 2023

fabiand reviewed Sep 21, 2023

keps/sig-apps/4212-improve-node-maintenance/README.md (outdated)

atiratree force-pushed the improve-node-maintenance branch 7 times, most recently from 0ed835a to 81b4589 September 22, 2023 18:10

sftim reviewed Sep 23, 2023

keps/sig-apps/4212-improve-node-maintenance/kep.yaml (outdated)

atiratree force-pushed the improve-node-maintenance branch 2 times, most recently from 6194058 to 00dc7c5 September 25, 2023 14:09

atiratree changed the title from "KEP-4212: Improve Node Maintenance" to "KEP-4212: Declarative Node Maintenance" Sep 25, 2023

sftim reviewed Sep 26, 2023

keps/sig-apps/4212-declarative-node-maintenance/README.md

atiratree force-pushed the improve-node-maintenance branch from 00dc7c5 to 686227c September 26, 2023 15:59

sftim reviewed Oct 12, 2023

keps/sig-apps/4212-declarative-node-maintenance/README.md (outdated)

sftim mentioned this pull request Oct 12, 2023

Avoid scaling up in zones with operational issues aws/karpenter-provider-aws#4727

sftim reviewed Oct 12, 2023