KEP-4212: Declarative Node Maintenance #4213
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: atiratree.
Thank you very much for opening the issue and PR!
I know this is a draft; I've got some feedback already, and I hope it's helpful.
keps/sig-apps/4212-improve-node-maintenance/node-maintenance.svg
🤔 should this be SIG Node rather than SIG Apps?
Not sure yet; the API itself is SIG Node, I guess, but it has a lot of implications for SIG Apps. Let's delay this decision until after I redo the KEP and we have additional rounds of discussion.
termination.NodeMaintenance could then be used even by spot instances.

The NodeMaintenance object could survive kubelet restarts, and the kubelet would always know if the node is under shutdown (maintenance). The cluster admin would have to remove the NodeMaintenance object.
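To make the discussion concrete, here is a minimal sketch of what such an object might look like. This is purely illustrative: the API group, version, and field names below are assumptions, since the draft KEP has not settled on an API shape.

```yaml
# Illustrative sketch only: group, version, and field names are assumptions,
# not the KEP's final API.
apiVersion: nodemaintenance.k8s.io/v1alpha1
kind: NodeMaintenance
metadata:
  name: node-1-maintenance
spec:
  nodeName: node-1          # the node to be drained and shut down
  reason: "kernel upgrade"  # surfaced to users and controllers
```

Because such an object would live in the apiserver rather than on the node, it would naturally survive kubelet restarts, as noted above.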
Would the shutdown cancellation work in the same way as it currently does under Graceful Node Shutdown? I assume yes, it would.
This KEP also supports node maintenance for Windows nodes, where graceful shutdown handling is (AIUI) not implemented.
No, it's not; it is very much Linux-centric (not even other *nix), indeed. Would this be a blocker for this KEP to move forward?
IMO you can mark a node as needing maintenance even if it doesn't run systemd (either Linux without systemd, or Windows which doesn't really have an init process in the POSIX sense). So this KEP can move forward whether or not your nodes gracefully handle a shutdown signal.
Note:
There are cases when Node termination was cancelled by the system (or perhaps manually by an administrator). In either of those situations the Node will return to the Ready state. However, Pods which already started the process of termination will not be restored by kubelet and will need to be re-scheduled.
Is the node termination cancellation reliable? Can we use it for deletion of the NodeMaintenance?
Also, the behavior when the NodeMaintenance is cancelled/deleted is application specific (please see the Evacuation KEP). The application/evacuator can terminate the pod, or it can also recover it.
If there is no connection to the apiserver (apiserver down, network issues, etc.) and the NodeMaintenance object cannot be created, we would fall back to the original behavior of the Graceful Node Shutdown feature. If the connection is restored, we would stop the Graceful Node Shutdown and
It might be too late to fall back to something else. We should attempt to add the NodeMaintenance object, should it be missing, but it might be too late to stop Graceful Node Shutdown, which might be too far gone, so to speak. This would be a one-way (hard) fallback.
Alternatively, perhaps doing nothing, or refusing to carry out a shutdown, would be a safer option?
That said, it depends on how the shutdown was triggered: if it was due to external factors (the default modus operandi for which Graceful Node Shutdown was designed), then it would be potentially unsafe to interrupt the ongoing shutdown, or there might not be enough time, etc.
It would be too late for the pods that are already terminating, but it might be useful for the running, non-terminated pods, right?
It seems to me that by default it would be good to carry out the Graceful Node Shutdown in the disconnected scenario, at least to have a softer landing in bad situations. I wonder if it would make sense to make this behavior configurable?
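The fallback decision being discussed can be sketched as follows. This is a toy model under stated assumptions: `maintenanceClient`, `CreateNodeMaintenance`, and the mode names are hypothetical illustrations, and the real kubelet logic would involve retries, timeouts, and per-pod termination state rather than a single synchronous call.

```go
package main

import (
	"errors"
	"fmt"
)

// maintenanceClient abstracts the apiserver call.
// Hypothetical interface, for illustration only.
type maintenanceClient interface {
	CreateNodeMaintenance(node string) error
}

// unreachableClient simulates a kubelet that cannot reach the apiserver.
type unreachableClient struct{}

func (unreachableClient) CreateNodeMaintenance(string) error {
	return errors.New("apiserver unreachable")
}

// okClient simulates a healthy connection to the apiserver.
type okClient struct{}

func (okClient) CreateNodeMaintenance(string) error { return nil }

// shutdownMode decides how the node drain proceeds on shutdown:
// if the NodeMaintenance object can be created, the declarative flow is
// used; otherwise we fall back to the local Graceful Node Shutdown
// behavior (a one-way fallback, as discussed above).
func shutdownMode(c maintenanceClient, node string) string {
	if err := c.CreateNodeMaintenance(node); err != nil {
		return "graceful-node-shutdown"
	}
	return "declarative-node-maintenance"
}

func main() {
	fmt.Println(shutdownMode(okClient{}, "node-1"))          // declarative flow
	fmt.Println(shutdownMode(unreachableClient{}, "node-1")) // local fallback
}
```

Making the disconnected-scenario behavior configurable, as suggested above, would amount to parameterizing the fallback branch.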
we would fall back to the original behavior of Graceful Node Shutdown feature
Does this imply that Graceful Node Shutdown will be used the way it is now? Or does it mean DNM will internally reproduce the original Graceful Node Shutdown behavior and just perform that itself?
AIUI, DNM is about marking the node as needing maintenance and barely overlaps with node shutdown handling.
The only overlap is that both things trigger a node drain; but other things could also trigger that, such as node problem detector deciding the node is irreparably faulty.
TODO: