Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Post-Start hook to run custom check to determine if node is ready #126

Open
uthark opened this issue Nov 2, 2020 · 3 comments
Open

Comments

@uthark
Copy link
Contributor

uthark commented Nov 2, 2020

Is this a BUG REPORT or FEATURE REQUEST?:
FEATURE

In some setups it might be the case, when Node status is Ready, but it still fails custom checks (i.e. those from kube node problem detector.

As of now, upgrade manager proceeds when node status is reported as ready

I'd like to implement support for custom checks.
One option that I see is to extend isNodeReady check with a call to custom script that can perform other checks and it's exit status would indicate if node is ready.

Change required would be:

  1. Extend RollingUpgradeSpec to allow specify custom script (similar to PreDrain/PostDrain/PostTerminate).
  2. Call this script in WaitForDesiredInstances.

What do you think?

@shrinandj
Copy link
Collaborator

Does the node's status or some annotation/label indicate that the node was found to be in bad shape by the node problem detector? Having another script is okay, but if there is a more direct approach such as checking specific annotations/labels, it might be cleaner. WDYT?

Also, if the script fails, will it just be that that one node is skipped or will the rollingUpgrade object be marked as failed? If it's just that the node is skipped, what will the end result of the rollingUpgrade be?

Also, this script will have the time-of-check-to-time-of-use problem; i.e. the node could get into a bad state after the script executes but before the node is actually drained and terminated (this is a problem right now as well with the isNodeReady check).

@uthark
Copy link
Contributor Author

uthark commented Nov 3, 2020

Annotation / label would also work instead of running the custom check.

@eytan-avisror
Copy link
Contributor

I think custom readiness gates is a great idea, as mentioned in slack, here is my proposal for the API:

spec:
  readinessGates:
    - matchLabels:
        MyLabelKey: MyLabelValue  

We can start with this single gate of matchLabels, in the future we can add more gates as the use cases come up.

Implementation should make sure to apply this for all types of rollups (lazy / eager)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants