Spot interrupt taint/label/annotation on node #6103
I am willing to work on this myself, as I have experience writing Go and Kubernetes operators. Extra support may need to be added to upstream Karpenter, but I am not sure what the best architecture would be.
Would it be enough for Karpenter to fire metrics on the nodes that were interrupted?
Sadly, for our use case it wouldn't. What we currently do: when a pod is being shut down, we check its node to see whether a spot interruption is in progress. If Karpenter only fires metrics, there is no easy way to query this interactively. Currently we log in the pod whether there is a spot interruption; with metrics we would need another way to visualise it.
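For context, the node-side check described above could be sketched roughly like this. The taint keys are assumptions for illustration (they vary between aws-node-termination-handler and Karpenter, and between versions), and the `Taint` struct is a local stand-in for `corev1.Taint` so the sketch runs without client-go:

```go
package main

import "fmt"

// Taint mirrors the relevant fields of corev1.Taint, kept local so the
// sketch stays self-contained.
type Taint struct {
	Key    string
	Effect string
}

// interruptTaintKeys lists taint keys that might indicate a spot
// interruption. These specific keys are assumptions for illustration,
// not a guaranteed contract of either project.
var interruptTaintKeys = map[string]bool{
	"aws-node-termination-handler/spot-itn": true, // assumed NTH taint key
	"karpenter.sh/disrupted":                true, // assumed Karpenter taint key
}

// hasSpotInterrupt reports whether any taint on the node matches one of
// the known interruption keys.
func hasSpotInterrupt(taints []Taint) bool {
	for _, t := range taints {
		if interruptTaintKeys[t.Key] {
			return true
		}
	}
	return false
}

func main() {
	taints := []Taint{{Key: "karpenter.sh/disrupted", Effect: "NoSchedule"}}
	fmt.Println(hasSpotInterrupt(taints)) // prints "true"
}
```

In a real operator the taints would come from a `corev1.Node` fetched via client-go rather than a local slice.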
What about Kubernetes events? We also fire an event here alongside the metric. I'm skeptical of changing our tainting logic to support an observability use case. What if we added a condition to the NodeClaim? Would that be enough to satisfy the observability use case?
I didn't notice there are Kubernetes events about disruption; I could use that! I closed the PR for now; I can always open a new one for the NodeClaim condition. I will look at that later this week and make a proposal here :)
@jonathan-innis what do you think? The new condition could look like this:
In the reason field we would record why the node was interrupted: Would this new type need to be added to the upstream Karpenter project, or can we add it in the provider-aws implementation?
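As a rough illustration of the shape such a condition might take (the type name `Interrupted`, the reason value, and the message are all hypothetical, not an agreed-upon API; the `Condition` struct stands in for `metav1.Condition`):

```go
package main

import "fmt"

// Condition mirrors the fields of metav1.Condition, kept local so the
// sketch is self-contained.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// newInterruptedCondition builds a hypothetical NodeClaim condition
// carrying the interruption reason. All values here are illustrative.
func newInterruptedCondition(reason, message string) Condition {
	return Condition{
		Type:    "Interrupted", // hypothetical condition type
		Status:  "True",
		Reason:  reason,
		Message: message,
	}
}

func main() {
	cond := newInterruptedCondition("SpotInterruptionWarning", "node received an EC2 spot interruption notice")
	fmt.Printf("%s=%s (%s)\n", cond.Type, cond.Status, cond.Reason)
	// prints "Interrupted=True (SpotInterruptionWarning)"
}
```

Whether this would live in upstream Karpenter or in provider-aws is exactly the open question above.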
Description
What problem are you trying to solve?
When a node is being shut down because of a spot interruption, I want to be able to detect that from my pod, so that we can report the correct reason a pod was shut down.
Currently we use the AWS Node Termination Handler, which adds different taints depending on why the node is being shut down. I would love to switch to Karpenter handling spot interruptions, but this missing feature is blocking.
How important is this feature to you?
This feature is very important: providing this visibility to users is key for the platform we are building.