Spot interrupt taint/label/annotation on node #6103

Open
stijndehaes opened this issue Apr 26, 2024 · 6 comments

@stijndehaes
Contributor

Description

What problem are you trying to solve?

When a node is being shut down because of a spot interruption, I want to be able to detect that from my pod. That way we can report the correct reason why the pod was shut down.
Currently we use the aws-node-termination-handler, which adds different taints depending on why the node is being shut down. I would love to switch to Karpenter's spot interruption handling; however, this missing feature is blocking that move.
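
For context, a minimal sketch in Go of the kind of taint check this enables; the taint key below is an assumption for illustration and depends on the NTH version and configuration:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// hasSpotInterruptTaint reports whether a node carries an NTH-style spot
// interruption taint. The key below is an assumption for illustration;
// check the taint keys your NTH deployment actually applies.
func hasSpotInterruptTaint(node *corev1.Node) bool {
	for _, taint := range node.Spec.Taints {
		if taint.Key == "aws-node-termination-handler/spot-itn" {
			return true
		}
	}
	return false
}

func main() {
	node := &corev1.Node{} // in practice, fetched from the API server
	fmt.Println(hasSpotInterruptTaint(node))
}
```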

How important is this feature to you?

This feature is very important: providing this visibility to users is key for the platform we are building.

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@stijndehaes added the feature (New feature or request) and needs-triage (Issues that need to be triaged) labels on Apr 26, 2024
@stijndehaes
Contributor Author

I am willing to work on this myself, as I have experience writing Go and building Kubernetes operators. It could be that extra support needs to be added to upstream Karpenter, but I am not sure what the best architecture would be.

@engedaam removed the needs-triage (Issues that need to be triaged) label on May 3, 2024
@engedaam
Contributor

engedaam commented May 3, 2024

Would it be enough for Karpenter to fire metrics on the nodes that were interrupted?

@stijndehaes
Contributor Author

> Would it be enough for Karpenter to fire metrics on the nodes that were interrupted?

Sadly, for our use case it doesn't. What we currently do is this: when a pod is being shut down, we look at its node to see whether a spot interruption is in progress. If Karpenter only fires metrics, there is no easy way to query this interactively. Currently the pod logs whether there is a spot interruption; with metrics we would need another way to visualise it.
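
Concretely, the lookup at shutdown looks roughly like this hypothetical sketch (client-go from inside the pod; NODE_NAME is assumed to be injected via the Downward API, and the taint key is again an assumption):

```go
package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Intended to run inside the pod, e.g. from a SIGTERM or preStop handler.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	// NODE_NAME is assumed to be injected via the Downward API (spec.nodeName).
	nodeName := os.Getenv("NODE_NAME")
	node, err := clientset.CoreV1().Nodes().Get(context.Background(), nodeName, metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	for _, taint := range node.Spec.Taints {
		// Assumed NTH taint key; a Karpenter equivalent is what this issue asks for.
		if taint.Key == "aws-node-termination-handler/spot-itn" {
			fmt.Println("pod is shutting down because of a spot interruption")
		}
	}
}
```

The same check would work against whatever taint, label, or annotation Karpenter ends up applying.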

@jonathan-innis
Contributor

> What we currently do is this: when a pod is being shut down, we look at its node to see whether a spot interruption is in progress.

What about Kubernetes events? We also fire an event here alongside the metric. I'm skeptical of changing our tainting logic to support an observability use case. What if we added a condition to the NodeClaim? Would that be enough to satisfy the observability use case?
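
For illustration, events attached to a node can be listed with a sketch like the one below; the field selector is standard client-go usage, but the reason/message strings Karpenter emits are not shown here, so a consumer would filter on whatever it actually fires:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	nodeName := "ip-10-0-0-1.eu-west-1.compute.internal" // hypothetical node name
	// Events for Node objects are recorded in the "default" namespace.
	events, err := clientset.CoreV1().Events("default").List(context.Background(), metav1.ListOptions{
		FieldSelector: "involvedObject.kind=Node,involvedObject.name=" + nodeName,
	})
	if err != nil {
		panic(err)
	}
	for _, e := range events.Items {
		// Filter on the reason/message Karpenter actually emits for interruptions.
		fmt.Printf("%s: %s\n", e.Reason, e.Message)
	}
}
```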

@jonathan-innis jonathan-innis self-assigned this May 13, 2024
@stijndehaes
Contributor Author

> What about Kubernetes events? We also fire an event here alongside the metric. I'm skeptical of changing our tainting logic to support an observability use case. What if we added a condition to the NodeClaim? Would that be enough to satisfy the observability use case?

I didn't notice that there are Kubernetes events about disruption; I could use that!
A condition on the NodeClaim would be better, but I will see how far I can get with the events to start with.

I've closed the PR for now; I can always open a new one for the NodeClaim condition. I will look at that later this week and make a proposal here :)

@stijndehaes
Contributor Author

@jonathan-innis what do you think?

The new condition could look like this:

```yaml
conditions:
- lastTransitionTime: "2024-05-10T00:05:07Z"
  status: "True"
  type: Interrupted
  reason: SpotInterrupt
```

In the reason field we record why the node was interrupted: SpotInterrupt, ScheduledChange, ....
The type could just be Interrupted.

Would this new condition type need to be added to the upstream Karpenter project, or could we add it in the provider-aws implementation?
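
If the condition lands roughly as proposed, a consumer could read it without depending on Karpenter's Go types by using the dynamic client, as in this hypothetical sketch; the karpenter.sh/v1beta1 group/version mirrors the current NodeClaim API, and the NodeClaim name is made up:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := dynamic.NewForConfigOrDie(cfg)

	// NodeClaims are cluster-scoped resources in the karpenter.sh API group.
	gvr := schema.GroupVersionResource{Group: "karpenter.sh", Version: "v1beta1", Resource: "nodeclaims"}
	nc, err := client.Resource(gvr).Get(context.Background(), "my-nodeclaim", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	conditions, found, err := unstructured.NestedSlice(nc.Object, "status", "conditions")
	if err != nil || !found {
		return
	}
	for _, c := range conditions {
		cond, ok := c.(map[string]interface{})
		if !ok {
			continue
		}
		// "Interrupted" is the condition type proposed in this issue, not an
		// existing Karpenter condition.
		if cond["type"] == "Interrupted" && cond["status"] == "True" {
			fmt.Printf("node interrupted, reason: %v\n", cond["reason"])
		}
	}
}
```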
