This repository has been archived by the owner on Nov 17, 2022. It is now read-only.

k8s-spot-rescheduler doesn't trigger cluster autoscale up if no spot instances available #53

Open
morganwalker opened this issue Jan 17, 2019 · 4 comments



morganwalker commented Jan 17, 2019

We're running kops 1.10.0 and k8s 1.10.11, with two separate instance groups (IGs): nodes (on-demand) and spots (spot), both spread across 3 availability zones. I've applied the appropriate nodeLabels and defined the following in my k8s-spot-rescheduler deployment manifest:

- --on-demand-node-label=on-demand
- --spot-node-label=spot
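
For reference, a minimal sketch of how the two kops InstanceGroup specs might carry these labels. The exact label values here are assumptions, and the taint, IG sizes, and cluster-autoscaler tags are the ones described below:

  apiVersion: kops.k8s.io/v1alpha2
  kind: InstanceGroup
  metadata:
    name: nodes                    # on-demand IG
    labels:
      kops.k8s.io/cluster: kubernetes.metis.wtf
  spec:
    role: Node
    nodeLabels:
      on-demand: "true"            # matched by --on-demand-node-label=on-demand
    taints:
      - spot=false:PreferNoSchedule
    cloudLabels:
      k8s.io/cluster-autoscaler/enabled: ""
      kubernetes.io/cluster/kubernetes.metis.wtf: ""
  ---
  apiVersion: kops.k8s.io/v1alpha2
  kind: InstanceGroup
  metadata:
    name: spots                    # spot IG
    labels:
      kops.k8s.io/cluster: kubernetes.metis.wtf
  spec:
    role: Node
    minSize: 1
    maxSize: 3
    maxPrice: "0.10"               # assumed bid; makes kops request spot instances
    nodeLabels:
      spot: "true"                 # matched by --spot-node-label=spot
    cloudLabels:
      k8s.io/cluster-autoscaler/enabled: ""
      kubernetes.io/cluster/kubernetes.metis.wtf: ""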

The nodes IG has the spot=false:PreferNoSchedule taint so that the spots IG is preferred. The cluster autoscaler autodiscovers both IGs via --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,kubernetes.io/cluster/kubernetes.metis.wtf, and these tags exist on both IGs. I've confirmed that pods on most nodes (on-demand) nodes can be drained and moved to spots nodes, with one exception:

  • The spots IG was set to minSize=1 and maxSize=3, and we had one spots node up and running in us-east-1c
  • k8s-spot-rescheduler attempted to drain the pods on a nodes node but failed with:
I0117 02:16:49.099271       1 rescheduler.go:288] Considering ip-172-20-127-232.ec2.internal for removal
I0117 02:16:49.099797       1 rescheduler.go:293] Cannot drain node: pod metis-internal/rabbitmq-0 can't be rescheduled on any existing spot node
  • metis-internal/rabbitmq-0 belongs to a StatefulSet with a PVC
  • the PVC resides in us-east-1a, so it makes sense that the pod couldn't be scheduled on the existing spots node (see the sketch below)
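
For context on why the volume pins the pod: on k8s 1.10, an EBS-backed PV carries zone labels that the scheduler's volume-zone predicate enforces, so rabbitmq-0 can only be placed on a node in us-east-1a. Roughly (the PV name, size, and volume ID are illustrative):

  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: pvc-0a1b2c3d                                      # illustrative
    labels:
      failure-domain.beta.kubernetes.io/region: us-east-1
      failure-domain.beta.kubernetes.io/zone: us-east-1a    # pins pods using this volume to 1a
  spec:
    capacity:
      storage: 10Gi                                         # illustrative
    accessModes: [ReadWriteOnce]
    awsElasticBlockStore:
      volumeID: aws://us-east-1a/vol-0123456789abcdef0      # illustrative
      fsType: ext4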

Why didn't the failure to schedule metis-internal/rabbitmq-0 trigger the cluster autoscaler to keep provisioning spots nodes until it created one in the same availability zone? I'm wondering whether, if k8s-spot-rescheduler had actually evicted the pod, the cluster autoscaler would have noticed a pending pod and spun up a new node in the spots IG.

@obellagamba

Any news on this front?

If you have a strategy in mind, I'm more than willing to help with the implementation of this feature, as it's important for us.

@Antony450

A taint can be added to the on-demand instance group instead of the spot-instance IG, like below:

  labels = "kubernetes.io/role=common,lifecycle=OnDemand"
  taints = "lifecycle=OnDemand:PreferNoSchedule"

This works for me.
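
(If you're not driving node config through tooling that accepts those settings, the equivalent one-off commands would be something like the following; the node name is a placeholder:)

  kubectl label nodes <node-name> kubernetes.io/role=common lifecycle=OnDemand
  kubectl taint nodes <node-name> lifecycle=OnDemand:PreferNoSchedule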

@CharlieC3

In my experience the taint just tells the K8s scheduler to prefer scheduling unscheduled pods onto an existing spot instance node; it doesn't tell the cluster autoscaler to scale up the spot group to make room when no spot instances are available.

@yogeshkk

yogeshkk commented Apr 2, 2020

Hi,

I'm having the same issue, so I was thinking of creating automation that checks whether an on-demand node is up in the environment; if so, it would add a few spot nodes so that k8s-spot-rescheduler can move the pods onto them and we can get rid of the on-demand node.

We could implement something similar in k8s-spot-rescheduler itself: a parameter that takes the name of the spot IG or ASG, and if there's no spot capacity, the rescheduler scales that IG or ASG up (it could reuse the cluster autoscaler's code for scaling). A rough sketch of that scale-up step follows.
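
A minimal sketch of what that scale-up step could look like against the AWS API, assuming a hypothetical --spot-asg-name flag supplies the ASG name; none of this exists in k8s-spot-rescheduler today:

  package main

  import (
      "fmt"

      "github.com/aws/aws-sdk-go/aws"
      "github.com/aws/aws-sdk-go/aws/session"
      "github.com/aws/aws-sdk-go/service/autoscaling"
  )

  // scaleUpSpotASG bumps the spot ASG's desired capacity by one so the
  // drained pods have somewhere to go. Hypothetical sketch only.
  func scaleUpSpotASG(asgName string) error {
      svc := autoscaling.New(session.Must(session.NewSession()))

      // Look up the ASG's current desired capacity and max size.
      out, err := svc.DescribeAutoScalingGroups(&autoscaling.DescribeAutoScalingGroupsInput{
          AutoScalingGroupNames: []*string{aws.String(asgName)},
      })
      if err != nil {
          return err
      }
      if len(out.AutoScalingGroups) == 0 {
          return fmt.Errorf("ASG %q not found", asgName)
      }
      asg := out.AutoScalingGroups[0]

      desired := *asg.DesiredCapacity + 1
      if desired > *asg.MaxSize {
          return fmt.Errorf("ASG %q is already at its max size %d", asgName, *asg.MaxSize)
      }

      // Request one more spot instance. Note this does not pick an AZ, so
      // zone-pinned pods (like the PVC case above) would need per-AZ ASGs.
      _, err = svc.SetDesiredCapacity(&autoscaling.SetDesiredCapacityInput{
          AutoScalingGroupName: aws.String(asgName),
          DesiredCapacity:      aws.Int64(desired),
          HonorCooldown:        aws.Bool(true),
      })
      return err
  }

  func main() {
      // ASG name is illustrative; kops names ASGs after the IG and cluster.
      if err := scaleUpSpotASG("spots.kubernetes.metis.wtf"); err != nil {
          fmt.Println(err)
      }
  }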
