Karpenter unable to create spot instances in a local zone #6183
Comments
Do you see different behavior for on-demand instance types? Is this issue just for spot or for all instance types?
Yeah, if I change this exact config to use on-demand, everything gets scheduled like normal. But as soon as it's switched to spot, it fails.
Are you able to launch this instance in non-local zones? Based on the metric, it looks like we believe the instance type is unavailable across all zones, not just the local zone. Do you have a full set of Karpenter logs?
Yes, if I include a non-local zone AZ, both on-demand and spot work for the regular AZ. The local zone continues to fail for spot instances. (Please don't mind the closed spot instance. It was in the running state; I was just too late grabbing a screenshot from the spot requests page.)
It does look like we assume that if we can discover spot pricing within a region, we can discover pricing across all of its zones. If a zone is missing from the pricing information, offerings for that zone are marked as unavailable. It's probably reasonable to fall back to on-demand pricing when pricing for a specific zone is unavailable, as we would for the region if it were unavailable. I'm still a bit confused about your metric values though: Karpenter believed there were no offerings available in any zone for
Yes, I noticed that when I allowed it to be scheduled in the other AZ, it changed to a 1. I attached the entire metrics list; hopefully that also helps.
After taking another look at your logs, I realized the metrics showed the instance as unavailable because you're only selecting subnets from that zone. I had assumed you were only using the nodepool to filter down the zone. That clears up my point of confusion, and the metrics you provided look right to me. I guess the question at this point is what the implications are of falling back to on-demand pricing in a single zone. The only thing I can think of is that we may not consolidate when we should have been able to, but the same can be said for full region fallback. I'm going to try to think of anything else, but that seems like a reasonable path forward at the moment.
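The per-zone fallback discussed above can be sketched roughly as follows. This is a minimal Python illustration of the idea, not Karpenter's actual (Go) implementation; the function name `resolve_spot_price`, the dictionary layout, and all prices are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical layout: instance type -> zone -> discovered spot price (USD/hour).
SpotPrices = Dict[str, Dict[str, float]]


def resolve_spot_price(
    instance_type: str,
    zone: str,
    spot_prices: SpotPrices,
    on_demand_prices: Dict[str, float],
) -> Optional[float]:
    """Return a price to use for a spot offering in the given zone.

    If spot pricing was discovered for the zone, use it. Otherwise fall back
    to the on-demand price for the instance type instead of treating the
    offering as unavailable. Returns None only when no price is known at all.
    """
    zonal = spot_prices.get(instance_type, {})
    if zone in zonal:
        return zonal[zone]
    # The zone is missing from the spot price data (for example, a local
    # zone with no published price history): fall back to on-demand pricing.
    return on_demand_prices.get(instance_type)


# Illustrative numbers only, not real AWS prices.
spot = {"t3.medium": {"eu-north-1a": 0.0125}}
on_demand = {"t3.medium": 0.0432}
print(resolve_spot_price("t3.medium", "eu-north-1-cph-1a", spot, on_demand))
```

With this approach the local-zone offering stays schedulable, at the cost of overestimating its price, which is where the consolidation concern comes from.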
I'm a bit confused by this one: do you mean using the on-demand price as a fallback? The implication I can see is that if you define a cheap t3 instance and an expensive g4dn type, that might cause issues. I noticed that the g4dn has ~77% cost savings on spot and would end up cheaper than the t3 priced at on-demand. For my use case this would not be a problem, but for others it could be. Could another option be to create an average from all other valid discovered zones?
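The averaging alternative suggested here could look something like the sketch below. It is only an illustration under stated assumptions: `estimate_spot_price` is a hypothetical helper, the prices are made up, and nothing in this thread confirms that Karpenter does or will do this.

```python
from statistics import mean
from typing import Dict


def estimate_spot_price(
    zone: str,
    zonal_spot_prices: Dict[str, float],
    on_demand_price: float,
) -> float:
    """Estimate the spot price for one instance type in one zone.

    Order of preference:
      1. The spot price discovered for the zone itself.
      2. The average spot price across the other zones that have data.
      3. The on-demand price, when no zone has spot data at all.
    """
    if zone in zonal_spot_prices:
        return zonal_spot_prices[zone]
    if zonal_spot_prices:
        # The zone is missing, but other zones in the region have spot data.
        return mean(zonal_spot_prices.values())
    return on_demand_price


# Illustrative numbers only, not real AWS prices.
t3_spot = {"eu-north-1a": 0.01, "eu-north-1b": 0.03}
print(estimate_spot_price("eu-north-1-cph-1a", t3_spot, on_demand_price=0.04))
```

An average keeps the relative ordering between instance types closer to their real spot prices than an on-demand fallback would, though local-zone prices may still differ from the regional average.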
@jmdeal Have you been able to make any progress on this yet?
Description
When trying to use a spot instance in a local zone, Karpenter fails to launch one even if there is capacity for it. As an example, I am using the t3.medium in the eu-north-1-cph-1a local zone.
Observed Behavior:
{"level":"ERROR","time":"2024-05-11T19:42:18.235Z","logger":"controller.provisioner","message":"Could not schedule pod, incompatible with nodepool \"eks-cluster-prefect\", daemonset overhead={\"cpu\":\"210m\",\"memory\":\"240Mi\",\"pods\":\"6\"}, no instance type satisfied resources {\"cpu\":\"210m\",\"memory\":\"240Mi\",\"pods\":\"7\"} and requirements karpenter.sh/capacity-type In [spot], karpenter.sh/nodepool In [eks-cluster-prefect], node.kubernetes.io/instance-type In [t3.medium], topology.kubernetes.io/zone In [eu-north-1-cph-1a], worker/type In [prefect] (no instance type met the scheduling requirements or had a required offering); incompatible with nodepool \"eks-cluster-default\", daemonset overhead={\"cpu\":\"210m\",\"memory\":\"240Mi\",\"pods\":\"7\"}, incompatible requirements, key worker/type, worker/type In [prefect] not in worker/type In [worker]","commit":"a70b39e","pod":"prefect/inquisitive-carp-mwbgm-vtxj7"}
Karpenter is able to discover the spot instances:
karpenter_cloudprovider_instance_type_offering_available{capacity_type="spot",instance_type="t3.medium",zone="eu-north-1-cph-1a"} 0
karpenter_cloudprovider_instance_type_offering_available{capacity_type="spot",instance_type="t3.medium",zone="eu-north-1a"} 0
karpenter_cloudprovider_instance_type_offering_available{capacity_type="spot",instance_type="t3.medium",zone="eu-north-1b"} 0
karpenter_cloudprovider_instance_type_offering_available{capacity_type="spot",instance_type="t3.medium",zone="eu-north-1c"} 0
AWS is offering this instance type but is not publishing the price history.
Expected Behavior:
Have a spot instance scheduled, or get the warning that no capacity is available.
Reproduction Steps (Please include YAML):
NodeClass is basically the default one.
Versions:
kubectl version:
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.3-eks-adc7111