
request: flag to lower 15 instance type requirement for spot-to-spot consolidation? #1202

Open
drawnwren opened this issue Apr 22, 2024 · 3 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@drawnwren

Description

What problem are you trying to solve?
I'm running Karpenter with GPU workloads, and AWS often doesn't even offer 15 instance types with the GPU that a workload requires. I'd still like to try spot-to-spot consolidation for these workloads because of the huge cost savings spot GPU instances currently offer. Would making the 15-instance-type limit configurable make sense in this case?

@drawnwren drawnwren added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 22, 2024
@drawnwren drawnwren changed the title from "request: flag to lower 15 instance type requirement for spottospot consolidation?" to "request: flag to lower 15 instance type requirement for spot-to-spot consolidation?" Apr 22, 2024
@jonathan-innis
Member

Would making the 15-instance-type limit configurable make sense in this case?

I definitely think that we'd be open to making this configurable. What value would you like to set it to? The biggest reason we went with 15 is that we did some analysis and found it to be around the right number of instance types to ensure your launch request has enough flexibility that you won't get an immediate spot interruption after the launch. This is obviously more of a heuristic, but you're definitely in uncharted waters in terms of your propensity to get re-interrupted on a consolidation if you try to lower this number.
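For illustration, a minimal Go sketch of what a configurable minimum could look like, keeping 15 as the default. The environment variable and function names below are hypothetical and are not an existing Karpenter configuration knob.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultMinInstanceTypes mirrors the 15-instance-type heuristic discussed above.
const defaultMinInstanceTypes = 15

// minInstanceTypesForSpotToSpotConsolidation returns the threshold to apply,
// allowing an override via a hypothetical environment variable. Missing or
// invalid values fall back to the default.
func minInstanceTypesForSpotToSpotConsolidation() int {
	if v, ok := os.LookupEnv("SPOT_TO_SPOT_MIN_INSTANCE_TYPES"); ok {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return defaultMinInstanceTypes
}

func main() {
	fmt.Println("effective minimum:", minInstanceTypesForSpotToSpotConsolidation())
}
```

Lowering the value trades launch flexibility for more eligible consolidations, which is exactly the re-interruption risk described above.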

@jonathan-innis
Member

@drawnwren Have you considered forking this (just for PoC/testing) and seeing what kind of performance you get?
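As a rough sketch of what such a PoC fork might involve: spot-to-spot consolidation is gated by a minimum-instance-types threshold, so the change is essentially lowering one value and rebuilding the controller image. The package name, constant name, and suggested value below are assumptions, not verified references to the Karpenter source tree.

```go
// Assumed shape of the threshold in a fork; verify the real name and location
// in the Karpenter source before changing it.
package disruption

// MinInstanceTypesForSpotToSpotConsolidation gates spot-to-spot consolidation:
// a replacement spot launch must offer at least this many candidate instance
// types. A PoC fork for GPU-constrained workloads could lower it (e.g. to 5)
// and measure how often the replacement nodes get re-interrupted.
const MinInstanceTypesForSpotToSpotConsolidation = 15
```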

@jonathan-innis
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 14, 2024