scale out latency #6165

Open
mingmasplace opened this issue May 7, 2024 · 3 comments
Assignees: jonathan-innis
Labels: lifecycle/stale, question (Further information is requested)

Comments

@mingmasplace

Hi, are there any benchmarks on how fast Karpenter scales out compared to AWS Lambda? This will depend on the number of nodes and node types. Per https://aws.amazon.com/blogs/containers/eliminate-kubernetes-node-scaling-lag-with-pod-priority-and-over-provisioning/, it can take 1-2 minutes to add new nodes. In my own testing, the EC2 API can provision a new running EC2 instance in just seconds (the status checks still take a couple of minutes to pass, but Karpenter doesn't wait for that).

@jonathan-innis jonathan-innis added the question Further information is requested label May 9, 2024
@jonathan-innis
Contributor

We typically see new nodes reach a Ready state in about 30-40s. A lot of this depends on which instance type you are launching, what your image does during startup, and what's in your userData. We don't publish public benchmarks on this for exactly that reason: there's a lot of variation among images and instance types.

If you're interested in getting precise data on where latency is introduced, take a look at: https://github.com/awslabs/node-latency-for-k8s
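For a quick, rough measurement without the tool above, here is a minimal sketch of the same idea: the time from a node's creation to its Ready condition flipping. It assumes GNU `date` (the `-d` flag); the `kubectl`/`jq` pipeline shown in the comment is one way to feed it real cluster data.

```shell
# Seconds between two RFC 3339 timestamps, e.g. a node's
# metadata.creationTimestamp and its Ready condition's lastTransitionTime.
# Assumes GNU date (the -d flag).
elapsed_seconds() {
  echo $(( $(date -u -d "$2" +%s) - $(date -u -d "$1" +%s) ))
}

# Example: a node created at 12:00:00 that became Ready 38s later.
elapsed_seconds "2024-05-07T12:00:00Z" "2024-05-07T12:00:38Z"  # prints 38

# Against a live cluster, one way to extract those timestamps per node:
#   kubectl get nodes -o json | jq -r '.items[] |
#     [.metadata.name, .metadata.creationTimestamp,
#      (.status.conditions[] | select(.type == "Ready") | .lastTransitionTime)] | @tsv'
```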

@jonathan-innis jonathan-innis self-assigned this May 9, 2024
@danielloader

It's also worth noting that you can scale very quickly, just in bigger steps: if you bump the replica count of a deployment requesting 1 CPU / 1Gi RAM to 100 replicas, you'll get a few large nodes. It may take around 60s for them to come up and for the pods to be scheduled, but that single step adds a lot of serving capacity.

We run services that can handle thousands of requests a second per pod, so it's a completely different scaling model from Lambda's one request = one execution.
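As a back-of-the-envelope illustration of that step-up (the deployment name `web` is hypothetical; the 96 vCPUs of an m5.24xlarge are a real figure):

```shell
# 100 replicas requesting 1 CPU each need ~100 vCPUs of capacity;
# on large instances (e.g. m5.24xlarge, 96 vCPUs) that's only a
# couple of node launches, not a hundred.
replicas=100
cpu_per_pod=1
vcpus_per_node=96                                  # m5.24xlarge
total_cpu=$(( replicas * cpu_per_pod ))
nodes_needed=$(( (total_cpu + vcpus_per_node - 1) / vcpus_per_node ))  # ceiling division
echo "$total_cpu vCPUs -> $nodes_needed node(s)"   # prints: 100 vCPUs -> 2 node(s)

# The step-up itself is a single command ("web" is a hypothetical deployment):
#   kubectl scale deployment/web --replicas=100
```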

Contributor

This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.
