Implement maxDelay option for api server rate limiting #395

Open
dougbtv opened this issue Dec 7, 2023 · 4 comments

dougbtv commented Dec 7, 2023

See: https://danielmangum.com/posts/controller-runtime-client-go-rate-limiting/#the-default-controller-rate-limiter

This could improve how we are rate limited by the API server at scale.
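For reference, here is a minimal sketch (not the actual Whereabouts code) of what a controller rate limiter with a configurable maxDelay could look like, following the pattern described in the linked post. The newRateLimiterWithMaxDelay name is illustrative; the 5ms base delay and 10 qps / 100 burst bucket mirror the client-go defaults:

```go
package main

import (
	"time"

	"golang.org/x/time/rate"
	"k8s.io/client-go/util/workqueue"
)

// newRateLimiterWithMaxDelay builds the same combination that client-go's
// DefaultControllerRateLimiter uses, but caps the per-item exponential
// backoff at maxDelay instead of the default 1000s.
func newRateLimiterWithMaxDelay(maxDelay time.Duration) workqueue.RateLimiter {
	return workqueue.NewMaxOfRateLimiter(
		// Per-item exponential backoff: 5ms base, capped at maxDelay.
		workqueue.NewItemExponentialFailureRateLimiter(5*time.Millisecond, maxDelay),
		// Overall token bucket: 10 qps with a burst of 100, as in the default.
		&workqueue.BucketRateLimiter{Limiter: rate.NewLimiter(rate.Limit(10), 100)},
	)
}
```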

@dougbtv dougbtv added the enhancement New feature or request label Dec 7, 2023

mrbojangles3 commented Dec 11, 2023

One possible symptom of not supplying your own timeout for a request is shown below; note the timestamps and durations:

Warning  FailedCreatePodSandBox  3m13s (x13677 over 3d17h)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_samplepod-bridge-26tpw_default_17cdc8d1-8fd7-4e92-8740-5bc89f7ec65f_0(34f52fc40f083cd7ddf77bf04005c0a6f815e967b073f2125441d62dc37a8c33): error adding pod default_samplepod-bridge-26tpw to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: '&{ContainerID:34f52fc40f083cd7ddf77bf04005c0a6f815e967b073f2125441d62dc37a8c33 Netns:/var/run/netns/ce3f83ba-b99e-41bc-ad37-5925193be0c3 IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=samplepod-bridge-26tpw;K8S_POD_INFRA_CONTAINER_ID=34f52fc40f083cd7ddf77bf04005c0a6f815e967b073f2125441d62dc37a8c33;K8S_POD_UID=17cdc8d1-8fd7-4e92-8740-5bc89f7ec65f Path:
<snip>
ContainerID:"34f52fc40f083cd7ddf77bf04005c0a6f815e967b073f2125441d62dc37a8c33" Netns:"/var/run/netns/ce3f83ba-b99e-41bc-ad37-5925193be0c3" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=samplepod-bridge-26tpw;K8S_POD_INFRA_CONTAINER_ID=34f52fc40f083cd7ddf77bf04005c0a6f815e967b073f2125441d62dc37a8c33;K8S_POD_UID=17cdc8d1-8fd7-4e92-8740-5bc89f7ec65f" Path:"" ERRORED: error configuring pod [default/samplepod-bridge-26tpw] networking: [default/samplepod-bridge-26tpw/17cdc8d1-8fd7-4e92-8740-5bc89f7ec65f:bridge-whereabouts-10-2]: error adding container to network "bridge-whereabouts-10-2": error at storage engine: k8s get error: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline

mrbojangles3 commented Jan 31, 2024

Oddly, the lack of rate limiting leads to overall slower performance. I tried to launch 2500 pods with Whereabouts IP addresses.

| QPS / Burst | P99 latency | Backoff encountered? | Total runtime |
| --- | --- | --- | --- |
| 20/20 | 699 sec | Yes | 20 min |
| 10/10 | 509 sec | Yes | 16 min |
| 5/5 | 483 sec | Yes | 20 min |
| 1/1 | 3 sec | No | 45 min |
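For context, the QPS/Burst pairs above map onto client-go's rest.Config fields. A minimal sketch (illustrative, not the Whereabouts configuration code) of setting those values on the client:

```go
package main

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// newClientWithLimits sets the client-side rate limits before building the
// clientset, e.g. newClientWithLimits(cfg, 20, 20) for the first row above.
func newClientWithLimits(cfg *rest.Config, qps float32, burst int) (*kubernetes.Clientset, error) {
	cfg.QPS = qps     // steady-state requests per second allowed by the client
	cfg.Burst = burst // short-term burst allowance above QPS
	return kubernetes.NewForConfig(cfg)
}
```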

mrbojangles3 commented Feb 2, 2024

An update to the above table is coming (updated on Feb 8). Assuming the maxDelay option is the cause, which I am not sure it is, the slowness might need to be addressed as a bug rather than an enhancement, depending on the number of pods that need to come back.

Zorlin commented Mar 27, 2024

Right now this issue means this project is extremely limiting for our cluster... we need to spawn about 10,000 pods across 60 or so k8s workers (currently prototyping with 4,000 pods across 24), and the spawn rate is about 0.5 to 1 pod per second. A fix for this issue would be fantastic for our use case! I'm trying to figure out how to implement it myself :)
