Reduce resource requests of istio ingress gateway and adapt autoscaling accordingly. #9250
Conversation
Thank you @ScheererJ. I hope we will not break or tear down anything with this change ("never touch a running system", but in this case...). What can we do as flanking measures? I could get the numbers from dev and staging? Maybe you can ping me? Something else? I should also have the restart counts and such, but what matters is whether the API servers remained accessible, and I do not know whether we would notice a small decline in availability (a large one, probably).
/lgtm
Did you take a look at the historic data? Is one new pod per minute good enough for upscaling?
LGTM label has been added. Git tree hash: 2b40a1e4056b42b027a006de39d3e9052ce31c2c
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: rfranzke
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
How to categorize this PR?
/area networking
/area auto-scaling
/area cost
/kind enhancement
What this PR does / why we need it:
Reduce resource requests of istio ingress gateway and adapt autoscaling accordingly.
Due to the split of istio ingress gateways across zones in highly available seed setups, their resource requests are slightly oversized except for very large, active seed clusters. To reduce the unnecessary resource waste, this change reduces the resource requests to a quarter (CPU) and half (memory) of their previous values, respectively. The general assumption is that, due to its priority, the istio ingress gateway should be able to get additional CPU if it is available on the node. With regard to memory, the limit is left in place with the same value, and its priority should again help it avoid being out-of-memory killed. As an additional measure for memory, the autoscaling is extended to also cover memory, so that a scale-up can happen under memory pressure. In addition, the scale-up/-down behaviour is now explicitly specified, with a fast scale-up and a slow scale-down.
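For illustration, the sketch below shows how such an autoscaling setup could look when built with the Kubernetes `autoscaling/v2` API: the HPA targets both CPU and memory utilization and carries an explicit `behavior` section with a fast scale-up and a slow scale-down. The concrete target utilizations, stabilization windows and policy values are illustrative assumptions only, not necessarily the exact values used by this change.

```go
// Package example contains an illustrative sketch, not the actual Gardener code.
package example

import (
	autoscalingv2 "k8s.io/api/autoscaling/v2"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/ptr"
)

// istioIngressGatewayHPA sketches an HPA that scales the istio ingress gateway
// on both CPU and memory utilization, with fast scale-up and slow scale-down.
func istioIngressGatewayHPA(namespace string, minReplicas, maxReplicas int32) *autoscalingv2.HorizontalPodAutoscaler {
	return &autoscalingv2.HorizontalPodAutoscaler{
		ObjectMeta: metav1.ObjectMeta{Name: "istio-ingressgateway", Namespace: namespace},
		Spec: autoscalingv2.HorizontalPodAutoscalerSpec{
			ScaleTargetRef: autoscalingv2.CrossVersionObjectReference{
				APIVersion: "apps/v1",
				Kind:       "Deployment",
				Name:       "istio-ingressgateway",
			},
			MinReplicas: ptr.To(minReplicas),
			MaxReplicas: maxReplicas,
			// Scale on both CPU and memory so that memory pressure can also trigger a scale-up.
			Metrics: []autoscalingv2.MetricSpec{
				{
					Type: autoscalingv2.ResourceMetricSourceType,
					Resource: &autoscalingv2.ResourceMetricSource{
						Name:   corev1.ResourceCPU,
						Target: autoscalingv2.MetricTarget{Type: autoscalingv2.UtilizationMetricType, AverageUtilization: ptr.To[int32](80)},
					},
				},
				{
					Type: autoscalingv2.ResourceMetricSourceType,
					Resource: &autoscalingv2.ResourceMetricSource{
						Name:   corev1.ResourceMemory,
						Target: autoscalingv2.MetricTarget{Type: autoscalingv2.UtilizationMetricType, AverageUtilization: ptr.To[int32](80)},
					},
				},
			},
			// Explicit behaviour: react quickly on scale-up, hesitate on scale-down.
			Behavior: &autoscalingv2.HorizontalPodAutoscalerBehavior{
				ScaleUp: &autoscalingv2.HPAScalingRules{
					StabilizationWindowSeconds: ptr.To[int32](60),
					Policies: []autoscalingv2.HPAScalingPolicy{
						// Illustrative: add at most one pod per minute.
						{Type: autoscalingv2.PodsScalingPolicy, Value: 1, PeriodSeconds: 60},
					},
				},
				ScaleDown: &autoscalingv2.HPAScalingRules{
					StabilizationWindowSeconds: ptr.To[int32](1800),
					Policies: []autoscalingv2.HPAScalingPolicy{
						// Illustrative: remove at most one pod every five minutes.
						{Type: autoscalingv2.PodsScalingPolicy, Value: 1, PeriodSeconds: 300},
					},
				},
			},
		},
	}
}
```

In a setup like this, the long scale-down stabilization window combined with a small pods-per-period policy is what makes the scale-down slow, while the short scale-up window lets new pods be added quickly under CPU or memory pressure.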
Which issue(s) this PR fixes:
None.
Special notes for your reviewer:
For the istio ingress gateway spanning multiple zones, i.e. the default istio ingress gateway, the autoscaling is not optimal as it does not take zones into account. This means the deployment may scale up in a zone which is not under pressure.
However, as we have not seen many scale-up operations for istio, this is left as a follow-up step. It might be a good idea to combine the single-zone istio ingress gateways into a virtual multi-zonal one and get rid of the existing default one, but that requires more changes and may be done as a follow-up.
Release note: