
Reduce resource requests of istio ingress gateway and adapt autoscaling accordingly. #9250


ScheererJ
Contributor

How to categorize this PR?

/area networking
/area auto-scaling
/area cost
/kind enhancement

What this PR does / why we need it:
Reduce resource requests of istio ingress gateway and adapt autoscaling accordingly.

Due to the split of istio ingress gateways across zones in highly available seed setups, their resource requests are slightly oversized for all but very large active seed clusters. To reduce this waste, this change lowers the resource requests to a quarter (cpu) and a half (memory) of their previous values. The general assumption is that, due to its priority, the istio ingress gateway can still obtain additional cpu when it is available on the node. For memory, the limit is kept at its previous value, and again its priority may help it avoid being out-of-memory killed. As an additional memory safeguard, autoscaling is extended to also consider memory, so that a scale-up can happen under memory pressure. Finally, the scale-up/-down behaviour is now specified explicitly, with a fast scale-up and a slow scale-down.
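For context on the memory-based scale-up and on why autoscaling has to be adapted when requests change: for `Resource` metrics with a utilization target, the Kubernetes HorizontalPodAutoscaler measures utilization relative to the pods' requests, computes `desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)` per metric, skips changes within a tolerance (0.1 by default) and acts on the largest result across metrics. The Python sketch below reproduces only that calculation. It ignores min/max replicas and `behavior` policies, and all requests, usages and targets in it are hypothetical values, not the ones from this PR.

```python
import math

TOLERANCE = 0.1  # default HPA tolerance; ratios this close to 1.0 cause no scaling


def utilization_percent(usages, requests):
    """Utilization as the HPA measures it: total usage over total requests, in percent."""
    return 100.0 * sum(usages) / sum(requests)


def desired_for_metric(current_replicas, current_util, target_util):
    """Desired replica count for a single metric."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= TOLERANCE:
        return current_replicas  # within tolerance: keep the current count
    return math.ceil(current_replicas * ratio)


def desired_replicas(current_replicas, metrics):
    """metrics: list of (current_util, target_util); the largest proposal wins."""
    return max(desired_for_metric(current_replicas, c, t) for c, t in metrics)


# Hypothetical: 3 pods, each using 250m cpu and 400Mi memory, targets of 80%.
usage_cpu, usage_mem = [250] * 3, [400] * 3

# Hypothetical old requests (1000m / 1024Mi) vs. a quarter (cpu) and a half (memory).
old = [(utilization_percent(usage_cpu, [1000] * 3), 80),
       (utilization_percent(usage_mem, [1024] * 3), 80)]
new = [(utilization_percent(usage_cpu, [250] * 3), 80),
       (utilization_percent(usage_mem, [512] * 3), 80)]

print(desired_replicas(3, old))  # 2
print(desired_replicas(3, new))  # 4
```

With the same absolute usage, cutting the cpu request to a quarter moves cpu utilization from 25% to 100%, so an unchanged 80% target now asks for a scale-up. This is one reason targets may need to be revisited together with the requests.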

Which issue(s) this PR fixes:
None.

Special notes for your reviewer:
For the istio ingress gateway spanning multiple zones, i.e. the default istio ingress gateway, autoscaling is not optimal because it does not take zones into account. This means the deployment may scale up in a zone that is not under pressure.
However, as we have seen few scale-up operations for istio so far, this is left as a follow-up step. It might be a good idea to combine the single-zone istio ingress gateways into a virtual multi-zonal one and get rid of the existing default one, but that requires more changes.
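A small worked example of why zone-unaware autoscaling is not optimal: the HPA works on utilization aggregated over all pods of the deployment, so one hot zone can be masked by the others, and when it does scale up it only decides the replica count, not the zone of the new pod. The sketch assumes one gateway pod per zone with identical requests; the zone names and utilization values are hypothetical.

```python
import math


def zone_unaware_desired(zone_utilization, target, tolerance=0.1):
    """Return (average utilization, desired replicas) as computed from the aggregate,
    assuming one pod per zone with identical requests (hypothetical setup)."""
    replicas = len(zone_utilization)
    average = sum(zone_utilization.values()) / replicas
    ratio = average / target
    if abs(ratio - 1.0) <= tolerance:
        return average, replicas  # within tolerance: no scaling
    return average, math.ceil(replicas * ratio)


# zone-a is hot, but the average sits exactly on target: no scale-up at all.
print(zone_unaware_desired({"zone-a": 95.0, "zone-b": 70.0, "zone-c": 75.0}, 80.0))
# (80.0, 3)

# zone-a drives a scale-up to 4 replicas, but the HPA does not choose the zone of
# the new pod; depending on scheduling constraints it may land in zone-b or zone-c.
print(zone_unaware_desired({"zone-a": 150.0, "zone-b": 60.0, "zone-c": 60.0}, 80.0))
# (90.0, 4)
```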

Release note:

The resource requests of the istio ingress gateway are reduced, and its horizontal autoscaling behaviour is specified in more detail, including scale-up under memory pressure.

@gardener-prow bot added the area/networking, area/auto-scaling, area/cost, kind/enhancement and cla: yes labels Feb 26, 2024
@gardener-prow bot added the size/S label Feb 26, 2024
@ScheererJ
Contributor Author

@vlerenc
Member

vlerenc commented Feb 26, 2024

Thank you @ScheererJ. I hope we will not break or tear down anything with this change ("never touch a running system", but in this case...). What can we do as flanking measures? I could get the numbers from dev and staging. Maybe you can ping me?

Anything else? I should also have the restart counts and such, but what matters is whether the API servers remained accessible, and I am not sure we would notice a small decline in availability (a large one, probably).

@gardener-prow bot added the size/M label and removed the size/S label Feb 26, 2024
Member

@DockToFuture DockToFuture left a comment


/lgtm
Did you take a look at the historic data? Is one new pod per minute fast enough for scale-up?
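The question can be put into numbers with a small sketch. It assumes a `behavior.scaleUp` policy of type `Pods` with `value: 1` and `periodSeconds: 60`, i.e. at most one pod added per 60-second window, which is how the comment above reads; the actual policy values of the PR are not quoted here. The sketch ignores metric collection delay, stabilization windows and pod start-up time, so real scale-ups take longer.

```python
def scale_up_timeline(current, desired, max_added_per_minute=1):
    """Successive replica counts, one step per 60-second period, assuming the HPA
    wants `desired` replicas throughout and may add at most `max_added_per_minute`
    pods per period (hypothetical policy values)."""
    timeline = [current]
    while timeline[-1] < desired:
        timeline.append(min(desired, timeline[-1] + max_added_per_minute))
    return timeline


print(scale_up_timeline(2, 10))     # one pod per minute
print(scale_up_timeline(2, 10, 4))  # hypothetical four pods per minute
```

Closing a gap of eight replicas thus takes eight one-minute steps with a one-pod policy, versus two with a hypothetical four-pod policy.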

@gardener-prow bot added the lgtm label Feb 26, 2024
Contributor

gardener-prow bot commented Feb 26, 2024

LGTM label has been added.

Git tree hash: 2b40a1e4056b42b027a006de39d3e9052ce31c2c

@axel7born
Contributor

/lgtm

Member

@rfranzke rfranzke left a comment


/approve

Contributor

gardener-prow bot commented Feb 28, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rfranzke

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow bot added the approved label Feb 28, 2024
@gardener-prow bot merged commit f930e8e into gardener:master Feb 28, 2024
17 checks passed