
service-cache: fix memory leak and missed updates on scale to zero events in rare circumstances #24619

Merged: 2 commits merged into cilium:master on Mar 30, 2023

Conversation
@giorio94 (Member) commented on Mar 29, 2023:

This PR fixes two issues that could affect the handling of remote-service endpoints in the service cache:

  • It prevents an update from being missed on scale-to-zero events in very specific scenarios.
  • It ensures the deletion of a map entry that was previously leaked.

Please refer to the respective commit messages for more details.

Fix a memory leak in the service cache, and possibly missed service updates on scale-to-zero events in rare circumstances

Currently, a service is reported as ready if the local Endpoints resource
has been found, or if it has at least one endpoint in a remote cluster. This
commit slightly changes the logic, also reporting the service as ready if any
remote service has been found (even one with 0 endpoints), to prevent an
update from being missed on scale-to-zero events.

In particular, the issue can be triggered when the local service has
no selector (hence Kubernetes creates neither an Endpoints nor an
EndpointSlice object) while the remote one is standard. When the deployment
targeted by the remote service is scaled to 0, the service entry in the local
cluster is not correctly cleared.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
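To make the readiness change concrete, here is a minimal Go sketch of the correlation logic described above, assuming it can be reduced to merging local and remote backends. The names (serviceCache, correlate) and data layout are illustrative stand-ins, not Cilium's actual service cache implementation:

```go
package main

import "fmt"

// Illustrative stand-in for the service cache state; not Cilium's actual types.
type serviceCache struct {
	localEndpoints  map[string][]string            // service ID -> local backend IPs
	remoteEndpoints map[string]map[string][]string // service ID -> cluster name -> backend IPs
}

// correlate merges local and remote backends for a service and reports
// whether the service should be considered ready.
func (s *serviceCache) correlate(id string) (backends []string, ready bool) {
	local, hasLocal := s.localEndpoints[id]
	backends = append(backends, local...)

	remote, hasRemoteService := s.remoteEndpoints[id]
	for _, ips := range remote {
		backends = append(backends, ips...)
	}

	// Before the fix: ready = hasLocal || len(backends) > 0, so a selector-less
	// local service (no local Endpoints object) whose remote backends dropped
	// to zero was reported as not ready, and the empty update was dropped.
	// After the fix: any known remote service, even with 0 endpoints, keeps
	// the service ready, so the scale-to-zero update is propagated.
	ready = hasLocal || hasRemoteService
	return backends, ready
}

func main() {
	s := &serviceCache{
		localEndpoints: map[string][]string{}, // selector-less service: no local Endpoints
		remoteEndpoints: map[string]map[string][]string{
			"default/frontend": {"cluster-2": {}}, // remote deployment scaled to zero
		},
	}
	backends, ready := s.correlate("default/frontend")
	fmt.Println(backends, ready) // [] true -> the empty update is still applied
}
```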
Previously, upon detecting the deletion of a global service in a remote
cluster, we removed the corresponding external endpoints, but we did
not delete the map associated with that service when no remote endpoints
were left. This commit fixes that, and also ensures that the service
entry is deleted if it is no longer ready.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
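The cleanup described in this commit message can be sketched similarly. Again, the types, field names, and the onRemoteServiceDelete helper are hypothetical illustrations of the described behavior, not the actual Cilium code:

```go
package main

import "fmt"

// Illustrative types; not Cilium's actual external-endpoints structure.
type externalEndpoints struct {
	endpoints map[string][]string // cluster name -> backend IPs
}

type serviceCache struct {
	external map[string]*externalEndpoints // service ID -> per-cluster remote endpoints
	services map[string]struct{}           // service ID -> cached service entry
}

// onRemoteServiceDelete handles the deletion of a global service in one
// remote cluster: the per-cluster entry is removed, and (the fix) the outer
// map entry is deleted once the last cluster is gone, instead of being
// leaked; the service entry is also dropped if it is no longer ready.
func (s *serviceCache) onRemoteServiceDelete(id, cluster string, stillReady bool) {
	ext, ok := s.external[id]
	if !ok {
		return
	}
	delete(ext.endpoints, cluster)

	if len(ext.endpoints) == 0 {
		delete(s.external, id) // previously this entry was leaked
	}
	if !stillReady {
		delete(s.services, id) // drop the stale service entry as well
	}
}

func main() {
	s := &serviceCache{
		external: map[string]*externalEndpoints{
			"default/frontend": {endpoints: map[string][]string{"cluster-2": {"10.0.0.1"}}},
		},
		services: map[string]struct{}{"default/frontend": {}},
	}
	s.onRemoteServiceDelete("default/frontend", "cluster-2", false)
	fmt.Println(len(s.external), len(s.services)) // 0 0 -> nothing leaked
}
```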
@giorio94 added the kind/bug, release-note/bug, area/clustermesh, affects/v1.11, affects/v1.12, needs-backport/1.13, and affects/v1.13 labels on Mar 29, 2023
@giorio94 requested a review from a team as a code owner on March 29, 2023 06:30
@giorio94 requested a review from youngnick on March 29, 2023 06:30
@maintainer-s-little-helper bot added this to "Needs backport from master" in 1.13.2 on Mar 29, 2023
@giorio94 changed the title from "service-cache: fix memory leak and missed updates on scale to zero events" to "service-cache: fix memory leak and missed updates on scale to zero events in rare circumstances" on Mar 29, 2023
@giorio94 (Member Author) commented:

/test

@giorio94 (Member Author) commented:

ConformanceK8sKind hit new flake #24622

@giorio94 (Member Author) commented on Mar 29, 2023:

/test-1.24-5.4

Hit #15455

@giorio94 (Member Author) commented on Mar 29, 2023:

/test-1.26-net-next

Hit #24573

@maintainer-s-little-helper bot added the ready-to-merge label on Mar 30, 2023
@julianwiedmann (Member) commented:

> Fixes: #issue-number

Note that if this was meant to close a related issue, you will need to close it manually :)

@julianwiedmann merged commit b7d58c1 into cilium:master on Mar 30, 2023
43 checks passed
@giorio94 (Member Author) commented:

> Fixes: #issue-number
>
> Note that if this was meant to close a related issue, you will need to close it manually :)

Nope, I just forgot it there from the template.

@jibi mentioned this pull request on Apr 3, 2023
@jibi added the backport-pending/1.13 label and removed the needs-backport/1.13 label on Apr 3, 2023
@maintainer-s-little-helper bot moved this from "Needs backport from master" to "Backport pending to v1.13" in 1.13.2 on Apr 3, 2023
@jibi added the backport-done/1.13 label on Apr 7, 2023
@jibi removed the backport-pending/1.13 label on Apr 7, 2023
@maintainer-s-little-helper bot moved this from "Backport pending to v1.13" to "Backport done to v1.13" in 1.13.2 on Apr 7, 2023