Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ipsec: Fix cleanup of XFRM states and policies #26072

Merged
merged 2 commits into from
Jun 9, 2023

Conversation

pchaigno
Copy link
Member

@pchaigno pchaigno commented Jun 9, 2023

See commits for details.

Fixes: #25784.

sjdot and others added 2 commits June 9, 2023 12:26
After applying a backport of 9cc8a89 ("ipsec: Fix leak of XFRM
policies with ENI and Azure IPAMs") to 1.11.16, I noticed that we were
getting occasional spikes of "no inbound state" xfrm errors
(XfrmInNoStates). These lead to packet loss and brief outages for
applications sending traffic to the node on which the spikes occur.

I noticed that the "No node ID found for node." logline would appear at
the time of these spikes and from the code this is logged when the node
ID cannot be resolved. Looking a bit further the call to
`DeleteIPsecEndpoint` will end up deleting the xfrm state for any state
that matches the node id as derived from the mark in the state.

The problem seems to be that the inbound state for 0.0.0.0/0 -> node IP
has a mark of `0xd00` which when shifted >> 16 in
`getNodeIDFromXfrmMark` matches nodeID 0 and so the inbound state gets
deleted and the kernel drops all the inbound traffic as it no longer
matches a state.

This commit updates that logic to skip the XFRM state and policy
deletion when the node ID is zero.

Fixes: 9cc8a89 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs")
Signed-off-by: Steven Johnson <sjdot@protonmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
With commit 9cc8a89 ("ipsec: Fix leak of XFRM policies with ENI and
Azure IPAMs") we rely on the node ID to find XFRM states and policies
that belong to remote nodes, to clean them up when remote nodes are
deleted.

This commit makes sure that we only do this for XFRM states and policies
that actually match on these node IDs. That should only be the same if
the mark mask matches on node ID bits. Thus if should look like
0xffffff00 (matches on node ID, SPI, and encryption bit) or 0xffff0f00
(matches on node and encryption bit).

Fixes: 9cc8a89 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
@pchaigno pchaigno added release-note/bug This PR fixes an issue in a previous release of Cilium. area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. release-blocker/1.11 This issue will prevent the release of the next version of Cilium. needs-backport/1.11 release-blocker/1.12 This issue will prevent the release of the next version of Cilium. release-blocker/1.13 This issue will prevent the release of the next version of Cilium. needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Jun 9, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.12.11 Jun 9, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.13.4 Jun 9, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.11.18 Jun 9, 2023
@pchaigno
Copy link
Member Author

pchaigno commented Jun 9, 2023

/test

@pchaigno pchaigno marked this pull request as ready for review June 9, 2023 10:37
@pchaigno pchaigno requested review from a team as code owners June 9, 2023 10:37
@pchaigno pchaigno added release-note/misc This PR makes changes that have no direct user impact. and removed release-note/bug This PR fixes an issue in a previous release of Cilium. labels Jun 9, 2023
@qmonnet
Copy link
Member

qmonnet commented Jun 9, 2023

Conformance Ginkgo hit #25964

@pchaigno
Copy link
Member Author

pchaigno commented Jun 9, 2023

I performed an IPsec upgrade on EKS from v1.13.0 to this patch, just in case. I also checked for XFRM leaks by doing a scale up and down from 3 to 9 nodes and back. Connectivity tests also passed (although the CI should already cover that anyway).

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 9, 2023
@pchaigno pchaigno merged commit 57eac9d into cilium:main Jun 9, 2023
64 of 65 checks passed
@pchaigno pchaigno mentioned this pull request Jun 9, 2023
5 tasks
@pchaigno pchaigno deleted the fix-xfrm-cleanup branch June 9, 2023 14:21
@pchaigno pchaigno removed the release-blocker/1.13 This issue will prevent the release of the next version of Cilium. label Jun 9, 2023
@pchaigno pchaigno added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. release-blocker/1.13 This issue will prevent the release of the next version of Cilium. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Jun 9, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.13 in 1.13.4 Jun 9, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.13 in 1.13.4 Jun 9, 2023
@michi-covalent michi-covalent added backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels Jun 9, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.13 to Backport done to v1.13 in 1.13.4 Jun 9, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.13 to Backport done to v1.13 in 1.13.4 Jun 9, 2023
@gandro gandro mentioned this pull request Jun 12, 2023
2 tasks
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.12 in 1.12.11 Jun 12, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.11 in 1.11.18 Jun 12, 2023
@michi-covalent michi-covalent added backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. and removed backport-pending/1.12 labels Jun 12, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.12 to Backport done to v1.12 in 1.12.11 Jun 12, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.12 to Backport done to v1.12 in 1.12.11 Jun 12, 2023
@qmonnet qmonnet added backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. and removed backport-pending/1.11 labels Jun 14, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.11 to Backport done to v1.11 in 1.11.18 Jun 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-blocker/1.11 This issue will prevent the release of the next version of Cilium. release-blocker/1.12 This issue will prevent the release of the next version of Cilium. release-blocker/1.13 This issue will prevent the release of the next version of Cilium. release-note/misc This PR makes changes that have no direct user impact.
Projects
No open projects
1.11.18
Backport done to v1.11
1.12.11
Backport done to v1.12
1.13.4
Backport done to v1.13
Development

Successfully merging this pull request may close these issues.

None yet

6 participants