-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ipsec: Fix cleanup of XFRM states and policies #26072
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
After applying a backport of 9cc8a89 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs") to 1.11.16, I noticed that we were getting occasional spikes of "no inbound state" xfrm errors (XfrmInNoStates). These lead to packet loss and brief outages for applications sending traffic to the node on which the spikes occur. I noticed that the "No node ID found for node." logline would appear at the time of these spikes and from the code this is logged when the node ID cannot be resolved. Looking a bit further the call to `DeleteIPsecEndpoint` will end up deleting the xfrm state for any state that matches the node id as derived from the mark in the state. The problem seems to be that the inbound state for 0.0.0.0/0 -> node IP has a mark of `0xd00` which when shifted >> 16 in `getNodeIDFromXfrmMark` matches nodeID 0 and so the inbound state gets deleted and the kernel drops all the inbound traffic as it no longer matches a state. This commit updates that logic to skip the XFRM state and policy deletion when the node ID is zero. Fixes: 9cc8a89 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs") Signed-off-by: Steven Johnson <sjdot@protonmail.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
With commit 9cc8a89 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs") we rely on the node ID to find XFRM states and policies that belong to remote nodes, to clean them up when remote nodes are deleted. This commit makes sure that we only do this for XFRM states and policies that actually match on these node IDs. That should only be the same if the mark mask matches on node ID bits. Thus if should look like 0xffffff00 (matches on node ID, SPI, and encryption bit) or 0xffff0f00 (matches on node and encryption bit). Fixes: 9cc8a89 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
pchaigno
added
release-note/bug
This PR fixes an issue in a previous release of Cilium.
area/encryption
Impacts encryption support such as IPSec, WireGuard, or kTLS.
release-blocker/1.11
This issue will prevent the release of the next version of Cilium.
needs-backport/1.11
release-blocker/1.12
This issue will prevent the release of the next version of Cilium.
release-blocker/1.13
This issue will prevent the release of the next version of Cilium.
needs-backport/1.13
This PR / issue needs backporting to the v1.13 branch
labels
Jun 9, 2023
/test |
pchaigno
added
release-note/misc
This PR makes changes that have no direct user impact.
and removed
release-note/bug
This PR fixes an issue in a previous release of Cilium.
labels
Jun 9, 2023
qmonnet
approved these changes
Jun 9, 2023
jschwinger233
approved these changes
Jun 9, 2023
Conformance Ginkgo hit #25964 |
I performed an IPsec upgrade on EKS from v1.13.0 to this patch, just in case. I also checked for XFRM leaks by doing a scale up and down from 3 to 9 nodes and back. Connectivity tests also passed (although the CI should already cover that anyway). |
maintainer-s-little-helper
bot
added
the
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
label
Jun 9, 2023
pchaigno
removed
the
release-blocker/1.13
This issue will prevent the release of the next version of Cilium.
label
Jun 9, 2023
pchaigno
added
backport-pending/1.13
The backport for Cilium 1.13.x for this PR is in progress.
release-blocker/1.13
This issue will prevent the release of the next version of Cilium.
and removed
needs-backport/1.13
This PR / issue needs backporting to the v1.13 branch
labels
Jun 9, 2023
maintainer-s-little-helper
bot
moved this from Needs backport from main
to Backport pending to v1.13
in 1.13.4
Jun 9, 2023
maintainer-s-little-helper
bot
moved this from Needs backport from main
to Backport pending to v1.13
in 1.13.4
Jun 9, 2023
michi-covalent
added
backport-done/1.13
The backport for Cilium 1.13.x for this PR is done.
and removed
backport-pending/1.13
The backport for Cilium 1.13.x for this PR is in progress.
labels
Jun 9, 2023
maintainer-s-little-helper
bot
moved this from Backport pending to v1.13
to Backport done to v1.13
in 1.13.4
Jun 9, 2023
maintainer-s-little-helper
bot
moved this from Backport pending to v1.13
to Backport done to v1.13
in 1.13.4
Jun 9, 2023
maintainer-s-little-helper
bot
moved this from Needs backport from main
to Backport pending to v1.12
in 1.12.11
Jun 12, 2023
This was referenced Jun 12, 2023
maintainer-s-little-helper
bot
moved this from Needs backport from main
to Backport pending to v1.11
in 1.11.18
Jun 12, 2023
michi-covalent
added
backport-done/1.12
The backport for Cilium 1.12.x for this PR is done.
and removed
backport-pending/1.12
labels
Jun 12, 2023
maintainer-s-little-helper
bot
moved this from Backport pending to v1.12
to Backport done to v1.12
in 1.12.11
Jun 12, 2023
maintainer-s-little-helper
bot
moved this from Backport pending to v1.12
to Backport done to v1.12
in 1.12.11
Jun 12, 2023
qmonnet
added
backport-done/1.11
The backport for Cilium 1.11.x for this PR is done.
and removed
backport-pending/1.11
labels
Jun 14, 2023
maintainer-s-little-helper
bot
moved this from Backport pending to v1.11
to Backport done to v1.11
in 1.11.18
Jun 14, 2023
This was referenced Jun 14, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area/encryption
Impacts encryption support such as IPSec, WireGuard, or kTLS.
backport-done/1.11
The backport for Cilium 1.11.x for this PR is done.
backport-done/1.12
The backport for Cilium 1.12.x for this PR is done.
backport-done/1.13
The backport for Cilium 1.13.x for this PR is done.
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
release-blocker/1.11
This issue will prevent the release of the next version of Cilium.
release-blocker/1.12
This issue will prevent the release of the next version of Cilium.
release-blocker/1.13
This issue will prevent the release of the next version of Cilium.
release-note/misc
This PR makes changes that have no direct user impact.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
See commits for details.
Fixes: #25784.