ipsec: Clean up stale XFRM policies and states #24773
Merged
joestringer
merged 4 commits into
cilium:master
from
pchaigno:fix-xfrm-config-replacement
Apr 11, 2023
pchaigno
added
kind/bug
This is a bug in the Cilium logic.
release-note/bug
This PR fixes an issue in a previous release of Cilium.
area/encryption
Impacts encryption support such as IPSec, WireGuard, or kTLS.
needs-backport/1.11
needs-backport/1.13
This PR / issue needs backporting to the v1.13 branch
labels
Apr 5, 2023
pchaigno
force-pushed
the
fix-xfrm-config-replacement
branch
from
April 6, 2023 10:47
fe461b9
to
21c438e
Compare
These wildcard variables will be used by a later commit in the IPsec logic.

Signed-off-by: Paul Chaignon <paul@cilium.io>
pchaigno
force-pushed
the
fix-xfrm-config-replacement
branch
from
April 7, 2023 12:40
21c438e
to
b8f97c1
Compare
UpsertIPsecEndpoint is currently unable to replace stale XFRM states. We use XfrmStateAdd, which fails with EEXIST if a state with the same key (IPs, SPI, and mark) already exists. We can't use XfrmStateUpdate because it fails with ESRCH if no state with the specified key exists. Note that we don't have the same issue for XFRM policies because XfrmPolicyUpdate doesn't return ESRCH if no such policy already exists. No idea why the two APIs are not consistent.

We therefore need to implement proper 'update or insert' logic for XFRM states ourselves. To that end, we first check if the state we want to add already exists. If it doesn't, we attempt to add it. If that fails with EEXIST, we know that some other state is conflicting. In that case, we remove any conflicting XFRM states we find and then attempt to add the new state again.

To find conflicting XFRM states, we use the same logic as the kernel does (cf. __xfrm_state_lookup).

Signed-off-by: Paul Chaignon <paul@cilium.io>
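The 'update or insert' sequence described in this commit message can be sketched against an in-memory table. This is not the code from the PR: `table`, `state`, `conflicts`, `add`, `has`, and `upsert` are hypothetical names, `errExist` merely stands in for EEXIST, and the conflict check is a simplified approximation of the kernel lookup (same destination and SPI, and the candidate's mark matching the installed state's mark under the installed mask). The mark values are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errExist stands in for the kernel's EEXIST in this in-memory model.
var errExist = errors.New("EEXIST")

// state is a simplified XFRM state: destination IP, SPI, and mark/mask.
type state struct {
	Dst      string
	SPI      uint32
	Mark     uint32
	MarkMask uint32
}

// table is a hypothetical stand-in for the kernel's XFRM state table.
type table struct{ states []state }

// conflicts approximates the kernel lookup: same destination and SPI, and
// the candidate's mark matches the installed state's mark under its mask.
func conflicts(installed, candidate state) bool {
	return installed.Dst == candidate.Dst &&
		installed.SPI == candidate.SPI &&
		candidate.Mark&installed.MarkMask == installed.Mark
}

// add mimics an add-only call: it refuses to insert over a conflicting state.
func (t *table) add(s state) error {
	for _, x := range t.states {
		if conflicts(x, s) {
			return errExist
		}
	}
	t.states = append(t.states, s)
	return nil
}

// has reports whether exactly this state is already installed.
func (t *table) has(s state) bool {
	for _, x := range t.states {
		if x == s {
			return true
		}
	}
	return false
}

// upsert runs the sequence: check, add, remove conflicts on EEXIST, retry.
func (t *table) upsert(s state) error {
	if t.has(s) {
		return nil
	}
	err := t.add(s)
	if !errors.Is(err, errExist) {
		return err // success (nil) or an unrelated failure
	}
	kept := t.states[:0]
	for _, x := range t.states {
		if !conflicts(x, s) {
			kept = append(kept, x)
		}
	}
	t.states = kept
	return t.add(s)
}

func main() {
	tbl := &table{}
	stale := state{Dst: "10.0.0.2", SPI: 3, Mark: 0x3e00, MarkMask: 0xff00}
	fresh := state{Dst: "10.0.0.2", SPI: 3, Mark: 0x12343e00, MarkMask: 0xffffff00}
	_ = tbl.add(stale)
	fmt.Println("plain add:", tbl.add(fresh)) // rejected by the stale state
	fmt.Println("upsert:", tbl.upsert(fresh)) // stale state is replaced
	fmt.Println("states installed:", len(tbl.states))
}
```

In this model a plain add of the new state is rejected because the stale, wider-matching state shadows it, while the upsert removes the stale state first and then succeeds.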
This commit adds a catch-all XFRM policy for outgoing traffic that has the encryption bit set. The goal here is to catch any traffic that may slip past our encryption while we are replacing XFRM policies and states. Those operations cannot always be performed atomically, so we may have brief moments where there is no XFRM policy to encrypt a subset of traffic. This policy ensures we drop such traffic and don't let it flow in plain text.

We do need to match on the mark because there is also traffic flowing through XFRM that we don't want to encrypt (e.g., hostns traffic).

Signed-off-by: Paul Chaignon <paul@cilium.io>
We recently changed our XFRM states and policies (IPs and marks). However, we failed to remove the stale XFRM states and policies, and it turns out that they conflict (e.g., the kernel ends up picking the stale policies for encryption instead of the new ones). This commit therefore cleans up those stale XFRM states and policies. We can identify them based on mark values and masks (we switched from 0xFF00 to 0xFFFFFF00).

The new XFRM states and policies are added as we receive the information on remote nodes. By removing the stale states and policies before the new ones are installed for all nodes, we could cause plain-text traffic on egress and packet drops on ingress. To ensure we never let plain-text traffic out, we will clean up the stale config only once the catch-all default-drop policy is installed. That way, if there is a brief moment where, for a connection nodeA -> nodeB, we don't have a policy, traffic will be dropped instead of sent in plain text.

For each connection nodeA -> nodeB, those packet drops on egress and ingress of nodeA will happen between the time we replace the BPF datapath and the time we've installed the new XFRM state and policy corresponding to nodeB. Waiting longer to remove the stale states and policies doesn't affect the drops, as they will keep happening until the new states and policies are installed. This is all happening on agent startup, as soon as we have the necessary information from k8s.

Signed-off-by: Paul Chaignon <paul@cilium.io>
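The ordering constraint (no cleanup until the catch-all drop policy is in place) and the mask-based identification of stale entries can be sketched as follows. This is a simplified model, not the PR's implementation: `xfrmConfig`, `entry`, `removeStale`, and `errNoCatchAll` are hypothetical names, and stale entries are identified by mask alone, whereas the commit message says mark values are used as well.

```go
package main

import (
	"errors"
	"fmt"
)

// Mark masks before and after the change described in the commit message.
const (
	oldMarkMask uint32 = 0xFF00
	newMarkMask uint32 = 0xFFFFFF00
)

// entry is a simplified XFRM state or policy, reduced to a name and mask.
type entry struct {
	Name     string
	MarkMask uint32
}

// xfrmConfig is a hypothetical in-memory view of the node's XFRM config.
type xfrmConfig struct {
	catchAllInstalled bool
	entries           []entry
}

var errNoCatchAll = errors.New("catch-all drop policy not installed")

// removeStale deletes entries that still use the old mask, but only once
// the catch-all drop policy is installed, so that a gap in policies leads
// to drops rather than plain-text traffic. It returns the removal count.
func (c *xfrmConfig) removeStale() (int, error) {
	if !c.catchAllInstalled {
		return 0, errNoCatchAll
	}
	kept := c.entries[:0]
	removed := 0
	for _, e := range c.entries {
		if e.MarkMask == oldMarkMask {
			removed++
			continue
		}
		kept = append(kept, e)
	}
	c.entries = kept
	return removed, nil
}

func main() {
	cfg := &xfrmConfig{entries: []entry{
		{Name: "nodeB-old", MarkMask: oldMarkMask},
		{Name: "nodeB-new", MarkMask: newMarkMask},
		{Name: "nodeC-old", MarkMask: oldMarkMask},
	}}

	_, err := cfg.removeStale()
	fmt.Println("before catch-all:", err)

	cfg.catchAllInstalled = true
	removed, err := cfg.removeStale()
	fmt.Println("after catch-all:", removed, err, len(cfg.entries))
}
```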
pchaigno
force-pushed
the
fix-xfrm-config-replacement
branch
from
April 10, 2023 12:19
b8f97c1
to
5562165
Compare
/test
michi-covalent
approved these changes
Apr 10, 2023
lgtm for cli team files
should this be a release blocker for april patch release (1.13.2, 1.12.9, 1.11.16)?
oh i guess it already is, i see the corresponding issue has release-blocker labels: #24780
jschwinger233
approved these changes
Apr 11, 2023
pchaigno
added
the
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
label
Apr 11, 2023
pchaigno
added
backport-pending/1.13
The backport for Cilium 1.13.x for this PR is in progress.
and removed
needs-backport/1.13
This PR / issue needs backporting to the v1.13 branch
labels
Apr 11, 2023
maintainer-s-little-helper
bot
moved this from Needs backport from master
to Backport pending to v1.13
in 1.13.2
Apr 11, 2023
maintainer-s-little-helper
bot
moved this from Needs backport from master
to Backport pending to v1.12
in 1.12.9
Apr 11, 2023
gandro
added
backport-done/1.13
The backport for Cilium 1.13.x for this PR is done.
and removed
backport-pending/1.13
The backport for Cilium 1.13.x for this PR is in progress.
labels
Apr 12, 2023
maintainer-s-little-helper
bot
moved this from Backport pending to v1.13
to Backport done to v1.13
in 1.13.2
Apr 12, 2023
maintainer-s-little-helper
bot
moved this from Needs backport from master
to Backport pending to v1.11
in 1.11.16
Apr 12, 2023
gandro
added
backport-done/1.12
The backport for Cilium 1.12.x for this PR is done.
and removed
backport-pending/1.12
labels
Apr 12, 2023
maintainer-s-little-helper
bot
moved this from Backport pending to v1.12
to Backport done to v1.12
in 1.12.9
Apr 12, 2023
michi-covalent
added
backport-done/1.11
The backport for Cilium 1.11.x for this PR is done.
and removed
backport-pending/1.11
labels
Apr 14, 2023
maintainer-s-little-helper
bot
moved this from Backport pending to v1.11
to Backport done to v1.11
in 1.11.16
Apr 14, 2023
This was referenced May 26, 2023
Labels
area/encryption
Impacts encryption support such as IPSec, WireGuard, or kTLS.
backport-done/1.11
The backport for Cilium 1.11.x for this PR is done.
backport-done/1.12
The backport for Cilium 1.12.x for this PR is done.
backport-done/1.13
The backport for Cilium 1.13.x for this PR is done.
kind/bug
This is a bug in the Cilium logic.
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
release-note/bug
This PR fixes an issue in a previous release of Cilium.
First commit is a bit of refactoring. Second implements a proper function to update XFRM states. Third adds a catch-all default-drop policy to avoid leaking plain-text traffic during the cleanup. Fourth commit implements the cleanup itself.
See commit descriptions for details.
Fixes: #24030.
Fixes: #24780.
Updates: #24010.