ipsec: Fix XfrmInNoStates drops on EKS & AKS upgrades
#25724
Merged
pchaigno added the kind/bug, sig/datapath, release-note/bug, upgrade-impact, area/encryption, integration/cloud, needs-backport/1.11, and needs-backport/1.13 labels on May 26, 2023.
pchaigno force-pushed the ipsec-eks-upgrade branch 2 times, most recently from 25d0a36 to f182153 on May 26, 2023 at 18:17.
The XFRM IN policies and states didn't change, so we should never need to remove any stale XFRM IN configs. Let's thus simplify the logic to find stale policies and states accordingly. I would expect this incorrect removal to cause a few drops on agent restart, but after multiple attempts to reproduce on small (3 nodes) and larger (20 nodes) clusters (EKS & GKE) with a drop-sensitive application (migrate-svc), I'm not able to see such drops. I'm guessing this is because we reinstall the XFRM IN configs right after we remove them, so there isn't really much time for a packet to be received and dropped. Fixes: 688dc9a ("ipsec: Remove stale XFRM states and policies") Signed-off-by: Paul Chaignon <paul@cilium.io>
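Drops like the ones discussed in this commit show up in the kernel's XFRM statistics (`/proc/net/xfrm_stat`), e.g. as `XfrmInNoStates`. As a small illustration (parsing a sample snippet in the same format rather than live kernel state, so the values below are made up):

```shell
# Sample content in the format of /proc/net/xfrm_stat (values are made up)
sample='XfrmInError            0
XfrmInNoStates         12
XfrmOutPolBlock        3'

# Extract the XfrmInNoStates counter. On a live node you would read the
# real file instead, e.g.: awk '/XfrmInNoStates/ {print $2}' /proc/net/xfrm_stat
count=$(printf '%s\n' "$sample" | awk '/XfrmInNoStates/ {print $2}')
echo "$count"
```

A non-zero, increasing value of this counter after an agent restart would indicate packets received with no matching XFRM IN state to decrypt them.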
This small bit of refactoring will make later changes a bit easier. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
No functional changes. Best viewed with git show -b or the equivalent on GitHub to not show space-only changes. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
No functional changes. Best viewed with git show -b or the equivalent on GitHub to not show space-only changes. Same as the previous commit but on a different set of conditions. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
This is a partial revert of commit 3e59b68 ("ipsec: Per-node XFRM states & policies for EKS & AKS"). One change that commit made on EKS and AKS was to switch from using NodeInternalIPs to using CiliumInternalIPs for the outer IPsec (ESP) IP addresses. That made the logic more consistent with the logic we use for other IPAM schemes (e.g., GKE). It however causes serious connectivity issues on upgrades and downgrades, mostly because not all nodes are typically updated to the new Cilium version at the same time. If we consider two pods on nodes A and B trying to communicate, then node A may be using the old NodeInternalIPs while node B is already on the new CiliumInternalIPs. When node B sends traffic to node A, node A doesn't have the XFRM IN state necessary to decrypt it. The same happens in the other direction. This commit reintroduces the NodeInternalIPs for EKS and AKS. Subsequent commits will introduce additional changes to enable a proper migration path from NodeInternalIPs to CiliumInternalIPs. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
On EKS and AKS, we currently use NodeInternalIPs for the IPsec tunnels. A subsequent commit will allow us to change that and switch to using CiliumInternalIPs (as done on GKE). For that to be possible without breaking inter-node connectivity for the whole duration of the switch, we need an intermediate mode where both CiliumInternalIPs and NodeInternalIPs are accepted on ingress. The idea is that we will then have a two-step migration from NodeInternalIP to CiliumInternalIP:
1. All nodes are using NodeInternalIP.
2. Upgrade to the version of Cilium that supports both NodeInternalIP and CiliumInternalIP and encapsulates IPsec traffic with NodeInternalIP.
3. Via an agent flag, tell Cilium to switch to encapsulating IPsec traffic with CiliumInternalIP.
4. All nodes are using CiliumInternalIP.
This commit implements the logic for step 2 above. To that end, we will duplicate the XFRM IN states such that we have both:
    src 0.0.0.0 dst [NodeInternalIP]    # existing
    src 0.0.0.0 dst [CiliumInternalIP]  # new
thus matching and being able to receive IPsec packets with an outer destination IP of either NodeInternalIP or CiliumInternalIP. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
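The duplicated XFRM IN states described above can be sketched with plain `ip xfrm` commands. This is an illustrative configuration sketch only, not the states Cilium actually installs: the addresses, SPI, and key below are placeholders (192.168.1.10 standing in for a NodeInternalIP, 10.0.0.5 for a CiliumInternalIP).

```shell
# Existing XFRM IN state, matching ESP packets addressed to the NodeInternalIP
ip xfrm state add src 0.0.0.0 dst 192.168.1.10 \
    proto esp spi 0x00000003 mode tunnel \
    aead 'rfc4106(gcm(aes))' 0x0123456789abcdef0123456789abcdef01234567 128

# Duplicated XFRM IN state with the same SPI and key, matching ESP packets
# addressed to the CiliumInternalIP instead
ip xfrm state add src 0.0.0.0 dst 10.0.0.5 \
    proto esp spi 0x00000003 mode tunnel \
    aead 'rfc4106(gcm(aes))' 0x0123456789abcdef0123456789abcdef01234567 128
```

Because both states share the decryption parameters and differ only in the outer destination, the node can decrypt traffic from peers encapsulating with either IP during the migration.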
On EKS and AKS, IPsec uses NodeInternalIPs for the encapsulation. This commit introduces a new flag to allow switching from NodeInternalIPs to CiliumInternalIPs; it defaults to the former. This new flag enables step 3 of the migration plan defined in the previous commit. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
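For step 3, the agent flag named in this PR (`use-cilium-internal-ip-for-ipsec`) has to be set on the already-upgraded agents. A hedged sketch of how this might be rolled out: passing it through the Helm chart's generic `extraArgs` passthrough is an assumption here, not necessarily the supported knob, so consult the release documentation before applying anything like this.

```shell
# Step 3 of the migration: tell the agents to encapsulate with CiliumInternalIP.
# Setting the flag via Helm's generic extraArgs list is an assumption; the
# release may expose a dedicated Helm value instead.
helm upgrade cilium cilium/cilium --namespace kube-system \
  --reuse-values \
  --set "extraArgs={--use-cilium-internal-ip-for-ipsec=true}"
```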
pchaigno force-pushed the ipsec-eks-upgrade branch from f182153 to 9b91f40 on May 27, 2023 at 19:16.
/test
pchaigno added the backport-pending/1.12 and backport-pending/1.13 labels and removed the needs-backport/1.11 and needs-backport/1.13 labels on Jun 5, 2023.
maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.13 in 1.13.3, to Backport pending to v1.12 in 1.12.10, and to Backport pending to v1.11 in 1.11.18 on Jun 5, 2023.
pchaigno added the backport-done/1.12 and backport-done/1.13 labels and removed the backport-pending/1.12 and backport-pending/1.13 labels on Jun 9, 2023.
maintainer-s-little-helper bot moved this from Backport pending to Backport done for v1.13 in 1.13.3 and for v1.12 in 1.12.10 on Jun 9, 2023.
qmonnet added the backport-done/1.11 label and removed the backport-pending/1.11 label on Jun 14, 2023.
Labels
area/encryption: Impacts encryption support such as IPSec, WireGuard, or kTLS.
backport/author: The backport will be carried out by the author of the PR.
backport-done/1.11: The backport for Cilium 1.11.x for this PR is done.
backport-done/1.12: The backport for Cilium 1.12.x for this PR is done.
backport-done/1.13: The backport for Cilium 1.13.x for this PR is done.
integration/cloud: Related to integration with cloud environments such as AKS, EKS, GKE, etc.
kind/bug: This is a bug in the Cilium logic.
ready-to-merge: This PR has passed all tests and received consensus from code owners to merge.
release-blocker/1.11: This issue will prevent the release of the next version of Cilium.
release-blocker/1.12: This issue will prevent the release of the next version of Cilium.
release-note/bug: This PR fixes an issue in a previous release of Cilium.
sig/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
upgrade-impact: This PR has potential upgrade or downgrade impact.
This pull request fixes IPsec packet drops of type XfrmInNoStates that would happen on EKS and AKS for the whole duration of the upgrade (until the last node is upgraded). The root of the issue is that we switched from using NodeInternalIP to using CiliumInternalIP for the IPsec encapsulation, and that change isn't properly handled on upgrade.

To avoid the XfrmInNoStates drops, this pull request introduces a two-step upgrade process. First, after the usual upgrade, both CiliumInternalIP and NodeInternalIP encapsulations are accepted on ingress, but NodeInternalIPs are still used for the encapsulation on egress. Then, as a second step, users can switch to use-cilium-internal-ip-for-ipsec=true (defaults to false) to start using CiliumInternalIPs.

There are also drops of type XfrmOutPolBlock happening during the IPsec + EKS/AKS upgrade. Those will be addressed in a separate PR.

How was this tested?
I performed two types of tests: first with a normal upgrade process and second with a split cluster.

In the first test, the migrate-svc application was running during the upgrade. The two-step upgrade was carried out and, after each step, I checked that no XfrmInNoStates drops were reported. The migrate-svc application did have drops during the first step because of the aforementioned XfrmOutPolBlock drops. It did not have any additional drops afterward. Our connectivity test suite also passed after each step.

In the split-cluster test, no drops, neither XfrmInNoStates nor XfrmOutPolBlock, were reported.

Fixes: #24773.
Fixes: #24030.