Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ipsec: Fix XfrmInNoStates drops on ESK & AKS upgrades #25724

Merged
merged 7 commits into from
May 30, 2023

Conversation

pchaigno
Copy link
Member

@pchaigno pchaigno commented May 26, 2023

This pull request fixes IPsec packet drops of type XfrmInNoStates that would happen on EKS and AKS for the whole duration of the upgrade (until the last node is upgraded). The root of the issue is that we switched from using CiliumInternalIP to using NodeInternalIP for the IPsec encapsulation and that change isn't properly handled on the upgrade.

To avoid the XfrmInNoStates drops, this pull request introduces a 2-steps upgrade process. First, after the usual upgrade, both CiliumInternalIPs and NodeInternalIPs encapsulations will be accepted on ingress, but NodeInternalIPs will still be used for the encapsulation on egress. Then, in a second time, users can switch to use-cilium-internal-ip-for-ipsec=true (defaults false) to start using CiliumInternalIPs.

⚠️ Note this pull request doesn't fix other drops of type XfrmOutPolBlock also happening during the IPsec + EKS/AKS upgrade. Those will be addressed in a separate PR.

How was this tested?

I performed two types of tests: first with a normal upgrade process and second with a split cluster:

  1. The usual upgrade path was tested on 2 clusters of 20 nodes each, with EKS and GKE. The upgrade was performed from Cilium v1.13.0 to this patchset, with the drop-sensitive migrate-svc application running during the upgrade. The two-steps upgrade was carried out and, after each step, I checked that no XfrmInNoStates drops were reported. The migrate-svc application did have drops during the first step because of the aforementioned XfrmOutPolBlock drops. It did not have any additional drops afterward. Our connectivity tests suite also passed after each step.
  2. I deployed a 3-nodes EKS cluster with a mix of Cilium versions to simulate the impact of a prolonged upgrade. Two nodes ran Cilium v1.13.0, while the third ran this patchset backported on the v1.13 branch. I ran the connectivity tests and confirmed that no drops, neither XfrmInNoStates nor XfrmOutPolBlock, were reported.

Fixes: #24773.
Fixes: #24030.

Fix a bug that would cause connectivity drops of type XfrmInNoStates on upgrade when IPsec is enabled with ENI or Azure IPAM mode.

@pchaigno pchaigno added kind/bug This is a bug in the Cilium logic. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/bug This PR fixes an issue in a previous release of Cilium. upgrade-impact This PR has potential upgrade or downgrade impact. area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. integration/cloud Related to integration with cloud environments such as AKS, EKS, GKE, etc. needs-backport/1.11 needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels May 26, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.13.3 May 26, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.12.10 May 26, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.11.18 May 26, 2023
@pchaigno pchaigno force-pushed the ipsec-eks-upgrade branch 2 times, most recently from 25d0a36 to f182153 Compare May 26, 2023 18:17
pchaigno and others added 7 commits May 27, 2023 21:13
The XFRM IN policies and states didn't change so we should never need
to remove any stale XFRM IN configs. Let's thus simplify the logic to
find stale policies and states accordingly.

I would expect this incorrect removal to cause a few drops on agent
restart, but after multiple attempts to reproduce on small (3 nodes)
and larger (20) clusters (EKS & GKE) with a drop-sensitive application
(migrate-svc), I'm not able to see such drops. I'm guessing this is
because we reinstall the XFRM IN configs right after we removed them so
there isn't really much time for a packet to be received and dropped.

Fixes: 688dc9a ("ipsec: Remove stale XFRM states and policies")
Signed-off-by: Paul Chaignon <paul@cilium.io>
This small bit of refactoring will make later changes a bit easier.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
No functional changes. Best viewed with git show -b or the equivalent on
GitHub to not show space-only changes.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
No functional changes. Best viewed with git show -b or the equivalent on
GitHub to not show space-only changes.

Same as the previous commit but on a different set of conditions.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
This is a partial revert of commit 3e59b68 ("ipsec: Per-node XFRM
states & policies for EKS & AKS").

One change that commit 3e59b68 ("ipsec: Per-node XFRM states &
policies for EKS & AKS") make on EKS and AKS was to switch from using
NodeInternalIPs to using CiliumInternalIPs for outer IPsec (ESN) IP
addresses. That made the logic more consistent with the logic we use for
other IPAM schemes (e.g., GKE).

It however causes serious connectivity issues on upgrades and
downgrades. This is mostly because typically not all nodes are updated
to the new Cilium version at the same time. If we consider two pods on
nodes A and B trying to communicate, then node A may be using the old
NodeInternalIPs while node B is already on the new CiliumInternalIPs.
When node B sends traffic to node A, node A doesn't have the XFRM state
IN necessary to decrypt it. The same happens in the other direction.

This commit reintroduces the NodeInternalIPs for EKS and AKS. Subsequent
commits will introduce additional changes to enable a proper migration
path from NodeInternalIPs to CiliumInternalIPs.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
On EKS and AKS, we currently use NodeInternalIPs for the IPsec tunnels.
A subsequent commit will allow us to change that to switch to using
CiliumInternalIPs (as done on GKE).

For that to be possible without breaking inter-node connectivity for the
whole duration of the switch, we need an intermediate mode where both
CiliumInternalIPs and NodeInternalIPs are accepted on ingress.

The idea is that we will then have a two-steps migration from
NodeInternalIP to CiliumInternalIP:

  1. All nodes are using NodeInternalIP.
  2. Upgrade to the version of Cilium that supports both NodeInternalIP
     and CiliumInternalIP and encapsulates IPsec traffic with
     NodeInternalIP.
  3. Via an agent flag, tell Cilium to switch to encapsulating IPsec
     traffic with CiliumInternalIP.
  4. All nodes are using CiliumInternalIP.

This commit implements the logic for step 2 above. To that end, we will
duplicate the XFRM IN states such that we have both:

    src 0.0.0.0 dst [NodeInternalIP]   # existing
    src 0.0.0.0 dst [CiliumInternalIP] # new

thus matching and being able to receive IPsec packets with an outer
destination IP of either NodeInternalIP or CiliumInternalIP.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
On EKS and AKS, IPsec used NodeInternalIPs for the encapsulation. This
commit introduces a new flag to allow switching from NodeInternalIPs to
CiliumInternalIPs; it defaults to the former.

This new flag allows for step 3 of the migration plan defined in the
previous commit.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
@pchaigno pchaigno marked this pull request as ready for review May 27, 2023 19:21
@pchaigno pchaigno requested review from a team as code owners May 27, 2023 19:21
@pchaigno pchaigno requested a review from mhofstetter May 27, 2023 19:21
@pchaigno
Copy link
Member Author

/test

@pchaigno pchaigno added backport-pending/1.12 backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. and removed needs-backport/1.11 needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Jun 5, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.13 in 1.13.3 Jun 5, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.12 in 1.12.10 Jun 5, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.13 in 1.13.3 Jun 5, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.11 in 1.11.18 Jun 5, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.11 in 1.11.18 Jun 5, 2023
@pchaigno pchaigno added backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed backport-pending/1.12 backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels Jun 9, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.13 to Backport done to v1.13 in 1.13.3 Jun 9, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.13 to Backport done to v1.13 in 1.13.3 Jun 9, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.12 to Backport done to v1.12 in 1.12.10 Jun 9, 2023
@qmonnet qmonnet moved this from Backport pending to v1.11 to Backport done to v1.11 in 1.11.18 Jun 14, 2023
@qmonnet qmonnet added backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. and removed backport-pending/1.11 labels Jun 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. backport/author The backport will be carried out by the author of the PR. backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. integration/cloud Related to integration with cloud environments such as AKS, EKS, GKE, etc. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-blocker/1.11 This issue will prevent the release of the next version of Cilium. release-blocker/1.12 This issue will prevent the release of the next version of Cilium. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. upgrade-impact This PR has potential upgrade or downgrade impact.
Projects
No open projects
1.11.18
Backport done to v1.11
1.12.10
Backport done to v1.12
1.13.3
Backport done to v1.13
Development

Successfully merging this pull request may close these issues.

None yet

4 participants