Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v1.14] ipam: Fix invalid PodCIDR in CiliumNode in ENI/Azure/MultiPool mode #27924

Merged
merged 2 commits into from
Sep 7, 2023

Conversation

gandro
Copy link
Member

@gandro gandro commented Sep 4, 2023

This is a custom backport of the following PR as the IPSec part did not apply cleanly due to minor changes in the error handling of enableIPsec{IPv4,IPv6}:

@pchaigno For review. I'm also fine to hold this off until we get the IPSec CI on v1.14, please let me know if you'd prefer to have IPSec CI backported to v1.14 first.

Once this PR is merged, you can update the PR labels via:

for pr in 26663; do contrib/backporting/set-labels.py $pr done 1.14; done

or with

make add-labels BRANCH=v1.14 ISSUES=26663

[ upstream commit a4c43f3 ]

IPSec relies on the "pod subnet" CIDRs for encryption in IPAM modes
(e.g. ENI, Azure) which do not have a pod CIDR (aka. alloc CIDR).

Therefore, we should not use the alloc CIDR in such modes. This commit
moves the setup of routes based on the alloc CIDR into the conditions
which are executed if we do are not in pod subnet based encryption.

In addition, we add some temporary code to remove the old bogus route
based for the local node in subnet encryption mode.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 1568558 ]

This commit fixes an issue where Cilium-Agent would announce its
internal auto-generated IPv4/IPv6AllocRange by writing it into the local
CiliumNode resource.

Cilium-Agents IPv4/IPv6AllocRange is Cilium's internal representation of
the local node's PodCIDR. The concept of a single allocation CIDR
however is outdated, since in many IPAM modes (ENI, Azure, AlibabaCloud,
MultiPool) there either is no PodCIDR to begin with (and pod IPs are
allocated using a different scheme), or there are multiple PodCIDRs (in
the case of MultiPool).

Before this commit, cilium-agent used to copy its internal PodCIDR into
the CiliumNode resource, but this behavior is only correct in Kubernetes
IPAM mode. In Kubernetes IPAM mode, the IPv4/IPv6AllocRange is taken
from the Kubernetes Node resource, and thus it makes sense to announce
it to other cilium-agent instances via the CiliumNode resource. This
field then is for example used by `auto-direct-node-routes` to install
routes for direct routing mode.

However, in all other IPAM modes that we have, copying the internal
PodCIDR into the CiliumNode resource does not make sense:

 - In ClusterPool mode, the CiliumNode PodCIDR field is populated by
   cilium-operator. Therefore, we should not try to overwrite it
   ourselves.  This behavior is unchanged by this commit.
 - In all other current IPAM modes, the CiliumNode PodCIDR field is not
   used. Instead we use fields specific to those IPAM modes (c.f.
   CiliumNode.Spec.IPAM). In those modes, the previous code therefore wrote
   a auto-generatead, but otherwise valid IPv4/IPv6AllocRange into the
   field. That auto-generated PodCIDR is not used for pod IP allocation in
   those modes, so it does not make sense to announce it either.

The announced auto-generated PodCIDR is causing issues in IPAM MultiPool
mode. With this commit, we will no longer install invalid routes for the
auto-generated PodCIDR. This was mostly harmless (since the
auto-generated PodCIDR rarely overlapped with the real ones), but still
a potential cause for confusion and bugs.

In addition, ENI-mode users have reported that they are confused by the
this "fake" PodCIDR, which is understandable, since that PodCIDR is
meaningless in ENI mode (cilium#9409).

Long term, we want to remove the concept of the IPv4/IPv6AllocCIDR
completely (and remove the auto-generation code), but this is a first
incremental step towards that goal.

Fixes: cilium#9409

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro requested a review from a team as a code owner September 4, 2023 13:47
@maintainer-s-little-helper maintainer-s-little-helper bot added backport/1.14 This PR represents a backport for Cilium 1.14.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. labels Sep 4, 2023
@gandro gandro requested a review from pchaigno September 4, 2023 13:47
@margamanterola
Copy link
Member

/ci-ipsec-upgrade

@margamanterola
Copy link
Member

margamanterola commented Sep 7, 2023

/ci-ipsec-e2e -> This didn't work because it doesn't exist on the 1.14 branch yet :(

@gandro
Copy link
Member Author

gandro commented Sep 7, 2023

/test-backport-1.14

Edit:

@gandro
Copy link
Member Author

gandro commented Sep 7, 2023

/ci-aks

@gandro gandro removed the request for review from pchaigno September 7, 2023 13:18
Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I don't think we need to hold off on IPsec CI. It will be there soon enough and there aren't much differences between v1.14 and main.

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Sep 7, 2023
@gandro

This comment was marked as resolved.

@gandro gandro merged commit 8c98a7d into cilium:v1.14 Sep 7, 2023
57 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.14 This PR represents a backport for Cilium 1.14.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants