Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: avoid overlapping pod and service CIDRs on AKS #2446

Merged
merged 1 commit into from
Mar 26, 2024

Conversation

tklauser
Copy link
Member

Follow cilium/cilium#31504, quoting its commit description:

The default service CIDR of AKS clusters is 10.0.0.0/16 1.
Unfortunately, we don't set a pod cidr for clusterpool IPAM, and hence
use cilium's default of 10.0.0.0/8, which overlaps. This can
lead to "fun" situations in which e.g. the kube-dns service ClusterIP is
the same as the hubble-relay pod IP, or similar shenanigans. This
usually breaks the cluster utterly.

The fix is relatively straight-forward: set a pod CIDR for cilium which
does not overlap with defaults of AKS. We chose 192.168.0.0/16 as this
is what is recommended in 2.

@tklauser tklauser added the area/CI Continuous Integration testing issue or flake label Mar 22, 2024
@tklauser tklauser requested review from bimmlerd and glrf March 22, 2024 14:58
@tklauser tklauser requested a review from a team as a code owner March 22, 2024 14:58
Copy link
Member

@bimmlerd bimmlerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for thinking of this :)

Follow cilium/cilium#31504, quoting its commit
description:

> The default service CIDR of AKS clusters is 10.0.0.0/16 [1].
> Unfortunately, we don't set a pod cidr for clusterpool IPAM, and hence
> use cilium's default of 10.0.0.0/8, which overlaps. This can
> lead to "fun" situations in which e.g. the kube-dns service ClusterIP is
> the same as the hubble-relay pod IP, or similar shenanigans. This
> usually breaks the cluster utterly.
>
> The fix is relatively straight-forward: set a pod CIDR for cilium which
> does not overlap with defaults of AKS. We chose 192.168.0.0/16 as this
> is what is recommended in [2].
>
> [1]: https://learn.microsoft.com/en-us/azure/aks/configure-kubenet#create-an-aks-cluster-with-system-assigned-managed-identities
> [2]: https://learn.microsoft.com/en-us/azure/aks/azure-cni-powered-by-cilium#option-1-assign-ip-addresses-from-an-overlay-network

Co-authored-by: Fabian Fischer <fabian.fischer@isovalent.com>
Co-authored-by: David Bimmler <david.bimmler@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
@tklauser tklauser force-pushed the pr/tklauser/aks-ci-fix-overlapping-cidrs branch from 6493e31 to 34f8661 Compare March 26, 2024 14:20
@tklauser tklauser merged commit 918ae16 into main Mar 26, 2024
13 checks passed
@tklauser tklauser deleted the pr/tklauser/aks-ci-fix-overlapping-cidrs branch March 26, 2024 16:35
@@ -10,4 +10,5 @@ cilium install \
--wait=false \
--set loadBalancer.l7.backend=envoy \
--set tls.secretsBackend=k8s \
--set bpf.monitorAggregation=none
--set bpf.monitorAggregation=none \
--set ipam.operator.clusterPoolIPv4PodCIDRList=192.168.0.0/16" # To avoid clashing with the default Service CIDR of AKS (10.0.0.0/16)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This broke CI on AKS due to the trailing ". Unfortunately this wasn't caught in CI because we would have needed to follow the workflow testing instructions here:

### FOR TESTING PURPOSES
# This workflow runs in the context of `main`, and ignores changes to
# workflow files in PRs. For testing changes to this workflow from a PR:
# - Make sure the PR uses a branch from the base repository (requires write
# privileges). It will not work with a branch from a fork (missing secrets).
# - Uncomment the `pull_request` event below, commit separately with a `DO
# NOT MERGE` message, and push to the PR. As long as the commit is present,
# any push to the PR will trigger this workflow.
# - Don't forget to remove the `DO NOT MERGE` commit once satisfied. The run
# will disappear from the PR checks: please provide a direct link to the
# successful workflow run (can be found from Actions tab) in a comment.
#
# pull_request: {}

Fixed in #2452

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/CI Continuous Integration testing issue or flake
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants