[v1.14] cilium: Fix 16bit ifindex limitation #27880

Merged
borkmann merged 3 commits into v1.14 from pr/v1.14-ifindex on Sep 1, 2023

Conversation

borkmann
Member

@borkmann borkmann commented Sep 1, 2023

@borkmann borkmann added labels Sep 1, 2023:
sig/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
release-note/misc: This PR makes changes that have no direct user impact.
backport/1.14: This PR represents a backport for Cilium 1.14.x of a PR that was merged to main.
@maintainer-s-little-helper maintainer-s-little-helper bot added the kind/backports This PR provides functionality previously merged into master. label Sep 1, 2023
[ upstream commit bd8b4d0 ]
[ manual conflict resolution in kube_proxy_replacement.go
  and nodeport.h due to difference from upstream, the latter
  mainly in locations where we ifdef ct_state.ifindex ]

The limitation exists mainly on old kernels where the fib lookup helper
does not populate the outgoing ifindex. Only in this case do we rely on
the ifindex stored in the CT entry, which was originally added as a
16-bit field because little padding space was available. This
restriction can now be lifted after the big rework in #23884. We've seen
users with high netdevice churn hit this limit, at which point the agent
bails out.

Apart from fixing the immediate problem, this can be refined further:
instead of relying on the presence of the asm.FnRedirectPeer helper, do
an actual runtime BPF program probe so that stable kernels are covered
as well.

Fixes: #16260
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
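
To illustrate why a 16-bit field becomes a problem under high netdevice churn, here is a minimal sketch. It assumes the widened field is 32 bits, which the commit message does not state explicitly. The struct and function names (`ct_state_narrow`, `ct_state_wide`, `fits_narrow`, `fits_wide`) are hypothetical and do not mirror the real Cilium CT layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for a CT entry before and after the
 * change. These are NOT the real Cilium struct layouts. */
struct ct_state_narrow {
	uint16_t ifindex; /* old: 16-bit field squeezed into padding */
};

struct ct_state_wide {
	uint32_t ifindex; /* assumed new width: 32 bits */
};

/* Returns 1 if the ifindex survives a round trip through the 16-bit field. */
static int fits_narrow(uint32_t ifindex)
{
	struct ct_state_narrow s = { .ifindex = (uint16_t)ifindex };

	return s.ifindex == ifindex;
}

/* Returns 1 if the ifindex survives a round trip through the 32-bit field. */
static int fits_wide(uint32_t ifindex)
{
	struct ct_state_wide s = { .ifindex = ifindex };

	return s.ifindex == ifindex;
}
```

With these helpers, `fits_narrow(65535)` holds but `fits_narrow(65536)` does not, since the value wraps to 0 when truncated. On a node that keeps creating and deleting devices, the kernel's ifindex counter can eventually pass that boundary.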
[ upstream commit 323b4cb ]
[ no conflicts ]

Commit d1c362e1dd68 ("bpf: Always return target ifindex in bpf_fib_lookup"),
which HAVE_FIB_IFINDEX reflects, is part of 5.10+ kernels. Add the define
to the complexity tests for 5.10 and net-next to better reflect the real
world.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
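
The compile-time choice that HAVE_FIB_IFINDEX controls can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual datapath code: `fib_result`, `ct_entry` and `pick_egress_ifindex` are hypothetical names, and only the `HAVE_FIB_IFINDEX` define comes from the commit messages above.

```c
#include <assert.h>
#include <stdint.h>

/* Set when the kernel's fib lookup helper always returns the target
 * ifindex (5.10+ according to the commit message). Remove this define
 * to model an older kernel. */
#define HAVE_FIB_IFINDEX 1

/* Hypothetical stand-ins for the fib lookup result and the CT entry. */
struct fib_result {
	uint32_t ifindex;
};

struct ct_entry {
	uint32_t ifindex;
};

/* Pick the egress ifindex: trust the fib lookup where the kernel fills it
 * in, otherwise fall back to the ifindex stored in the CT entry. */
static uint32_t pick_egress_ifindex(const struct fib_result *fib,
				    const struct ct_entry *ct)
{
	assert(fib && ct);
#ifdef HAVE_FIB_IFINDEX
	return fib->ifindex;
#else
	return ct->ifindex;
#endif
}
```

With the define set, the CT-stored value is never consulted, which is why the width of that field only matters on kernels without the fib lookup change.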
@borkmann borkmann marked this pull request as ready for review September 1, 2023 09:46
@borkmann borkmann requested a review from a team as a code owner September 1, 2023 09:46
@borkmann
Member Author

borkmann commented Sep 1, 2023

/test-backport-1.14

@julianwiedmann
Member

Maybe just grab #27528 as well, to align things with upstream?

@borkmann
Member Author

borkmann commented Sep 1, 2023

> Maybe just grab #27528 as well, to align things with upstream?

Good point, will include it! Thanks!

[ upstream commit 8331fab ]

As part of handling inbound DSR-ed requests at the backend node, we create
a CT entry. The relevant code originated from nodeport_lb*(), where we also
set the CT entry's ifindex. There it is used by the LB to route replies from
local / NAT backends back to the client via the ingress interface.

But for DSR it makes little sense to track the ingress interface at the
backend. It's how the backend would reach the LB, not the client. And it's
perfectly fine to route replies to the client through a different
interface.

Remove this tracking for DSR connections. Note that the field is currently
unused; it was only set to keep CT entries fully populated in case the
ifindex was ever needed.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
@borkmann
Member Author

borkmann commented Sep 1, 2023

/test-backport-1.14

Member

@julianwiedmann julianwiedmann left a comment


🚢

@julianwiedmann julianwiedmann added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Sep 1, 2023
@borkmann borkmann merged commit e2eade0 into v1.14 Sep 1, 2023
196 checks passed
@borkmann borkmann deleted the pr/v1.14-ifindex branch September 1, 2023 12:10
@maintainer-s-little-helper maintainer-s-little-helper bot removed ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Sep 1, 2023

2 participants