Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpf: nodeport: add RevDNAT-based FIB lookup for reply traffic #26638

Merged

Conversation

julianwiedmann
Copy link
Member

@julianwiedmann julianwiedmann commented Jul 5, 2023

When moving the RevDNAT for backend replies from bpf_lxc into bpf_host, we lost the routing based on the post-RevDNAT layout of the packet. Depending on the installed routing rules, we then end up sending the (RevDNATed) packet from the wrong egress interface.

So as part of the RevDNAT processing in to-netdev, do a FIB lookup that considers the desired source IP. And redirect the packet if it's currently on the wrong interface. We intentionally delay the actual RevDNAT rewrite until the packet has reached the correct interface, so that the packet looks as expected to components in to-netdev (in particular HostFW).

Long-term we would want

  • a FIB_DONE mark on the packet (similar to SNAT_DONE), to skip additional lookups.
  • improve the FIB lookup in from-container (for BPF Host Routing) to already consider service RevDNAT.

The initial idea was to have this check at the very top of the to-netdev program (see #26635). But we're hitting program size limits in 4.19 there, and improving that is too much of a refactor for a fix PR.

Fixes: #26166

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 5, 2023
@julianwiedmann
Copy link
Member Author

/test

@julianwiedmann julianwiedmann added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Jul 5, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 5, 2023
@julianwiedmann julianwiedmann force-pushed the 1.14-nodeport-revdnat-fib-v3 branch from 8e4d6b6 to 67f1862 Compare July 5, 2023 10:24
@julianwiedmann
Copy link
Member Author

/test

@julianwiedmann julianwiedmann force-pushed the 1.14-nodeport-revdnat-fib-v3 branch from 67f1862 to 67659fc Compare July 5, 2023 11:38
@julianwiedmann
Copy link
Member Author

/test

@julianwiedmann julianwiedmann force-pushed the 1.14-nodeport-revdnat-fib-v3 branch from 67659fc to 5a74b80 Compare July 5, 2023 13:33
@julianwiedmann
Copy link
Member Author

/test

@julianwiedmann julianwiedmann force-pushed the 1.14-nodeport-revdnat-fib-v3 branch from 5a74b80 to 44d365e Compare July 5, 2023 13:40
@julianwiedmann
Copy link
Member Author

/test

@julianwiedmann julianwiedmann changed the title 1.14 nodeport revdnat fib v3 bpf: nodeport: add RevDNAT-based FIB lookup for reply traffic Jul 5, 2023
@julianwiedmann julianwiedmann added kind/bug This is a bug in the Cilium logic. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/loadbalancing Impacts load-balancing and Kubernetes service implementations needs-backport/1.13 labels Jul 5, 2023
@julianwiedmann julianwiedmann force-pushed the 1.14-nodeport-revdnat-fib-v3 branch from 44d365e to 383e5ab Compare July 17, 2023 14:09
@julianwiedmann
Copy link
Member Author

/test

@julianwiedmann julianwiedmann force-pushed the 1.14-nodeport-revdnat-fib-v3 branch from 383e5ab to 7c21344 Compare July 25, 2023 08:20
@julianwiedmann
Copy link
Member Author

/test

@julianwiedmann julianwiedmann force-pushed the 1.14-nodeport-revdnat-fib-v3 branch from 7c21344 to 89f6c71 Compare July 26, 2023 14:10
@julianwiedmann
Copy link
Member Author

/test

@julianwiedmann julianwiedmann marked this pull request as ready for review July 26, 2023 14:17
@julianwiedmann julianwiedmann requested a review from a team as a code owner July 26, 2023 14:17
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 31, 2023
@julianwiedmann julianwiedmann force-pushed the 1.14-nodeport-revdnat-fib-v3 branch from 89f6c71 to 2996fb3 Compare August 1, 2023 05:47
@julianwiedmann
Copy link
Member Author

/test

Partially verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
We cannot verify signatures from co-authors, and some of the co-authors attributed to this commit require their commits to be signed.
When replies by local / DSR backends enter to-netdev, they have been routed
to the interface using the backend IP as source IP. But depending on what
routing rules are installed on the system, the routing for the RevDNATed
packet is different.

Improve the RevDNAT code in to-netdev to first perform a FIB lookup (based
on the src IP from the RevDNAT info), and if needed redirect the packet to
the correct egress interface.

This requires that bpf_host is also attached to the chosen egress iface,
so that the actual RevDNAT rewrite gets performed there.

Also update some of the tests to cover this scenario.

[Note: ideally we would have this check much earlier in to-netdev, before
       even going through the HostFW code. But that requires shrinking the
       program size first, so postponing that to a follow-on PR]

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
@julianwiedmann julianwiedmann force-pushed the 1.14-nodeport-revdnat-fib-v3 branch from 2996fb3 to a708962 Compare August 1, 2023 06:01
@julianwiedmann
Copy link
Member Author

/test

@brb
Copy link
Member

brb commented Aug 1, 2023

Before merging this PR, it would be good to get Tests with secondary NodePort device running. Currently, it's disabled in ci-ginkgo (as it doesn't create a secondary network, while the Vagrant setup did), and CLI's connectivity tests used in ci-e2e don't test NodePort service access via multiple networks.

Anyway, the suggested testing plan for ci-e2e:

  1. Extend some matrix configurations (with KPR=true) in the conformance-e2e.yaml with secondary-network: true.
  2. When provisioning a Kind cluster, pass --secondary-network (introduced in Add secondary iface to KIND network #26338) depending on the option above.
  3. Extend the feature detection to detect that Kind has multiple networks.
  4. In the NodePort from outside test case consider all node IP addresses.

@julianwiedmann
Copy link
Member Author

Rebased to resolve trivial bpf/tests conflicts with #27134.

@dylandreimerink dylandreimerink merged commit 3cfe559 into cilium:main Aug 1, 2023
send_trace_notify(ctx, obs_point, 0, 0, 0, 0, trace.reason,
trace.monitor);
if (ret == CTX_ACT_OK)
send_trace_notify(ctx, obs_point, 0, 0, 0, 0, trace.reason,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my education... do we only trace the forwarded packets at the destination interface?

Copy link
Member Author

@julianwiedmann julianwiedmann Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we have a strict precedent for it at the moment.

I could be convinced that throwing notifications for the intermediate interfaces has value. But then we should also apply proper CT-driven aggregration, and include the corresponding ifindex?

@julianwiedmann julianwiedmann deleted the 1.14-nodeport-revdnat-fib-v3 branch August 1, 2023 13:59
@sayboras sayboras mentioned this pull request Aug 3, 2023
11 tasks
@julianwiedmann julianwiedmann added the backport/author The backport will be carried out by the author of the PR. label Aug 3, 2023
@julianwiedmann julianwiedmann added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 labels Aug 9, 2023
@michi-covalent michi-covalent added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Sep 9, 2023
@julianwiedmann
Copy link
Member Author

No intentions to backport this into v1.13 anymore. Doing so would require #26136 and #26290.

@julianwiedmann julianwiedmann added affects/v1.13 This issue affects v1.13 branch and removed needs-backport/1.13 labels Apr 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/v1.13 This issue affects v1.13 branch area/loadbalancing Impacts load-balancing and Kubernetes service implementations backport/author The backport will be carried out by the author of the PR. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
No open projects
Status: Released
Development

Successfully merging this pull request may close these issues.

DSR broken with VLANs on 1.13.3
5 participants