Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datapath: Fix double SNAT #25189

Merged
merged 1 commit into from
May 2, 2023
Merged

datapath: Fix double SNAT #25189

merged 1 commit into from
May 2, 2023

Conversation

brb
Copy link
Member

@brb brb commented Apr 28, 2023

We have observed, that the same packet can be handled multiple times by
the bpf_host's to-netdev. This can happen when the to-netdev is attached
to a bridge and to an outgoing netdev which is attached to the bridge.
This can result e.g., into multiple unnecessary SNATs for the same
packet which can break the host-firewall (the host-firewall for a reply
to such a packet is invoked after only the first rev-SNAT).

To fix that particular case set the SNAT done flag (an SKB mark).

Tested manually. On kind-worker node (172.18.0.3):

ip link add br0 type bridge
ip addr add 172.18.0.3/16 dev br0
ip addr del 172.18.0.3/16 dev eth0
ip link set dev br0 up
ip link set dev eth0 master br0
ip route replace 0.0.0.0/0 via 172.18.0.1

Make sure that Cilium attached to both:

cilium status | grep KubeProxy
KubeProxyReplacement:    Strict [br0 172.18.0.3, eth0 172.18.0.3]

Then start any pod on kind-worker, and run tcpdump on kind-work. curl
1.1.1.1 from that pod.

tcpdump -i any -n 'dst port 80'
lxc2663158aafe1 In  IP 10.244.1.161.43710 > 1.1.1.1.80: Flags [S]
br0   Out IP 172.18.0.3.43710 > 1.1.1.1.80: Flags [S]
eth0  Out IP 172.18.0.3.43710 > 1.1.1.1.80: Flags [S]

Redo the same test w/o the fix. The relevant tcpdump output:

tcpdump -i any -n 'dst host 1.1.1.1'
lxcbce3c2f349e1 In  IP 10.244.1.120.57408 > 1.1.1.1.80: Flags [S]
br0   Out IP 172.18.0.3.57408 > 1.1.1.1.80: Flags [S]
eth0  Out IP 172.18.0.3.56763 > 1.1.1.1.80: Flags [S] <-- 2nd SNAT!

Suggested-by: Julian Wiedmann jwi@isovalent.com

Fix #24916

@brb brb added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/bug This PR fixes an issue in a previous release of Cilium. needs-backport/1.12 needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Apr 28, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.12.10 Apr 28, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.13.3 Apr 28, 2023
@brb
Copy link
Member Author

brb commented Apr 28, 2023

/ci-datapath

@brb brb force-pushed the pr/brb/dp-fix-double-snat branch from f309a15 to 8844cd8 Compare April 28, 2023 14:11
@brb brb requested a review from julianwiedmann April 28, 2023 14:12
@brb brb marked this pull request as ready for review April 28, 2023 14:12
@brb brb requested a review from a team as a code owner April 28, 2023 14:12
@brb brb changed the title WIP: datapath: Fix double SNAT datapath: Fix double SNAT Apr 28, 2023
@brb brb force-pushed the pr/brb/dp-fix-double-snat branch from 8844cd8 to 04bb371 Compare April 28, 2023 14:14
@julianwiedmann
Copy link
Member

Please keep the IPv6 path in sync :)

@brb
Copy link
Member Author

brb commented Apr 28, 2023

Please keep the IPv6 path in sync :)

Unfortunately, atm no BPF-based SNAT for IPv6 😭

@julianwiedmann
Copy link
Member

Please keep the IPv6 path in sync :)

Unfortunately, atm no BPF-based SNAT for IPv6 sob

uh, what about host-originating connections vs Nodeport NAT ports? I believe we should add the same thing in nodeport_snat_fwd_ipv6(), if only to avoid code drift. Or a comment why it's not needed there (yet).

We have observed, that the same packet can be handled multiple times by
the bpf_host's to-netdev. This can happen when the to-netdev is attached
to a bridge and to an outgoing netdev which is attached to the bridge.
This can result e.g., into multiple unnecessary SNATs for the same
packet which can break the host-firewall (the host-firewall for a reply
to such a packet is invoked after only the first rev-SNAT).

To fix that particular case set the SNAT done flag (an SKB mark).

Tested manually. On kind-worker node (172.18.0.3):

    ip link add br0 type bridge
    ip addr add 172.18.0.3/16 dev br0
    ip addr del 172.18.0.3/16 dev eth0
    ip link set dev br0 up
    ip link set dev eth0 master br0
    ip route replace 0.0.0.0/0 via 172.18.0.1

Make sure that Cilium attached to both:

    cilium status | grep KubeProxy
    KubeProxyReplacement:    Strict [br0 172.18.0.3, eth0 172.18.0.3]

Then start any pod on kind-worker, and run tcpdump on kind-work. curl
1.1.1.1 from that pod.

    tcpdump -i any -n 'dst port 80'
    lxc2663158aafe1 In  IP 10.244.1.161.43710 > 1.1.1.1.80: Flags [S]
    br0   Out IP 172.18.0.3.43710 > 1.1.1.1.80: Flags [S]
    eth0  Out IP 172.18.0.3.43710 > 1.1.1.1.80: Flags [S]

Redo the same test w/o the fix. The relevant tcpdump output:

    tcpdump -i any -n 'dst host 1.1.1.1'
    lxcbce3c2f349e1 In  IP 10.244.1.120.57408 > 1.1.1.1.80: Flags [S]
    br0   Out IP 172.18.0.3.57408 > 1.1.1.1.80: Flags [S]
    eth0  Out IP 172.18.0.3.56763 > 1.1.1.1.80: Flags [S] <-- 2nd SNAT!

Suggested-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
@brb brb force-pushed the pr/brb/dp-fix-double-snat branch from 04bb371 to 0b57e29 Compare May 1, 2023 08:47
@brb brb requested review from gentoo-root and removed request for julianwiedmann May 1, 2023 08:48
@brb
Copy link
Member Author

brb commented May 1, 2023

/test

@brb
Copy link
Member Author

brb commented May 2, 2023

The runtime CI hit #25178.

@brb brb added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 2, 2023
@brb brb merged commit eff26cf into main May 2, 2023
55 of 56 checks passed
@brb brb deleted the pr/brb/dp-fix-double-snat branch May 2, 2023 05:22
@joamaki joamaki mentioned this pull request May 2, 2023
8 tasks
@joamaki joamaki added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels May 2, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.13 in 1.13.3 May 2, 2023
@joamaki joamaki mentioned this pull request May 3, 2023
8 tasks
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.12 in 1.12.10 May 3, 2023
@julianwiedmann julianwiedmann added backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. and removed backport-pending/1.12 labels Jul 10, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.12 to Backport done to v1.12 in 1.12.10 Jul 10, 2023
@julianwiedmann julianwiedmann added backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels Jul 10, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.13 to Backport done to v1.13 in 1.13.3 Jul 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
No open projects
1.12.10
Backport done to v1.12
1.13.3
Backport done to v1.13
Development

Successfully merging this pull request may close these issues.

datapath: Ensure that "to-netdev" is invoked only once for the same packet
4 participants