Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support L2-less devices with fast redirect #23935

Merged
merged 4 commits into from
Mar 17, 2023

Conversation

jschwinger233
Copy link
Member

@jschwinger233 jschwinger233 commented Feb 22, 2023

This PR adds L2 header for skb on L2-less devices before redirecting to internal veth, and enables bpf-based host routing work with wireguard as well as L2-less interfaces.

Fixes: #15075

Support L2-less devices with fast forward (bpf-based host routing) 

Signed-off-by: Zhichuan Liang gray.liang@isovalent.com

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 22, 2023
@julianwiedmann

This comment was marked as outdated.

bpf/lib/l3.h Outdated Show resolved Hide resolved
@jschwinger233 jschwinger233 force-pushed the gray/l2-less-fast-redirect branch 3 times, most recently from 8e388eb to 708f18c Compare February 23, 2023 08:14
@julianwiedmann julianwiedmann added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. labels Feb 23, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 23, 2023
@maintainer-s-little-helper

This comment was marked as resolved.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Feb 23, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Feb 24, 2023
@jschwinger233 jschwinger233 force-pushed the gray/l2-less-fast-redirect branch 2 times, most recently from c449da4 to 5291f99 Compare February 27, 2023 07:23
@julianwiedmann

This comment was marked as outdated.

@maintainer-s-little-helper

This comment was marked as resolved.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Feb 28, 2023
@maintainer-s-little-helper

This comment was marked as resolved.

@julianwiedmann

This comment was marked as outdated.

@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Mar 1, 2023
@jschwinger233 jschwinger233 force-pushed the gray/l2-less-fast-redirect branch 4 times, most recently from 3d37e2e to 820e50d Compare March 1, 2023 04:37
@jschwinger233
Copy link
Member Author

Restarting because it failed with what looks like two different new flakes. Rerunning to confirm. If it passes, we need to open issues for those flakes.

/test-1.24-5.4

Now it results with even more failures....

Originally the host routing mode would be set lagacy if wireguard is
enabled or any l2-less device exists. This commit lifts the limitation.

Fixes: cilium#15075

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
@jschwinger233
Copy link
Member Author

jschwinger233 commented Mar 14, 2023

Restarting because it failed with what looks like two different new flakes. Rerunning to confirm. If it passes, we need to open issues for those flakes.

/test-1.24-5.4

For this failure, I can make it pass locally with 2 changes:

  1. cherrypick Fix operator deadlock introduced in the #23605. #24343 to avoid operator deadlock
  2. add time.Sleep(10*time.Second) in the AfterEach to await policy to be truly deleted from bpf map.

So I presume the e2e test failures I hit before were both flakes, although the policy one is maybe not recorded before.

I can fix these "delete CRDs without waiting" race condition in e2e test in a separate PR later.

cc @julianwiedmann

@julianwiedmann
Copy link
Member

/test

@julianwiedmann
Copy link
Member

2. add `time.Sleep(10*time.Second)` in the [AfterEach](https://github.com/cilium/cilium/blob/v1.14.0-snapshot.0/test/k8s/datapath_configuration.go#L513) to await policy to be truly deleted from bpf map.

🎉 would be great to have a follow-on issue for this aspect!

Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one tiny fix-up needed. CI is all green already ✔️

bpf/bpf_host.c Outdated Show resolved Hide resolved
This commit adds l2 header before handle_ipv4 calls ipv4_local_delivery,
in order to enable fast-redirect from l2-less dev (e.g. cilium_wg0) to
internal veth.

Fixes: cilium#15075

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
This commits add unittest case for L3 skb fast redirecting to L2 device.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
The limitation for wireguard was originally recorded as not compatiable
with host-routing, and it's no more the case once pr cilium#23935 is merged.

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you @jschwinger233 ! 🚀

@pchaigno
Copy link
Member

pchaigno commented Mar 15, 2023

/test

Job 'Cilium-PR-K8s-1.16-kernel-4.19' failed:

Click to show.

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: migrate-svc restart count values do not match

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.16-kernel-4.19 so I can create one.

Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

Comment on lines +25 to +26
* Therefore, we created a stud, mock_hanle_policy, to simply check if the
* our skb reaches there.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not worth a respin, but if you need to update your PR for another reason:

Suggested change
* Therefore, we created a stud, mock_hanle_policy, to simply check if the
* our skb reaches there.
* Therefore, we created a stud, mock_hanle_policy, to simply check if
* our skb reaches there.

There's another typo in the description of your first commit: lagacy.

@julianwiedmann
Copy link
Member

julianwiedmann commented Mar 17, 2023

The failing update test is known flake #24379:

/home/jenkins/workspace/Cilium-PR-K8s-1.16-kernel-4.19/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:515
migrate-svc restart count values do not match
Expected
: 0
to be identical to
: 13
/home/jenkins/workspace/Cilium-PR-K8s-1.16-kernel-4.19/src/github.com/cilium/cilium/test/k8s/updates.go:346

@julianwiedmann julianwiedmann added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 17, 2023
@borkmann borkmann merged commit f7bd675 into cilium:master Mar 17, 2023
@jibi
Copy link
Member

jibi commented Apr 3, 2023

Marking this for v1.13 backporting as another PR relies on some these changes.
Will only backport 46ea80c

@jibi jibi added the needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch label Apr 3, 2023
@jibi jibi mentioned this pull request Apr 3, 2023
11 tasks
@jibi jibi added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels Apr 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

datapath: Support L2-less with fast redirect
9 participants