-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpf: dsr: restore CB_SRC_LABEL across DSR-INGRESS tail-call #24794
bpf: dsr: restore CB_SRC_LABEL across DSR-INGRESS tail-call #24794
Conversation
/test |
Shuffle the two DSR-INGRESS tail-calls into the ENABLE_DSR-guarded section, so that we can add calls to DSR-only code to them in a subsequent patch. Their callers are already guarded in the same fashion. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
We need to free up the CB_SRC_LABEL slot, but at least for IPv6 all skb->cb slots are already in use. So stop using the CB for transporting the DSR info, but just extract it a second time inside the tail-call. There's opportunities to clean this up more, and also alternative approaches (eg. use a per-CPU map entry). But let's keep things intentationally simple for now. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
All nodeport code paths that tail-call back to IPV*_FROM_NETDEV need to restore the CB_SRC_LABEL with `src_identity`. We missed doing so for the DSR-Ingress tail-calls. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
1a4d401
to
c714d1d
Compare
/test Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed: Click to show.Test Name
Failure Output
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/1739/ If it is a flake and a GitHub issue doesn't already exist to track it, comment |
(CI was green, just needed a trivial rebase) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, left a few [minor] questions
int ret, l4_off; | ||
__be16 port; | ||
__be16 port = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you init port
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed to init the addr
to appease the verifier, and figured why not go the whole way. But yeah, looks like it would pass either way.
ret = nodeport_extract_dsr_v6(ctx, ip6, &tuple, l4_off, &addr, &port, &dsr); | ||
if (IS_ERR(ret)) | ||
goto drop_err; | ||
if (!dsr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this happen at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, totally unexpected - same as for IPv4 (where I added a comment).
/* Be paranoid about leaking info back to the main path: */ | ||
ctx_store_meta(ctx, CB_PORT, 0); | ||
ctx_store_meta(ctx, CB_ADDR_V4, 0); | ||
ret = nodeport_extract_dsr_v4(ctx, ip4, &tuple, l4_off, &addr, &port, &dsr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How expensive is to call this? (The question is - is this ok to call this here in favor of mirroring v4/v6 code, or should we still use the CB to pass info in IPv4 case?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope not too expensive. I think we should (1) keep IPv4/IPv6 in sync, and (2) not use the CB at all for such tail-call chains (given that we're already not fully owning it).
Would rather try to follow-up with by optimizing the first call (in nodeport_lb4()
), or passing the data via a per-CPU map.
|
The 1.26 on net-next test fails with flake #15455. Rerunning |
/test-1.26-net-next |
#22978 introduced a new pair of DSR-specific tail-calls that get called from
nodeport_lb*()
. These tail-calls apply some DSR handling, then callctx_skip_nodeport_set()
and return to the "program start" by tail-calling toCILIUM_CALL_IPV*_FROM_NETDEV
.Such tail-call chains in the nodeport code are responsible for restoring CB_SRC_LABEL, so that it's available again when eg.
bpf_host.c:tail_handle_ipv4()
wants to read it. But we missed doing so for the new DSR-Ingress tail-calls - and the IPv6 variant currently doesn't offer any free CB slot to transfer thesrc_label
over. So first stop using the CB for transferring DSR information, and then add code to restore the CB_SRC_LABEL.