New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
loader: refactor replaceDatapath to loadDatapath #32518
Conversation
/test |
c636496
to
38d44d4
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separating loading from attachment is great, but does this split at the right point? As you rightfully point out, inserting into the tail call maps ends up working like attachment. Can we also move map insertion out of this function?
38d44d4
to
a6673b1
Compare
/test |
Drafted due to rebasing onto #32059. |
a6673b1
to
cf8ba69
Compare
/test |
cf8ba69
to
ae7ea76
Compare
/test |
This unblocks cilium#29333. See cilium#32468 for more context. This commit refactors replaceDatapath() to loadDatapath() by factoring device attachment out of the function into the caller. The main reasons are flexibility and transparency. replaceDatapath() was called from many places and needed to do a lot. This change is the first step to handing individual callers an object representing actual bpf object handles, so they can correctly manage its lifecycle. In the future, ebpf.LoadAndAssign will be used for better readability. Some callers attach the same program to multiple interfaces, some attach multiple programs (ingress/egress) to the same interface, and some use a mixture of both. This has caused loops to creep into replaceDatapath, giving it many arguments and many overall responsibilities, making it hard to form intuition around. Major changes made in this commit: - lifted attach{SKB,XDP}Program out of the function, into all callers, making them call attach* methods explicitly - removed `replaceDatapathOptions` - reduced the window during which a 'revert' can happen (see code comments) due to the risks involved and its questionable effectiveness - removed a few points where context cancellations are obeyed, to be continued in a subsequent commit Fixes: cilium#32468 Signed-off-by: Timo Beckers <timo@isovalent.com>
Loading bpf objects used to be done by iproute2, where propagating ctx to the exec.Cmd invocations made sense, since realistically any shellout can hang for arbitrary reasons. Now the loader is fully hosted in the agent process, this no longer makes sense. Once we're blocked in a bpf() syscall, e.g. for loading a program, the verifier can be interrupted by sending a signal to the calling thread. Since the Go runtime routinely sends these signals under normal operation, ebpf-go will retry a few times when bpf() returns EINTR. The API currently doesn't expose a way to cancel program loading/verification, and there's no clear benefit to doing so in the first place. Verification is relatively lightweight compared to datapath compilation, so interrupting it during teardown is of questionable benefit. The agent doesn't expect it to be interruptible, it's bound to leave endpoints in an undefined state. This commit introduces the assumption that, once endpoint loading/attachment is kicked off (after compilation), it cannot be cancelled. This is reflected in the interface exposed to the rest of the system, by removing the ctx parameter on many methods. Only compilation can be interrupted, since it can take a long time on some systems, especially lower-spec. Signed-off-by: Timo Beckers <timo@isovalent.com>
ae7ea76
to
1188eda
Compare
/test |
Fixes: #32468