-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow loading tracepoints other than the ones needed by Falco #1376
Comments
Still need to find time to think about it, I will come back soon! Sorry |
Mmmh given the schema (ie: event_table) is still the same, i expect the loaded driver to push events that libscap knows about, right? TLDR: my 2c -> let's try to integrate the feature in the system in a smooth way, instead of adding a "workaround". |
If it helps, this is how we build our custom ebpf driver that attaches directly to syscalls, we wrote our own We basically made it so the same code that is run by I agree that a proper solution is better than a workaround, but I'm pushing for stackrox to drop our fork and not being able to attach tracepoints directly is a hard stopper, we can try to carry the patch I shared for the short-term but I'd like us to drop it as soon as humanly feasible. Let me know if we can help with any sort of testing. |
Hi Mauro, I finally found some time to think about the issue and the related PR, sorry for the delay. Looking at the code, the patch is fine, the only downside that I see is that we lose strict control over what is attached to the kernel. If in the future we add/rename some programs we need to be careful to not fall in the final loop that attaches everything, otherwise is possible that some programs will be injected automatically without noticing them. Losing this strict check on what we inject is not a big issue but we are taking a step back with respect to what we chose some months ago. Moreover, we are adding dead code in the upstream repo, again the patch is small, but it is always dead code... my suggestion here is to try to keep this change in the fork until it exists, and then when it's time to remove the fork, if this is the only patch left pending we can think about integrating it upstream, WDYT? |
It's an important topic, so I allow me to add my 2c to the discussion. I think everyone agrees that working on It certainly makes sense to be able to distinguish one type programs from the
There are already various program name prefixes in play, I think this could be
This strikes me as an interesting argument, so I want to spend few words here. |
ei @erthalion these are all good points! I agree that adding custom prefixes like
According to the current Moreover, this absence of control scares me a little bit because it could cause scenarios where users attach custom programs and face issues in libsinsp/libscap. In some way with this patch, we are saying "We allow you to attach everything that is in the elf file" but indeed we can just support programs that maintain our event schema and that don't overlap the "standard" collection we provide in the upstream driver... So I agree this could be a nice feature but I would like to have more control over what is attached and how it interacts with the userspace side |
Thanks for clarifications, @Andreagit97 !
I see your point. This is probably one part of misunderstanding, the main goal
For me to understand, what kind of control you have in mind? Can you describe From what I see the worst what could happen is that there is a BPF program, As a side note, it's probably also a question of project developers position. |
Let's consider an example. Here are some points I would like to discuss
These are just my thoughts, if other maintainers agree with the proposed changes I'm fine, I would only like to understand what is the plan here. |
Jumping in once again just to 👍 everything Andrea said; most of the points were already mentioned by me in my first comment.
Keep the proposed patch fork-only until we have a proper feature that would allow you to throw the patch away. |
Quoting @erthalion
I have put this issue on the Dec 2023 core maintainers strategy discussion. Reviewing the project's history, it appears that iterations have always been necessary. Therefore, we should not abruptly impose a different standard for @erthalion and @Molter73 but rather work towards a solution that addresses the concerns of all parties involved while also promoting innovation and iterative progress. As long as we are not breaking Falco's default behavior, I see no problems with iterating, especially since @erthalion and @Molter73 are committed to steering us toward an optimal solution. By the way, I am also very interested in such enhanced customizations. |
I think this
is what i asked for; i mean, if we could find a good solution that feels less like a "workaround" and more like an enablement for a new feature, i am all for it! |
Great @FedeDP -- @erthalion and @Molter73 what are concrete next steps based on the WIP PR that is already open? |
Thanks for chiming in, @incertum! The way I see it, we still need to agree on one important point before moving Otherwise, I think @Stringy has some ideas about how to proceed and shape up FYI, Mauro is not going to be available for the next few months, so I'm afraid |
Hi folks, I've been watching this thread with bated breath for some time now, and I think it's time to weigh in a little. I've developed this feature for eBPF and that has served us well for quite some time now, and I'm in the process of working on a solution for modern_bpf in our fork (admittedly both built in a way that is very specific to our use-case.) For me the secret sauce is the event schema. Each of my implementations has been focused primarily on getting to a point where I can tail call into the relevant programs so the data is processed and written to the buffers in the right way. In the case of eBPF this was purely a driver change (because the lifecycle loading/unloading of the driver is already very flexible) and in modern_bpf this has required driver changes and libpman changes to support a modified skeleton (as a brief aside it is curious to me that modern_bpf already supports alternative skeletons via I agree with @erthalion that a proper solution will most likely involve shifting some ownership of a custom driver's internals back to the "user" (provided the schema is followed.) Then the API becomes an extension to the lifecycle management of the driver (in effect an extensible libpman) in which the user manages what is loaded and attached (via callback, or vtable or whatever). Provided the schema is followed in the driver, libscap should be happy. This might be a good first step, to dip our toes into the feature, but comes with the agreed caveat that thou shalt bring thine own driver. Making the drivers flexible to this customisation feels a fair bit more difficult, not least because of the common initialization, the very specific function signatures (as defined by the tracepoints themselves), and the risk of verifier complaint. For my customisation of the modern_bpf driver (which is not yet finished) I have experimented with changing the relevant programs to |
Thanks @erthalion and @Stringy! Thinking more about it: Could we open a proposal re the bigger vision of driver extensions and customizations? Let me post some more ideas re how such a proposal could be structured after the Thanksgiving break. We could treat this and the userspace anomaly detection work I am soon more seriously starting as similar workflows in the sense that lots of iterations will be required and we don't know all the answers beforehand. In addition, perhaps we could brainstorm during one of the upcoming community calls? We can discuss what date would work best for everyone who wishes to participate in the discussion. |
Reread all comments and in addition to opening a new "vote" issue I also posted some concrete suggestions in the draft PR #1375 (comment) Addressing a follow-up concern raised by Andrea in #1376 (comment), which aligns with Federico's concern in #1376 (comment): An "initial workaround" solution is actual preferable I think until we have a clearer understanding of whether these capabilities are solely intended for (1) libs only adopters or (2) Falco adopters as well. We could postpone the decision on Falco UX and broader libs control internals (such as expanding the SC system or exploring alternative approaches) until later. That way we will effectively minimize the amount of code that needs to be refactored in the future and we can achieve faster and easier iterations (as also suggested by Mauro in his initial comment here #1376 (comment)). While we're currently focusing on syscall monitoring, we shouldn't put off dealing with the LSM hooks that will still be important in the future (I posted similar thoughts many many times before). Therefore, let's take the time to work together on a broader vision proposal without blocking this work in the meantime? See my last comment #1376 (comment). Who would want to take the lead for such a proposal? |
I'm catching up, but would the custom bpf programs still create events that are identical to the events created by the bpf program that attaches to sys_enter/sys_exit? Or do they create completely disjoint data and pass that data up to userspace. |
Identical schema for the initial use case of one of our libs adopters (what we discuss here), see #1376 (comment) @Stringy comment:
For a larger refactor and flexible solution that will be future proof for all adopters and use cases and scenarios, including Falco, lets go the proposal route and also include LSM hook support in there. |
Ok, I wonder how that would interact with libsinsp, then. libsinsp and its parsers expect a pretty wide set of syscalls to reconstruct its internal state accurately and make that state available via filterchecks and filters. If an alternate bpf program only handled some syscalls (for example opens + execs but not, say, connects and listens), wouldn't using libsinsp be problematic? (Again, I'm catching up and don't totally understand the use case yet). |
No worries at all, this should not be of concern here as libs adopters who mess with these parts of the code base should know what they are doing 😉 If they want to break things they can. Also note that with the new |
A bit OT, but still relevant, IMO: This is not necessarily good for users, especially if an option allows them to break things in unexpected ways that will make users waste an infinite amount of time troubleshooting. Don't get me wrong, I was in favor of |
Issues go stale after 90d of inactivity. Mark the issue as fresh with Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with Provide feedback via https://github.com/falcosecurity/community. /lifecycle stale |
/remove-lifecycle stale |
Motivation
We used to depend on the generic way the libs used for attaching tracepoints in order to load our own directly to the syscalls we cared about. PR #1001 changed this behavior in favor of idempotency, (which I think is great!), but we would like to keep something similar to the previous behavior.
Feature
If tracepoints other than the ones needed by Falco are found in the eBPF probe, the libraries should attach them, since the most likely case is an adopter is trying to use a custom made driver.
Alternatives
We could add some additional means for control, similar to the existing SC system, but that might be overkill. ATM we also only build custom drivers for eBPF, we might do so for modern_bpf in the near future. We are not sure this is a worthy change for kernel modules, since from our testing the overhead of attaching to sys_enter/sys_exit is negligible compared to the bpf variants.
Additional context
An example implementation can be found in #1375, we will be using something similar in our fork of the libs.
The text was updated successfully, but these errors were encountered: