Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow loading tracepoints other than the ones needed by Falco #1376

Open
Molter73 opened this issue Oct 3, 2023 · 23 comments
Open

Allow loading tracepoints other than the ones needed by Falco #1376

Molter73 opened this issue Oct 3, 2023 · 23 comments
Labels
kind/feature New feature or request
Milestone

Comments

@Molter73
Copy link
Contributor

Molter73 commented Oct 3, 2023

Motivation

We used to depend on the generic way the libs used for attaching tracepoints in order to load our own directly to the syscalls we cared about. PR #1001 changed this behavior in favor of idempotency, (which I think is great!), but we would like to keep something similar to the previous behavior.

Feature

If tracepoints other than the ones needed by Falco are found in the eBPF probe, the libraries should attach them, since the most likely case is an adopter is trying to use a custom made driver.

Alternatives

We could add some additional means for control, similar to the existing SC system, but that might be overkill. ATM we also only build custom drivers for eBPF, we might do so for modern_bpf in the near future. We are not sure this is a worthy change for kernel modules, since from our testing the overhead of attaching to sys_enter/sys_exit is negligible compared to the bpf variants.

Additional context

An example implementation can be found in #1375, we will be using something similar in our fork of the libs.

@Molter73 Molter73 added the kind/feature New feature or request label Oct 3, 2023
@Andreagit97
Copy link
Member

Andreagit97 commented Oct 4, 2023

Still need to find time to think about it, I will come back soon! Sorry

@FedeDP
Copy link
Contributor

FedeDP commented Oct 6, 2023

If tracepoints other than the ones needed by Falco are found in the eBPF probe, the libraries should attach them, since the most likely case is an adopter is trying to use a custom made driver.

Mmmh given the schema (ie: event_table) is still the same, i expect the loaded driver to push events that libscap knows about, right?
That's why i'd love to allow people to attach directly to the syscalls specific tracepoint, bypassing sys_enter/exit hooks. This is possibly a bigger change, but would also carry some big benefits.
As i already shared on the slack channel, i envisioned this feature like a single boolean option for the {bpf,modern,kmod} engines, then all the logic should be buried inside.
This means that a request to enable PPM_SC_OPENAT would directly attach the sys_openat tracepoint, instead of flipping a bit in the interesting syscalls map.
It also means that all the initialization work being done by sys_enter/exit should be somehow replicated somewhere else, likely we would want to attach an init_filler as starting hook for all tracepoints, and then jump to the syscall-specific one, but i am not sure.
I planned to give this a shot ~1yr ago, but never found the spare time to look into this!

TLDR: my 2c -> let's try to integrate the feature in the system in a smooth way, instead of adding a "workaround".

@Molter73
Copy link
Contributor Author

Molter73 commented Oct 6, 2023

If it helps, this is how we build our custom ebpf driver that attaches directly to syscalls, we wrote our own probe.c file and used driver/bpf/ as an include directory:
https://github.com/stackrox/collector/tree/master/kernel-modules/probe

We basically made it so the same code that is run by sys_enter/sys_exit is added to every syscall. We also had to use regular tracepoints, but I can't recall why that was needed. @Stringy is the mastermind behind this so he can probably add some more context if needed (he is also gonna try to do a similar thing with the modern probe in the near future).

I agree that a proper solution is better than a workaround, but I'm pushing for stackrox to drop our fork and not being able to attach tracepoints directly is a hard stopper, we can try to carry the patch I shared for the short-term but I'd like us to drop it as soon as humanly feasible. Let me know if we can help with any sort of testing.

@Andreagit97
Copy link
Member

Hi Mauro, I finally found some time to think about the issue and the related PR, sorry for the delay. Looking at the code, the patch is fine, the only downside that I see is that we lose strict control over what is attached to the kernel. If in the future we add/rename some programs we need to be careful to not fall in the final loop that attaches everything, otherwise is possible that some programs will be injected automatically without noticing them.

Losing this strict check on what we inject is not a big issue but we are taking a step back with respect to what we chose some months ago. Moreover, we are adding dead code in the upstream repo, again the patch is small, but it is always dead code... my suggestion here is to try to keep this change in the fork until it exists, and then when it's time to remove the fork, if this is the only patch left pending we can think about integrating it upstream, WDYT?

@Andreagit97 Andreagit97 added this to the TBD milestone Oct 18, 2023
@erthalion
Copy link
Contributor

erthalion commented Oct 20, 2023

It's an important topic, so I allow me to add my 2c to the discussion.

I think everyone agrees that working on sys_enter/sys_exit level might be too
broad for some use cases, and having more flexibility around this is positive
for the project. Conceptually it implies there should be an interface that
describes what to load into the kernel, which programs, to which tracepoints,
etc. Elf sections in the bpf probe seems to be a good fit for that: it provides
enough isolation (one can swap the probe, and everything still works); it's
flexible enough; other projects are following this approach (e.g. libbpf with
all those rules about modifying the default behaviour via the section name).
From this point of view I believe this PR is a step in the right direction.

It certainly makes sense to be able to distinguish one type programs from the
other (and in fact it's even sufficient), to avoid situations like you've
described:

If in the future we add/rename some programs we need to be careful to not
fall in the final loop that attaches everything, otherwise is possible that
some programs will be injected automatically without noticing them.

There are already various program name prefixes in play, I think this could be
addressed by distinguishing programs in question with something like
custom/<...> or via introducing any other type of hierarchy that will make it
clear what has to be attached automatically, and what's not (again, hello
libbpf approach).

Moreover, we are adding dead code in the upstream repo, again the patch is
small, but it is always dead code

This strikes me as an interesting argument, so I want to spend few words here.
What we work on here is a library, and if someone is using the code, it's
certainly not dead, even if it's not exercised in the main project CI :)

@Andreagit97
Copy link
Member

ei @erthalion these are all good points! I agree that adding custom prefixes like custom/<...> could avoid some troubles in the future... what I was trying to say with my previous comment is that I've always seen libscap + drivers as a unique block, so for sure, we offer tracing capabilities to a userspace program that wants to use the libraries but I'm not sure we ever intended libscap as a custom loader for tracing program...IMO there is a big difference between libscap and libbpf.

  • libscap offers some APIs to trace your system with predefined hooks, these hooks are controlled by ppm_sc codes.
  • libbpf is a loading library so of course you are free to attach whatever program in the kernel.

According to the current libscap API, a BPF program should always be paired with a PPM_ code, or at least related in some way. This relation allows users to control what to attach/detach at loading time. Now, adding the capability to attach custom programs that match our event SCHEMA is for sure a nice feature but it's not clear to me how to control this custom loading logic userspace side... Today if you want to attach/detach a specific program you should provide the PPM_ code to libscap and the program will be correctly attached/detached. It is not clear to me how to do it with the proposed changes, everything that is found in the elf is injected into the kernel without control. Now probably I've used the wrong expression we are adding dead code in the upstream repo, sorry for that, this is not dead code but this is something that doesn't follow this PPM_SC <---> BPF program relation, and I'm not sure how users can take advantage of it if we don't provide an API to regulate it.

Moreover, this absence of control scares me a little bit because it could cause scenarios where users attach custom programs and face issues in libsinsp/libscap. In some way with this patch, we are saying "We allow you to attach everything that is in the elf file" but indeed we can just support programs that maintain our event schema and that don't overlap the "standard" collection we provide in the upstream driver... So I agree this could be a nice feature but I would like to have more control over what is attached and how it interacts with the userspace side

@erthalion
Copy link
Contributor

Thanks for clarifications, @Andreagit97 !

this is something that doesn't follow this PPM_SC <---> BPF program relation,
and I'm not sure how users can take advantage of it if we don't provide an
API to regulate it.

I see your point. This is probably one part of misunderstanding, the main goal
here is to provide more flexibility for those BPF progs that actually follow
the PPM_SC to BPF relation and speak Falco "protocol" fluently. The use case we
intend to support is about narrowing down the scope.

Moreover, this absence of control scares me a little bit because it could
cause scenarios where users attach custom programs and face issues in
libsinsp/libscap. In some way with this patch, we are saying "We allow you to
attach everything that is in the elf file" but indeed we can just support
programs that maintain our event schema and that don't overlap the "standard"
collection we provide in the upstream driver... So I agree this could be a
nice feature but I would like to have more control over what is attached and
how it interacts with the userspace side

For me to understand, what kind of control you have in mind? Can you describe
what are the expectations of it being applied to user space <-> kernel space
communication? And what are the consequences of not having such control?

From what I see the worst what could happen is that there is a BPF program,
that can't interact with Falco, but still got loaded, causing issues -- this is
probably on the BPF program developers, the community doesn't have to support
such cases.

As a side note, it's probably also a question of project developers position.
Surely, there are questions about how to make XYZ the best way possible, and
it's not always clear right away how to achieve that. But prohibitive from the
start approach will rarely provide a good ground to develop this XYZ further :)

@Andreagit97
Copy link
Member

For me to understand, what kind of control you have in mind? Can you describe
what are the expectations of it being applied to user space <-> kernel space
communication? And what are the consequences of not having such control?

Let's consider an example.
If I remember well your use case, you allow attaching single syscalls instead of the generic sys_enter/sys_exit tracepoints. Let's suppose I'm a user and I want to use this cool feature, I would like to attach only the open syscall without using sys_enter/sys_exit tracepoints. I expect a sort of userspace interface that allows me to attach only the necessary tracepoints (sys_enter_open, sys_exit_open), something like a boolean single_syscalls=true, but today we have nothing to do that so what happens is that sys_enter and sys_exit will be attached under the hood. Probably as a libsinsp user, I could detach sys_enter and sys_exit manually, but then, what do I have to do? How many trace points are attached now in the kernel? which are the tracepoints attached? do I have only sys_enter_open and sys_exit_open or maybe I have also other tracepoints like sys_enter_setuid, and sys_exit_setuid because libscap found them in the elf file? How can I disable them at run-time?

Here are some points I would like to discuss

  1. How a user can take advantage of this new feature in a clear way?
    I know this could be a step 0 for something but I would like to have a clear plan before starting with possible steps in this direction. Moreover, with the proposed solution, we inject all programs we find in the elf but what about if we want only some of them, like in the proposed example? This is the "control" I was talking about in the previous message.

  2. Merging Mauro's PR would mean that we want to "officially" support custom drivers and not only the ones that live in this repo. Today the patch is elegant and small, but what about the future? we need to take it into consideration in all the following designs and this could have a not negligible maintenance cost. Maybe is something that we want but we need to be sure about the choice otherwise there is the risk that we break the support at every refactor. Moreover, the modern probe today is directly embedded into libscap, so the possibility of integrating an external skeleton inside the libscap becomes even more complex.

  3. As we previously said, the possible custom drivers must be compatible with the event SCHEMA and the logic of the libscap engine. Since these are quite strict requirements I don't expect many adopters to patch the drivers with different programs injected, right now the only use case we know is your use case. So maybe we could also evaluate integrating your use case in the upstream drivers... integrating it, will force the creation of an explicit logic more resilient to future changes/refactors, it will probably require an initial big effort, but maybe in the future, it could help us in maintaining the solution instead of having flying little patches.

These are just my thoughts, if other maintainers agree with the proposed changes I'm fine, I would only like to understand what is the plan here.

@FedeDP
Copy link
Contributor

FedeDP commented Oct 25, 2023

Jumping in once again just to 👍 everything Andrea said; most of the points were already mentioned by me in my first comment.
IMHO for me the solution is:

So maybe we could also evaluate integrating your use case in the upstream drivers... integrating it, will force the creation of an explicit logic more resilient to future changes/refactors, it will probably require an initial big effort, but maybe in the future, it could help us in maintaining the solution instead of having flying little patches.

Keep the proposed patch fork-only until we have a proper feature that would allow you to throw the patch away.
I very much prefer a good solution that might be useful for everyone (as you know, if user is only interested in a bunch of syscalls, attaching them directly has great perf benefits) than a workaround for your use case (it's not like because this is your use case, is more like "this is a workaround for a use case we don't support, so why a workaround in the first place?").

@incertum
Copy link
Contributor

Quoting @erthalion

I think everyone agrees that working on sys_enter/sys_exit level might be too
broad for some use cases, and having more flexibility around this is positive
for the project.

As a side note, it's probably also a question of project developers position.
Surely, there are questions about how to make XYZ the best way possible, and
it's not always clear right away how to achieve that. But prohibitive from the
start approach will rarely provide a good ground to develop this XYZ further :)

I have put this issue on the Dec 2023 core maintainers strategy discussion.

Reviewing the project's history, it appears that iterations have always been necessary. Therefore, we should not abruptly impose a different standard for @erthalion and @Molter73 but rather work towards a solution that addresses the concerns of all parties involved while also promoting innovation and iterative progress. As long as we are not breaking Falco's default behavior, I see no problems with iterating, especially since @erthalion and @Molter73 are committed to steering us toward an optimal solution. By the way, I am also very interested in such enhanced customizations.

@FedeDP
Copy link
Contributor

FedeDP commented Nov 20, 2023

I think this

but rather work towards a solution that addresses the concerns of all parties involved while also promoting innovation and iterative progress

is what i asked for; i mean, if we could find a good solution that feels less like a "workaround" and more like an enablement for a new feature, i am all for it!

@incertum
Copy link
Contributor

Great @FedeDP -- @erthalion and @Molter73 what are concrete next steps based on the WIP PR that is already open?

@erthalion
Copy link
Contributor

Great @FedeDP -- @erthalion and @Molter73 what are concrete next steps based
on the WIP PR that is already open?

Thanks for chiming in, @incertum!

The way I see it, we still need to agree on one important point before moving
forward: I think even a "less workaround" solution will imply having more
responsibility on the bpf probe (e.g. about negotiation which progs to attach
where), because otherwise chances are low that the implementation will be
flexible enough. My takeaway from the discussion is that folks are strongly
against that, and to not waste anyone's time we need to clarify if such
approach (well thought and thoroughly developed) is not going to be frowned
upon.

Otherwise, I think @Stringy has some ideas about how to proceed and shape up
this feature with your help.

FYI, Mauro is not going to be available for the next few months, so I'm afraid
we have to continue without his input.

@Stringy
Copy link
Contributor

Stringy commented Nov 22, 2023

Hi folks, I've been watching this thread with bated breath for some time now, and I think it's time to weigh in a little. I've developed this feature for eBPF and that has served us well for quite some time now, and I'm in the process of working on a solution for modern_bpf in our fork (admittedly both built in a way that is very specific to our use-case.)

For me the secret sauce is the event schema. Each of my implementations has been focused primarily on getting to a point where I can tail call into the relevant programs so the data is processed and written to the buffers in the right way. In the case of eBPF this was purely a driver change (because the lifecycle loading/unloading of the driver is already very flexible) and in modern_bpf this has required driver changes and libpman changes to support a modified skeleton (as a brief aside it is curious to me that modern_bpf already supports alternative skeletons via USE_BUNDLED_MODERN_BPF and MODERN_BPF_SKEL_DIR though as it stands currently it will only compile if that skeleton matches the existing one)

I agree with @erthalion that a proper solution will most likely involve shifting some ownership of a custom driver's internals back to the "user" (provided the schema is followed.) Then the API becomes an extension to the lifecycle management of the driver (in effect an extensible libpman) in which the user manages what is loaded and attached (via callback, or vtable or whatever). Provided the schema is followed in the driver, libscap should be happy. This might be a good first step, to dip our toes into the feature, but comes with the agreed caveat that thou shalt bring thine own driver.

Making the drivers flexible to this customisation feels a fair bit more difficult, not least because of the common initialization, the very specific function signatures (as defined by the tracepoints themselves), and the risk of verifier complaint. For my customisation of the modern_bpf driver (which is not yet finished) I have experimented with changing the relevant programs to BPF_KSYSCALL and modifying the section names accordingly, and that seems to work, but these changes are very difficult to manage in a truly generic solution. If customisation at this level is possible, then its use in conjunction with the lifecycle management changes would feel like a fully supported solution, at least in my estimation.

@incertum
Copy link
Contributor

Thanks @erthalion and @Stringy! Thinking more about it: Could we open a proposal re the bigger vision of driver extensions and customizations? Let me post some more ideas re how such a proposal could be structured after the Thanksgiving break. We could treat this and the userspace anomaly detection work I am soon more seriously starting as similar workflows in the sense that lots of iterations will be required and we don't know all the answers beforehand.

In addition, perhaps we could brainstorm during one of the upcoming community calls? We can discuss what date would work best for everyone who wishes to participate in the discussion.

@incertum
Copy link
Contributor

incertum commented Nov 30, 2023

Reread all comments and in addition to opening a new "vote" issue I also posted some concrete suggestions in the draft PR #1375 (comment)


Addressing a follow-up concern raised by Andrea in #1376 (comment), which aligns with Federico's concern in #1376 (comment):

An "initial workaround" solution is actual preferable I think until we have a clearer understanding of whether these capabilities are solely intended for (1) libs only adopters or (2) Falco adopters as well. We could postpone the decision on Falco UX and broader libs control internals (such as expanding the SC system or exploring alternative approaches) until later. That way we will effectively minimize the amount of code that needs to be refactored in the future and we can achieve faster and easier iterations (as also suggested by Mauro in his initial comment here #1376 (comment)).

While we're currently focusing on syscall monitoring, we shouldn't put off dealing with the LSM hooks that will still be important in the future (I posted similar thoughts many many times before).

Therefore, let's take the time to work together on a broader vision proposal without blocking this work in the meantime? See my last comment #1376 (comment).

Who would want to take the lead for such a proposal?

@mstemm
Copy link
Contributor

mstemm commented Nov 30, 2023

I'm catching up, but would the custom bpf programs still create events that are identical to the events created by the bpf program that attaches to sys_enter/sys_exit? Or do they create completely disjoint data and pass that data up to userspace.

@incertum
Copy link
Contributor

incertum commented Nov 30, 2023

I'm catching up, but would the custom bpf programs still create events that are identical to the events created by the bpf program that attaches to sys_enter/sys_exit? Or do they create completely disjoint data and pass that data up to userspace.

Identical schema for the initial use case of one of our libs adopters (what we discuss here), see #1376 (comment) @Stringy comment:

Provided the schema is followed in the driver, libscap should be happy.

For a larger refactor and flexible solution that will be future proof for all adopters and use cases and scenarios, including Falco, lets go the proposal route and also include LSM hook support in there.

@mstemm
Copy link
Contributor

mstemm commented Nov 30, 2023

I'm catching up, but would the custom bpf programs still create events that are identical to the events created by the bpf program that attaches to sys_enter/sys_exit? Or do they create completely disjoint data and pass that data up to userspace.

Identical schema for the initial use case of one of our libs adopters (what we discuss here), see #1376 (comment) @Stringy comment:

Ok, I wonder how that would interact with libsinsp, then. libsinsp and its parsers expect a pretty wide set of syscalls to reconstruct its internal state accurately and make that state available via filterchecks and filters. If an alternate bpf program only handled some syscalls (for example opens + execs but not, say, connects and listens), wouldn't using libsinsp be problematic?

(Again, I'm catching up and don't totally understand the use case yet).

@incertum
Copy link
Contributor

I'm catching up, but would the custom bpf programs still create events that are identical to the events created by the bpf program that attaches to sys_enter/sys_exit? Or do they create completely disjoint data and pass that data up to userspace.

Identical schema for the initial use case of one of our libs adopters (what we discuss here), see #1376 (comment) @Stringy comment:

Ok, I wonder how that would interact with libsinsp, then. libsinsp and its parsers expect a pretty wide set of syscalls to reconstruct its internal state accurately and make that state available via filterchecks and filters. If an alternate bpf program only handled some syscalls (for example opens + execs but not, say, connects and listens), wouldn't using libsinsp be problematic?

(Again, I'm catching up and don't totally understand the use case yet).

No worries at all, this should not be of concern here as libs adopters who mess with these parts of the code base should know what they are doing 😉 If they want to break things they can. Also note that with the new base_syscalls Falco option you can also break Falco if you want, your decision.

@leogr
Copy link
Member

leogr commented Dec 12, 2023

Also note that with the new base_syscalls Falco option you can also break Falco if you want, your decision.

A bit OT, but still relevant, IMO: This is not necessarily good for users, especially if an option allows them to break things in unexpected ways that will make users waste an infinite amount of time troubleshooting. Don't get me wrong, I was in favor of base_syscalls (and I'm still in), but generally speaking, we should only allow users to break things in a controlled way (if they like to do so) rather than freely allow them to do everything they want.

@poiana
Copy link
Contributor

poiana commented Mar 11, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@Molter73
Copy link
Contributor Author

/remove-lifecycle stale

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature New feature or request
Projects
None yet
Development

No branches or pull requests

9 participants