Refactoring for image based gadgets #2512

flyth · 2024-02-16T21:59:30Z

In our teams' efforts to quickly move forward with the "image based gadgets" feature/architecture (which is really a great achievement!), many things have been built into an interface that was previously used for all other single gadgets - basically all of the functionality has landed in a gadget simply called "run". Yet, plans for the image based gadgets have outgrown the capabilities of that interface: in the future, those gadget images should for example be able to extract multiple event streams at the same time (which previously would have required two separate gadget instances) that then can be consumed in different ways.

Also, while we previously knew which gadget expected which kind of enrichment, it was easy to just actively extend our Go data structures and wire the gadgets up in a way that they could get the information needed. We use operators for that, which check for certain interface implementations on the gadgets itself and then activate themselves.
With the image based gadgets, however, we don't have those clear Go types anymore - instead, we try to mostly infer data layouts from BTF and then we manually leverage operators depending on the outcome of the inference and simply reserve space for all the information that could be added. This has added quite a lot of direct dependencies to the run gadget that are not necessarily required all the time.

This refactoring addresses a couple of things:

it makes IG more modular again
- image based gadgets are evaluated by their layers (ebpf, wasm, ...) and respective operators will handle them
- operators don't need to know about each other again
data will be sent/received through completely abstract DataSources instead that operators subscribe to to extend, modify or consume data
Fields inside DataSources can be addressed using names and tags
multiple DataSources per gadget are possible with the refactoring, which addresses handling multiple data streams emitted or polled from the gadget
using operators as sinks, they can consume data and forward it to for example external services or output them to the CLI or a prometheus backend
the underlying data can be efficiently serialized through protocol buffers (instead of JSON, which is used right now)
DataSources (or even their fields) can be addressed and explicitly requested; if a field is not requested by the user/client, operators handling it can choose to not fill or enrich it in the first place, thus saving CPU & bandwidth
it makes it much easier to provide new user-space helpers like converters
the new interfaces for the DataSources should be implemented for WASM at a later stage, to be able to provide the same level of access in user supplied code
it simplifies the Go API to call gadgets and makes it easier to phase out legacy code later on
prepares IG to be able to handle other planned features more easily
...tba

This is still a draft and still needs a bit of polishing, especially around documentation & testing.

I will add more information and the open todo items over the next couple of days.

Data flow

sequenceDiagram
    participant client as Client
    participant grpc as gRPC Runtime
    participant lctx as Local GadgetContext
    participant cli as CLI Operator
    participant svc as gRPC Service
    participant oci as OCI Handler
    participant ctx as GadgetContext
    participant ebpf as eBPF Operator
    participant mgr as KubeManager
    participant fmt as Formatter
    box Client
        participant client
        participant grpc
        participant lctx
        participant cli
    end
    box Server
        participant svc
        participant oci
        participant ctx
        participant ebpf
        participant mgr
        participant fmt
    end
    client ->> grpc: Call run-experimental
    grpc ->> svc: RunOCIGadget(url)
    grpc ->> lctx: Create New
    svc ->> oci: RunOCIGadget(url)
    Note over oci: ensure gadget is available<br/>download if necessary<br/>look at layers and call operators
    oci ->> ctx: Create New
    Note over oci: find handler for layer (eBPF in this example)
    oci ->> ebpf: Instantiate()
    Note over ebpf: load image<br/>parse BTF<br/>populate everything<br/>create DataSources
    oci ->> mgr: Instantiate()
    Note over mgr: iterate over gadgetCtx().DataSources()<br/>if mntnsid found, add fields to that DataSource<br/>subscribe to that DataSource (unused though in GetGadgetInfo())
    Note over mgr: gets FieldAccessors and caches them
    mgr ->> ctx: add fields / subscriptions
    oci ->> fmt: Instantiate()
    Note over fmt: iterate over gadgetCtx().DataSources()<br/>if interesting fields found, adjust DataSource<br/>subscribe to that DataSource (unused)
    Note over fmt: gets FieldAccessors and caches them
    fmt ->> ctx: add fields / subscriptions
    Note over oci: serialize GadgetInfo from GadgetContext
    oci ->> svc: Send GadgetInfo
    svc ->> grpc: 
    grpc ->> lctx: Deserialize GadgetInfo
    Note over lctx: create DataSources
    grpc ->> cli: init
    Note over cli: creates Columns helpers from DataSources
    Note over cli: iterate over gadgetCtx().DataSources()<br/>subscribes
    cli ->> lctx: subscribe
    oci ->> ctx: subscribe to DataSources as sink (last in chain)
    oci ->> ebpf: Start()
    Note over ebpf: installs programs<br/>installs readers
    oci ->> mgr: Start()
    oci ->> fmt: Start()
    Note over ebpf: read data from buffer<br/>create new Data and use cached FieldAccessors to fill in data
    ebpf ->> ctx: EmitAndRelease() data
    ctx ->> mgr: forward Data to subscriber
    Note over mgr: uses cached FieldAccessors to read and write data
    ctx ->> fmt: forward Data to subscriber
    Note over fmt: uses cached FieldAccessors to read and write data
    ctx ->> oci: forward Data to subscriber
    Note over oci: serialize to protobuf
    oci ->> svc: forward GadgetEvent
    svc ->> grpc: 
    Note over grpc: deserialize from protobuf
    grpc ->> lctx: EmitAndRelease() data
    lctx ->> cli: forward to subscriber
    Note over cli: output

Todos

JSON output to include all fields (not a big fan as this introduces new exceptions)
remove viper (but might move that out of the PR if that's okay for now)
update docs / workflow chart

Changes to the run gadget implemented while this PR is/was open and should be addressed here or in a follow-up PR. Feel free to add.

eiffel-fl

Hi!

I have some comments on the first commit.
I will take a deeper look to get a broader understanding later, nonetheless:

a mermaid diagram would be welcomed to help understanding the links between the different structures.
this PR adds 6000 lines of code, I understand protobuf generated code accounts for a big part of it but I really want a follow-up PR removing plenty of code.
this PR has 24 commits, some of them add helpers which would have been already helpful and could have been merged before. I am wondering if you should not split it.

Best regards.

pkg/oci/oci.go

pkg/datasource/accessors.go

pkg/datasource/columns.go

pkg/datasource/data.go

eiffel-fl · 2024-02-19T08:33:23Z

pkg/datasource/data.go

+		ds.fieldMap[f.Name] = (*field)(f)
+	}
+	ds.registerPool()
+	return ds, nil


You never return an error, so drop this unused error.

More tests could be added here that return errors when loading the untrusted "user" data here - I'd rather leave the error in to avoid refactoring the callers, but I'll add a TODO.

to avoid refactoring the callers

gopatch can handle this.
You are already refactoring, so let's merge correct code and only what we need.

to avoid refactoring the callers

gopatch can handle this. You are already refactoring, so let's merge correct code and only what we need.

I don't agree with this. Why to have to change this in the future if this is planned to return errors later on? It's better to define the right function signature from the beginning.

This was that I meant: you are already doing a refactoring here, so let's focus on refactor right now, instead of adding refactoring as future work.

Unresolving as the error is still present but never used!

From my understanding we agreed to keep it this way to not have to change the function signature later on and have to edit all the callers.

pkg/datasource/data.go

alban

Some initial comments.

I think part of this PR can be added progressively with separate PRs, such as:

changes to btfhelpers
changes on CString

There are probably others... then, this PR will be a bit smaller.

Then, there are big blocks which could be split, such as:

Changes to the service grpc API
Introduction of the Layer operators
datasource.FieldAccessor

It seems they can be implemented, reviewed and merged one after another in this order.

cmd/common/oci.go

alban · 2024-02-19T10:01:29Z

cmd/common/oci.go

+			// Before running the gadget, we need to get the gadget info to perform
+			// different tasks like creating the parser and setting flags for the
+			// gadget's parameters.
+			actualArgs := cmd.Flags().Args()


Not related to this PR, but I think dynamic flags for gadget parameters are a mistake. I don't like that gadget parameters like --ignore-failed are mixed up with operators/localmanager flags like --host and with generic flags like --insecure.

$ sudo -E ./ig run trace_exec -h ... -h, --help help for run --host Show data from both the host and containers --ignore-failed Ignore failed events (default true) --insecure Allow connections to HTTP only registries

I would prefer if it was --param ignore-failed=true. It would make it a lot easier for the user to understand the category of the flag. And I suspect the implementation would be easier without the utils.ParseEarlyFlags.

Also, I'd like to have an inspect command that would show, among other things, the parameters:

$ sudo -E ./ig image inspect trace_exec

In this way, the --help command would never try to connect to internet.

Agree with the --param issue and also with having an inspect call to show all available information (from the accumulated gadget metadata).

In this way, the --help command would never try to connect to internet.

Why would it change behavior here? (I mean: which requests would be saved here?)

Why would it change behavior here? (I mean: which requests would be saved here?)

I don't understand the question.

What I meant is that when the user does not really understand how the CLI is supposed to be used and adds a --help, it does not print any help message and tries to download something from the internet. I find that confusing for users.

$ sudo -E ./ig run trace exec --help INFO[0000] Experimental features enabled Error: getting gadget info: getting gadget image: pulling image "trace": downloading to local repository: failed to resolve ghcr.io/inspektor-gadget/gadget/trace:latest: ghcr.io/inspektor-gadget/gadget/trace:latest: not found

Ah okay - so you mean the gadget doesn't need to be downloaded directly for that call with a --help at the end. I don't think that's possible, though - because operators also have their params and those can depend on the actual gadget - e.g. if the gadget has mountnsid, LocalManager or KubeManager needs to be loaded - and those also register params.
Or would you want those params also to move to the more generic solution?

Or would you want those params also to move to the more generic solution?

Yes, I would like that. Something like --operator-opt foo:bar, so the CLI flag --operator-opt is known without downloading anything. And the user could discover the possible operator params with the image inspect command.

Similarly, Docker has the flag --security-opt= which is also not fully described in --help because I think it is passed to some inner component:

--security-opt=seccomp:unconfined

--security-opt=no-new-privileges

--security-opt=label:disable

Hmm, if we go that route, we could also use our internal conventions (e.g. --param operators.localmanager.foo=abc). I'm certainly not against it, but it would make things a bit less straightforward than it was before for users.

Another thing I like about it, is that it can differentiate between local and remote operators; the cli operator for example is always a local running one, while most of the others are run on the server side. So cli could still register params directly, while remote params could be used through the detour. Hmmm.....

Not related to this PR, but I think dynamic flags for gadget parameters are a mistake. I don't like that gadget parameters like --ignore-failed are mixed up with operators/localmanager flags like --host and with generic flags like --insecure.

I think this could be fixed by #935.

$ sudo -E ./ig run trace_exec -h ... -h, --help help for run --host Show data from both the host and containers --ignore-failed Ignore failed events (default true) --insecure Allow connections to HTTP only registries

I would prefer if it was --param ignore-failed=true. It would make it a lot easier for the user to understand the category of the flag.

What about parameters that are introduced by operators? (--host, --namespace, --pod, etc?) Should we use a --op-param prefix too?

Also, I'd like to have an inspect command that would show, among other things, the parameters:

Do we really want to introduce another command for it? Isn't it better to have this under a well-known flag / or command like --help?
I was thinking we could have in the metadata data a help section that is shown when the user run ig run foo --help, this will allow the gadget developer to write a documentation specific for their gadget.

What I meant is that when the user does not really understand how the CLI is supposed to be used and adds a --help, it does not print any help message and tries to download something from the internet. I find that confusing for users.

I'd say that specific case not well handled currently. We can improve this message by indicating the gadget isn't valid.

(I'd suggest to move this discussion to a different issue to help focusing the review on specific changes introduced by this PR)

cmd/common/oci.go

pkg/btfhelpers/btfhelpers.go

pkg/gadget-service/api/api.proto

pkg/gadgets/helpers_all.go

flyth · 2024-02-19T13:17:17Z

You're both right - I will try to move all helpers to another PR.

a mermaid diagram would be welcomed to help understanding the links between the different structures.

Yeah, totally agree and this will follow soon.

but I really want a follow-up PR removing plenty of code.

This PR doesn't remove the run gadget - when we switch to this, the run gadget and some of its dependencies can be removed.

pkg/gadget-service/api/api.proto

mauriciovasquezbernal · 2024-02-19T14:49:02Z

This PR doesn't remove the run gadget - when we switch to this, the run gadget and some of its dependencies can be removed.

When do you think we should remove the run gadget? IMO it should be done in this same PR.

mauriciovasquezbernal

Some very initial comments. I still need to check most of it.

Just to be sure we're aligned, current status of this PR isn't working, right?
I tried it and all what I got was:

$ $ sudo -E ig run-experimental trace_open
INFO[0000] Experimental features enabled
WARN[0000] unknown type for variable "gadget_filter_by_mntns": Var:"gadget_filter_by_mntns"
WARN[0000] unknown type for variable "gadget_filter_by_mntns": Var:"gadget_filter_by_mntns"

pkg/operators/operators.go

pkg/operators/ebpf/ebpf.go

mauriciovasquezbernal · 2024-02-19T16:48:04Z

pkg/operators/ebpf/maps.go

+		tags:    nil,
+	}
+
+	i.vars[varName] = newVar


shouldn't it be i.maps?

pkg/operators/ebpf/ebpf.go

pkg/gadget-service/api/api.proto

pkg/gadget-service/api/consts.go

pkg/runtime/grpc/oci.go

mauriciovasquezbernal · 2024-02-19T20:29:34Z

pkg/operators/ebpf/ebpf.go

+		// },
+		{
+			prefixFunc: func(s string) (string, bool) {
+				// Exceptions for backwards-compatibility


backwards-compatibility with what?

gadget_filter_by_mntns is handled like a var in the operator, which is preferred (in here) to work with the prefix gadget_var_. In the end, it would be better to have the former be called gadget_var_filter_by_mntns to fit the concept. I'm open for discussions around that 😄

eiffel-fl

Hi!

Some comments, I am still reading it.

Best regards.

pkg/datasource/accessors.go

pkg/datasource/data.go

pkg/gadget-context/gadget-context.go

pkg/operators/ebpf/struct.go

pkg/operators/ebpf/attach.go

eiffel-fl

Hi!

Here are some other comments.
I will take a deeper look later and I will particularly focus on getting a big picture understanding, so mermaid diagram would be welcomed.

With regard to the code itself, you can still factorize several parts, thus reducing how many lines you want to add.
Moreover, I would like to get an estimation on how many lines we will be able to remove once this PR would be merged.
Indeed, this is not the first time we refactor the code base and we did not remove all the code we should have removed for specific reasons.

Best regards.

pkg/operators/localmanager/wrappers.go

pkg/operators/formatters/formatters.go

pkg/runtime/local/local.go

pkg/operators/kubemanager/kubemanager.go

pkg/operators/kubemanager/wrappers.go

flyth · 2024-02-20T08:56:04Z

Just to be sure we're aligned, current status of this PR isn't working, right?

No, it should work. I seem to have messed up the local output with recent changes, sorry - I'll look into it. ig --daemon + gadgetctl or using on kubernetes should work.

mauriciovasquezbernal · 2024-02-20T20:17:55Z

pkg/operators/localmanager/localmanager.go

+	if !host {
+		if l.manager.igManager == nil {
+			return fmt.Errorf("container-collection isn't available")
+		}
+
+		// Create mount namespace map to filter by containers
+		mountnsmap, err := l.manager.igManager.CreateMountNsMap(id.String(), containerSelector)
+		if err != nil {
+			return commonutils.WrapInErrManagerCreateMountNsMap(err)
+		}
+
+		log.Debugf("set mountnsmap for gadget")
+		gadgetCtx.SetVar(gadgets.MntNsFilterMapName, mountnsmap)
+
+		l.mountnsmap = mountnsmap
+	} else if l.manager.igManager == nil {
+		log.Warn("container-collection isn't available: container enrichment and filtering won't work")
+	}


I'm wondering if this logic really belongs to this operator. The mount ns map is something that can only be used from eBPF programs, so wouldn't it make more sense to move this logic into the eBPF operator?

Then you would introduce a hard dependency on that from the eBPF operator - and we have two implementations (KubeManager + LocalManager). Also, we want to add the possibility to completely deactivate certain operators. I think having this contained in an operator helps keeping the code clean.

The mount ns map is something that can only be used from eBPF programs

It's probably going to be only eBPF accessing it in this specific case - but for other features, I think other operators might be interested in eBPF maps as well. Weird example: imagine an operator that exposes itself using a JSON/REST interface over HTTP that accepts data on a url like /api/:gadgetID/:mapName/set?key=abc&val=def where it would edit map contents that way.

Then you would introduce a hard dependency on that from the eBPF operator

What is the issue with this? The eBPF operator takes care of loading and attaching eBPF programs to the kernel, why can't it also take care of creating the filtering map that is used by those programs to discard events from the kernel?

I see that the {Local,Kube}Manager operators are (1) creating the ebpf map for filtering purposes (2) enriching data, do these two operations belong to the same operator? I know both features need the container collection, but other than that I'd say they're independent.

and we have two implementations (KubeManager + LocalManager).

Are those two operator really different? We're trying to close the gap between ig and ig-k8s (see #1403 & #2100), so probably those two operators are going to finish being a single one.

Also, we want to add the possibility to completely deactivate certain operators. I think having this contained in an operator helps keeping the code clean.

It's a good point. Two things:

Perhaps instead of disabling the whole operator it could be an option inside the ebpf operator?

I see a point on disabling some operators, but {Local,Kube}Manager are so important that I don't see a point in disabling them. Running the gadgets without providing any enrichment nor filtering capabilities?

It's probably going to be only eBPF accessing it in this specific case - but for other features, I think other operators might be interested in eBPF maps as well. Weird example: imagine an operator that exposes itself using a JSON/REST interface over HTTP that accepts data on a url like /api/:gadgetID/:mapName/set?key=abc&val=def where it would edit map contents that way.

How would this work? The ebpf operator puts all maps into the gadget context, and then this "ebpf-map-updater" uses this information to update them? Besides map, which other things should the ebpf operator expose and how will other operators understand this information is available on the context? In other words, what's the contract between different operators?

Then you would introduce a hard dependency on that from the eBPF operator

What is the issue with this? The eBPF operator takes care of loading and attaching eBPF programs to the kernel, why can't it also take care of creating the filtering map that is used by those programs to discard events from the kernel?

I see that the {Local,Kube}Manager operators are (1) creating the ebpf map for filtering purposes (2) enriching data, do these two operations belong to the same operator? I know both features need the container collection, but other than that I'd say they're independent.

Hmm, I think I see it this way: the eBPF operator (or other layer operators) provides the framework / tooling, while data operators provide the additional content / features. IMHO it's best to split that as much as possible. What advantages do you see other than removing this particular abstraction?

and we have two implementations (KubeManager + LocalManager).

Are those two operator really different? We're trying to close the gap between ig and ig-k8s (see #1403 & #2100), so probably those two operators are going to finish being a single one.

Yeah, it would be great if we could avoid duplicate code, but IMHO the combined operator should still be handled independently of the eBPF operator.

Also, we want to add the possibility to completely deactivate certain operators. I think having this contained in an operator helps keeping the code clean.

It's a good point. Two things:

Perhaps instead of disabling the whole operator it could be an option inside the ebpf operator?

I see a point on disabling some operators, but {Local,Kube}Manager are so important that I don't see a point in disabling them. Running the gadgets without providing any enrichment nor filtering capabilities?

If we can reduce direct dependencies, I think we should. This is especially the case for the Go API that we want to provide - if using IG as a lib, it would be better to import the least amount of code to achieve what you want. Importing for example the KubeManager brings a lot of dependencies on its own (basically the whole k8s ecosystem with tons of other packages) - if you only want to use IG as a tool to load eBPF programs and leverage the tooling / exporters etc, that might be overkill. I know the enrichments are probably our KSP right now, but that doesn't mean that the rest of IG isn't worth being used on its own 😉
I think the price of having things independent of each other after this refactoring is really low and that's why I'm all for going this way. 😄 But I'm happy to discuss further.

It's probably going to be only eBPF accessing it in this specific case - but for other features, I think other operators might be interested in eBPF maps as well. Weird example: imagine an operator that exposes itself using a JSON/REST interface over HTTP that accepts data on a url like /api/:gadgetID/:mapName/set?key=abc&val=def where it would edit map contents that way.

How would this work? The ebpf operator puts all maps into the gadget context, and then this "ebpf-map-updater" uses this information to update them? Besides map, which other things should the ebpf operator expose and how will other operators understand this information is available on the context? In other words, what's the contract between different operators?

OTTOMH, this could be achieved by exposing a map as a datasource through the eBPF Operator, tagging it using metadata with something like "exported" or "public" and then having the JSON/REST operator that would expose that datasource - this would mean we'd have to extend the DataSource to be able to fetch specific data, which we probably need sooner or later anyway. (A more hacky solution for a quick PoC could probably be done somehow using the Vars on the context)

eiffel-fl · 2024-02-21T06:51:02Z

Thank you for the mermaid graph, can you please explain why the following:

iterate over gadgetCtx().DataSources()if mntnsid found, add fields to that DataSourcesubscribe to that DataSource (unused though in GetGadgetInfo())gets accessors and caches them

is shared by both KubeManager and Formatter?
Same for this:

uses accessors to read and write data

Shouldn't we only have component which takes care of this rather than two?

flyth · 2024-02-21T11:09:27Z

Thank you for the mermaid graph, can you please explain why the following:

iterate over gadgetCtx().DataSources()if mntnsid found, add fields to that DataSourcesubscribe to that DataSource (unused though in GetGadgetInfo())gets accessors and caches them

is shared by both KubeManager and Formatter? Same for this:

uses accessors to read and write data

Shouldn't we only have component which takes care of this rather than two?

Both, KubeManager & Formatter are operator examples in the flow and they just do what operators are intended to do: look at what is there and decide whether they can be helpful by adding fields. So they're iterating over DataSources to see what's there and getting accessors for existing fields and fields they add - which later on are used to read from the existing fields and writing to their newly added fields. And as they're both (KubeManager & Formatter) adding different features in isolation, they both need to do that (this is where the initialization phase and getting the accessors is the "expensive" part while using the accessors later on when the data is flowing is pretty cheap).

I hope that answers the question.

mauriciovasquezbernal

Some more few comments. I jumping through the code trying to understand the different pieces.

pkg/datasource/router.go

pkg/operators/registry.go

pkg/datasource/datasource.go

pkg/datasource/data.go

pkg/operators/ebpf/struct.go

mauriciovasquezbernal · 2024-02-21T19:06:08Z

pkg/datasource/data.go

+func (f *field) ReflectType() reflect.Type {
+	switch Kind(f.Type) {
+	default:
+		return nil
+	case Int8:
+		return reflect.TypeOf(int8(0))
+	case Int16:
+		return reflect.TypeOf(int16(0))
+	case Int32:
+		return reflect.TypeOf(int32(0))
+	case Int64:
+		return reflect.TypeOf(int64(0))
+	case Uint8:
+		return reflect.TypeOf(uint8(0))
+	case Uint16:
+		return reflect.TypeOf(uint16(0))
+	case Uint32:
+		return reflect.TypeOf(uint32(0))
+	case Uint64:
+		return reflect.TypeOf(uint64(0))
+	case Float32:
+		return reflect.TypeOf(float32(0))
+	case Float64:
+		return reflect.TypeOf(float64(0))
+	case Bool:
+		return reflect.TypeOf(false)
+	case String:
+		return reflect.TypeOf("")
+	}
+}


Why Array and Slice aren't considered here? I tried to add them but ig exploded.

I think they (and String) need special handling because it's no simple conversion - if there's no reflect type, a field will be handled as binary blob / CString for now. A gadget author should be able to specify in more detail in the metadata file how the field should be encoded/decoded (so that JSON would have hex/base64 data instead of a string representation in some cases, for example). That's not a part of the PR right now - do you think it should be?

pkg/datasource/columns.go

pkg/datasource/accessors.go

pkg/runtime/grpc/oci.go

eiffel-fl

Hi!

Some comments with regard to some commits:

Best regards.

pkg/columns/columns.go

pkg/columns/columninfo.go

pkg/metadata/v1/metadata.go

pkg/datasource/accessors.go

pkg/datasource/columns.go

pkg/datasource/data.go

eiffel-fl · 2024-02-22T09:58:29Z

pkg/datasource/data.go

+	return true
+	ds.lock.RLock()
+	defer ds.lock.RUnlock()
+	return ds.requestedFields[fieldName]


Euh... I am just surprised the compiler does not catch this 😮.
So, please remove this return true.

Please, check this one!

This is still in progress, will resolve once the implementation is finalized.

This comment is still not addressed!

pkg/datasource/data.go

pkg/operators/oci/oci.go

eiffel-fl · 2024-02-22T10:19:35Z

Thank you for the mermaid graph, can you please explain why the following:

iterate over gadgetCtx().DataSources()if mntnsid found, add fields to that DataSourcesubscribe to that DataSource (unused though in GetGadgetInfo())gets accessors and caches them

is shared by both KubeManager and Formatter? Same for this:

uses accessors to read and write data

Shouldn't we only have component which takes care of this rather than two?

Both, KubeManager & Formatter are operator examples in the flow and they just do what operators are intended to do: look at what is there and decide whether they can be helpful by adding fields. So they're iterating over DataSources to see what's there and getting accessors for existing fields and fields they add - which later on are used to read from the existing fields and writing to their newly added fields. And as they're both (KubeManager & Formatter) adding different features in isolation, they both need to do that (this is where the initialization phase and getting the accessors is the "expensive" part while using the accessors later on when the data is flowing is pretty cheap).

I hope that answers the question.

This definitely makes sense! Thank you for the precision!

mauriciovasquezbernal · 2024-02-22T18:32:21Z

pkg/runtime/local/local.go

nit. Perhaps we can move these changes to a new file so it's easier in the future to remove the things we don't need. (same for other places where it could apply)

pkg/runtime/local/local.go

mauriciovasquezbernal · 2024-02-22T21:37:34Z

pkg/operators/oci/oci.go

+func (o *OciHandler) ParamDescs() params.ParamDescs {
+	return params.ParamDescs{
+		// Hardcoded for now
+		{
+			Key:          authfileParam,
+			Title:        "Auth file",
+			DefaultValue: oci.DefaultAuthFile,
+			TypeHint:     params.TypeString,
+		},
+		{
+			Key:          validateMetadataParam,
+			Title:        "Validate metadata",
+			Description:  "Validate the gadget metadata before running the gadget",
+			DefaultValue: "true",
+			TypeHint:     params.TypeBool,
+		},
+		{
+			Key:          insecureParam,
+			Title:        "Insecure connection",
+			Description:  "Allow connections to HTTP only registries",
+			DefaultValue: "false",
+			TypeHint:     params.TypeBool,
+		},
+		{
+			Key:          pullParam,
+			Title:        "Pull policy",
+			Description:  "Specify when the gadget image should be pulled",
+			DefaultValue: oci.PullImageMissing,
+			PossibleValues: []string{
+				oci.PullImageAlways,
+				oci.PullImageMissing,
+				oci.PullImageNever,
+			},
+			TypeHint: params.TypeString,
+		},
+		{
+			Key:         pullSecret,
+			Title:       "Pull secret",
+			Description: "Secret to use when pulling the gadget image",
+			TypeHint:    params.TypeString,
+		},
+	}
+}


I'm wondering how this will be handled. We can't rely on GetOCIGadgetInfo because these parameters are needed to pull the image, so I suppose somehow we'll have to hardcode them in the client so they're already available when calling GetOCIGadgetInfo.

Also, the commit message introducing this file says it's not an operator anymore, so how will be these parameters added to the CLI?

I gave it a try on 4f754cf, is this aligned with the ideas you had?

Adds support to serialize Data units from DataSources to JSON. Signed-off-by: Michael Friese <mfriese@microsoft.com>

Signed-off-by: Michael Friese <mfriese@microsoft.com>

This implements the new gadget lifecycle used by the image based gadgets. Signed-off-by: Michael Friese <mfriese@microsoft.com>

This adds a new generic handler for image based gadgets for all our cobra based binaries. Signed-off-by: Michael Friese <mfriese@microsoft.com>

This adds a generic operator that can have multiple callbacks installed without building a dedicated operator. This is useful to for example only extract data from the DataSources. Signed-off-by: Michael Friese <mfriese@microsoft.com>

Signed-off-by: Michael Friese <mfriese@microsoft.com>

Signed-off-by: Mauricio Vásquez <mauriciov@microsoft.com> Signed-off-by: Michael Friese <mfriese@microsoft.com>

Signed-off-by: Michael Friese <mfriese@microsoft.com>

vfalico

LGTM

flyth · 2024-03-27T18:51:54Z

Thank you all so much for all the great and super in-depth reviews on all levels! I know this was again tough, but I'm so glad that we managed to get this through!

I'll compile a list of open issues and link them in a new issue so we can follow up.

flyth force-pushed the michael/refactoring-oci branch 3 times, most recently from 779539f to e95a194 Compare February 16, 2024 22:58

eiffel-fl reviewed Feb 19, 2024

View reviewed changes

alban reviewed Feb 19, 2024

View reviewed changes

flyth force-pushed the michael/refactoring-oci branch from e95a194 to e531c17 Compare February 19, 2024 12:39

alban reviewed Feb 19, 2024

View reviewed changes

pkg/gadget-service/api/api.proto Outdated Show resolved Hide resolved

mauriciovasquezbernal reviewed Feb 19, 2024

View reviewed changes

eiffel-fl reviewed Feb 20, 2024

View reviewed changes

flyth force-pushed the michael/refactoring-oci branch from e531c17 to a20d90b Compare February 20, 2024 11:17

This was referenced Feb 20, 2024

pkg/oci: add some helper functions for more control over image handling #2520

Merged

add btfhelpers package #2521

Merged

flyth force-pushed the michael/refactoring-oci branch from a20d90b to 5b8f8fe Compare February 20, 2024 20:11

mauriciovasquezbernal reviewed Feb 20, 2024

View reviewed changes

mauriciovasquezbernal reviewed Feb 21, 2024

View reviewed changes

pkg/datasource/accessors.go Show resolved Hide resolved

mauriciovasquezbernal reviewed Feb 21, 2024

View reviewed changes

pkg/runtime/grpc/oci.go Show resolved Hide resolved

eiffel-fl mentioned this pull request Feb 22, 2024

gadgets/run: Add support for toppers and implement top file #2132

Merged

4 tasks

eiffel-fl reviewed Feb 22, 2024

View reviewed changes

flyth force-pushed the michael/refactoring-oci branch 2 times, most recently from 32b0b7a to bc29d4f Compare February 22, 2024 12:50

mauriciovasquezbernal reviewed Feb 22, 2024

View reviewed changes

flyth and others added 15 commits March 27, 2024 19:33

datasource: add json formatter

11bf3b6

Adds support to serialize Data units from DataSources to JSON. Signed-off-by: Michael Friese <mfriese@microsoft.com>

add cli operator to output datasources to console

5971c0e

Signed-off-by: Michael Friese <mfriese@microsoft.com>

pkg/runtime/local: add support for image based gadget handling

f8cc324

This implements the new gadget lifecycle used by the image based gadgets. Signed-off-by: Michael Friese <mfriese@microsoft.com>

cmd/ig: add support for OCI gadgets

ac239e0

This adds a new generic handler for image based gadgets for all our cobra based binaries. Signed-off-by: Michael Friese <mfriese@microsoft.com>

pkg/gadget-service: update to support refactoring work for OCI gadgets

e689576

Signed-off-by: Michael Friese <mfriese@microsoft.com>

pkg/runtime/grpc: add support for OCI gadgets

bcf2c86

Signed-off-by: Michael Friese <mfriese@microsoft.com>

cmd/gadgetctl: add run command

acd3aac

Signed-off-by: Michael Friese <mfriese@microsoft.com>

cmd/kubectl-gadget: add run command

b43d24a

Signed-off-by: Michael Friese <mfriese@microsoft.com>

pkg/operators/kubemanager: add support for OCI gadgets as DataOperator

726eb5e

Signed-off-by: Michael Friese <mfriese@microsoft.com>

gadgettracermanager: import additional operators for OCI gadgets

cf46ff1

Signed-off-by: Michael Friese <mfriese@microsoft.com>

pkg/operators/socketenricher: implement DataOperator interface

ddfe6fd

Signed-off-by: Michael Friese <mfriese@microsoft.com>

ebpf: Add enum to string conversion

4ced0f7

Signed-off-by: Mauricio Vásquez <mauriciov@microsoft.com> Signed-off-by: Michael Friese <mfriese@microsoft.com>

run tests: temporarily adjust and disable not fully compatible tests

9327ba7

Signed-off-by: Michael Friese <mfriese@microsoft.com>

adjust README.md and remove run gadget reference from pkg/all-gadgets.

259799a

Signed-off-by: Michael Friese <mfriese@microsoft.com>

flyth force-pushed the michael/refactoring-oci branch from 6c65fb0 to 259799a Compare March 27, 2024 18:33

vfalico approved these changes Mar 27, 2024

View reviewed changes

blanquicet approved these changes Mar 27, 2024

View reviewed changes

flyth merged commit 06c2fdb into main Mar 27, 2024
60 checks passed

flyth deleted the michael/refactoring-oci branch March 27, 2024 18:52

This was referenced Mar 27, 2024

Open comments / change requests / discussions regarding the refactoring done in #2512 #2653

Open

refactoring: port uprobe support #2657

Merged

eiffel-fl mentioned this pull request Apr 3, 2024

Wrong images are not rejected for run #2677

Closed

blanquicet mentioned this pull request Apr 8, 2024

gadgets: Implement mechanism for logging in eBPF programs #1882

Open

mauriciovasquezbernal mentioned this pull request Apr 16, 2024

image-based gadgets: Operators are always enabled for all gadgets #2167

Closed

mqasimsarfraz mentioned this pull request Apr 18, 2024

Image-based gadgets: It seems attribute from metadata file e.g hidden aren't respected anymore #2736

Open

mauriciovasquezbernal mentioned this pull request Apr 18, 2024

run: validate metadata #2739

Draft

blanquicet mentioned this pull request May 14, 2024

pkg/operators/ebpf: Avoid adding netns field to snapshot process #2838

Merged

blanquicet mentioned this pull request May 23, 2024

image-based gadgets: Implement support for toppers gadgets #1924

Open

Refactoring for image based gadgets #2512

Refactoring for image based gadgets #2512

Conversation

flyth commented Feb 16, 2024 • edited

Data flow

Todos

eiffel-fl left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alban left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alban Feb 19, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

flyth commented Feb 19, 2024

mauriciovasquezbernal commented Feb 19, 2024

mauriciovasquezbernal left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

flyth Feb 20, 2024 • edited

Choose a reason for hiding this comment

eiffel-fl left a comment

Choose a reason for hiding this comment

eiffel-fl left a comment

Choose a reason for hiding this comment

flyth commented Feb 20, 2024

Choose a reason for hiding this comment

flyth Feb 20, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

flyth Feb 21, 2024 • edited

Choose a reason for hiding this comment

eiffel-fl commented Feb 21, 2024

flyth commented Feb 21, 2024

mauriciovasquezbernal left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eiffel-fl left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eiffel-fl commented Feb 22, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vfalico left a comment

Choose a reason for hiding this comment

flyth commented Mar 27, 2024

flyth commented Feb 16, 2024 •

edited

alban Feb 19, 2024 •

edited

flyth Feb 20, 2024 •

edited

flyth Feb 20, 2024 •

edited

flyth Feb 21, 2024 •

edited