-
Notifications
You must be signed in to change notification settings - Fork 187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring for image based gadgets #2512
Conversation
779539f
to
e95a194
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi!
I have some comments on the first commit.
I will take a deeper look to get a broader understanding later, nonetheless:
- a mermaid diagram would be welcomed to help understanding the links between the different structures.
- this PR adds 6000 lines of code, I understand protobuf generated code accounts for a big part of it but I really want a follow-up PR removing plenty of code.
- this PR has 24 commits, some of them add helpers which would have been already helpful and could have been merged before. I am wondering if you should not split it.
Best regards.
ds.fieldMap[f.Name] = (*field)(f) | ||
} | ||
ds.registerPool() | ||
return ds, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You never return an error, so drop this unused error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More tests could be added here that return errors when loading the untrusted "user" data here - I'd rather leave the error in to avoid refactoring the callers, but I'll add a TODO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to avoid refactoring the callers
gopatch
can handle this.
You are already refactoring, so let's merge correct code and only what we need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to avoid refactoring the callers
gopatch
can handle this. You are already refactoring, so let's merge correct code and only what we need.
I don't agree with this. Why to have to change this in the future if this is planned to return errors later on? It's better to define the right function signature from the beginning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was that I meant: you are already doing a refactoring here, so let's focus on refactor right now, instead of adding refactoring as future work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unresolving as the error is still present but never used!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding we agreed to keep it this way to not have to change the function signature later on and have to edit all the callers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments.
I think part of this PR can be added progressively with separate PRs, such as:
- changes to btfhelpers
- changes on CString
There are probably others... then, this PR will be a bit smaller.
Then, there are big blocks which could be split, such as:
- Changes to the service grpc API
- Introduction of the Layer operators
- datasource.FieldAccessor
It seems they can be implemented, reviewed and merged one after another in this order.
cmd/common/oci.go
Outdated
// Before running the gadget, we need to get the gadget info to perform | ||
// different tasks like creating the parser and setting flags for the | ||
// gadget's parameters. | ||
actualArgs := cmd.Flags().Args() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR, but I think dynamic flags for gadget parameters are a mistake. I don't like that gadget parameters like --ignore-failed
are mixed up with operators/localmanager flags like --host
and with generic flags like --insecure
.
$ sudo -E ./ig run trace_exec -h
...
-h, --help help for run
--host Show data from both the host and containers
--ignore-failed Ignore failed events (default true)
--insecure Allow connections to HTTP only registries
I would prefer if it was --param ignore-failed=true
. It would make it a lot easier for the user to understand the category of the flag. And I suspect the implementation would be easier without the utils.ParseEarlyFlags
.
Also, I'd like to have an inspect command that would show, among other things, the parameters:
$ sudo -E ./ig image inspect trace_exec
In this way, the --help
command would never try to connect to internet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with the --param
issue and also with having an inspect
call to show all available information (from the accumulated gadget metadata).
In this way, the --help command would never try to connect to internet.
Why would it change behavior here? (I mean: which requests would be saved here?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would it change behavior here? (I mean: which requests would be saved here?)
I don't understand the question.
What I meant is that when the user does not really understand how the CLI is supposed to be used and adds a --help
, it does not print any help message and tries to download something from the internet. I find that confusing for users.
$ sudo -E ./ig run trace exec --help
INFO[0000] Experimental features enabled
Error: getting gadget info: getting gadget image: pulling image "trace": downloading to local repository: failed to resolve ghcr.io/inspektor-gadget/gadget/trace:latest: ghcr.io/inspektor-gadget/gadget/trace:latest: not found
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay - so you mean the gadget doesn't need to be downloaded directly for that call with a --help
at the end. I don't think that's possible, though - because operators also have their params and those can depend on the actual gadget - e.g. if the gadget has mountnsid
, LocalManager or KubeManager needs to be loaded - and those also register params.
Or would you want those params also to move to the more generic solution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or would you want those params also to move to the more generic solution?
Yes, I would like that. Something like --operator-opt foo:bar
, so the CLI flag --operator-opt
is known without downloading anything. And the user could discover the possible operator params with the image inspect
command.
Similarly, Docker has the flag --security-opt=
which is also not fully described in --help
because I think it is passed to some inner component:
--security-opt=seccomp:unconfined
--security-opt=no-new-privileges
--security-opt=label:disable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, if we go that route, we could also use our internal conventions (e.g. --param operators.localmanager.foo=abc
). I'm certainly not against it, but it would make things a bit less straightforward than it was before for users.
Another thing I like about it, is that it can differentiate between local and remote operators; the cli
operator for example is always a local running one, while most of the others are run on the server side. So cli
could still register params directly, while remote params could be used through the detour. Hmmm.....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR, but I think dynamic flags for gadget parameters are a mistake. I don't like that gadget parameters like
--ignore-failed
are mixed up with operators/localmanager flags like--host
and with generic flags like--insecure
.
I think this could be fixed by #935.
$ sudo -E ./ig run trace_exec -h ... -h, --help help for run --host Show data from both the host and containers --ignore-failed Ignore failed events (default true) --insecure Allow connections to HTTP only registries
I would prefer if it was
--param ignore-failed=true
. It would make it a lot easier for the user to understand the category of the flag.
What about parameters that are introduced by operators? (--host, --namespace, --pod, etc?) Should we use a --op-param
prefix too?
Also, I'd like to have an inspect command that would show, among other things, the parameters:
Do we really want to introduce another command for it? Isn't it better to have this under a well-known flag / or command like --help?
I was thinking we could have in the metadata data a help section that is shown when the user run ig run foo --help
, this will allow the gadget developer to write a documentation specific for their gadget.
What I meant is that when the user does not really understand how the CLI is supposed to be used and adds a --help, it does not print any help message and tries to download something from the internet. I find that confusing for users.
I'd say that specific case not well handled currently. We can improve this message by indicating the gadget isn't valid.
(I'd suggest to move this discussion to a different issue to help focusing the review on specific changes introduced by this PR)
e95a194
to
e531c17
Compare
You're both right - I will try to move all helpers to another PR.
Yeah, totally agree and this will follow soon.
This PR doesn't remove the run gadget - when we switch to this, the run gadget and some of its dependencies can be removed. |
When do you think we should remove the run gadget? IMO it should be done in this same PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some very initial comments. I still need to check most of it.
Just to be sure we're aligned, current status of this PR isn't working, right?
I tried it and all what I got was:
$ $ sudo -E ig run-experimental trace_open
INFO[0000] Experimental features enabled
WARN[0000] unknown type for variable "gadget_filter_by_mntns": Var:"gadget_filter_by_mntns"
WARN[0000] unknown type for variable "gadget_filter_by_mntns": Var:"gadget_filter_by_mntns"
tags: nil, | ||
} | ||
|
||
i.vars[varName] = newVar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't it be i.maps
?
// }, | ||
{ | ||
prefixFunc: func(s string) (string, bool) { | ||
// Exceptions for backwards-compatibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
backwards-compatibility with what?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gadget_filter_by_mntns
is handled like a var
in the operator, which is preferred (in here) to work with the prefix gadget_var_
. In the end, it would be better to have the former be called gadget_var_filter_by_mntns
to fit the concept. I'm open for discussions around that 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi!
Some comments, I am still reading it.
Best regards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi!
Here are some other comments.
I will take a deeper look later and I will particularly focus on getting a big picture understanding, so mermaid diagram would be welcomed.
With regard to the code itself, you can still factorize several parts, thus reducing how many lines you want to add.
Moreover, I would like to get an estimation on how many lines we will be able to remove once this PR would be merged.
Indeed, this is not the first time we refactor the code base and we did not remove all the code we should have removed for specific reasons.
Best regards.
No, it should work. I seem to have messed up the local output with recent changes, sorry - I'll look into it. |
e531c17
to
a20d90b
Compare
a20d90b
to
5b8f8fe
Compare
if !host { | ||
if l.manager.igManager == nil { | ||
return fmt.Errorf("container-collection isn't available") | ||
} | ||
|
||
// Create mount namespace map to filter by containers | ||
mountnsmap, err := l.manager.igManager.CreateMountNsMap(id.String(), containerSelector) | ||
if err != nil { | ||
return commonutils.WrapInErrManagerCreateMountNsMap(err) | ||
} | ||
|
||
log.Debugf("set mountnsmap for gadget") | ||
gadgetCtx.SetVar(gadgets.MntNsFilterMapName, mountnsmap) | ||
|
||
l.mountnsmap = mountnsmap | ||
} else if l.manager.igManager == nil { | ||
log.Warn("container-collection isn't available: container enrichment and filtering won't work") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if this logic really belongs to this operator. The mount ns map is something that can only be used from eBPF programs, so wouldn't it make more sense to move this logic into the eBPF operator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then you would introduce a hard dependency on that from the eBPF operator - and we have two implementations (KubeManager + LocalManager). Also, we want to add the possibility to completely deactivate certain operators. I think having this contained in an operator helps keeping the code clean.
The mount ns map is something that can only be used from eBPF programs
It's probably going to be only eBPF accessing it in this specific case - but for other features, I think other operators might be interested in eBPF maps as well. Weird example: imagine an operator that exposes itself using a JSON/REST interface over HTTP that accepts data on a url like /api/:gadgetID/:mapName/set?key=abc&val=def
where it would edit map contents that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then you would introduce a hard dependency on that from the eBPF operator
What is the issue with this? The eBPF operator takes care of loading and attaching eBPF programs to the kernel, why can't it also take care of creating the filtering map that is used by those programs to discard events from the kernel?
I see that the {Local,Kube}Manager operators are (1) creating the ebpf map for filtering purposes (2) enriching data, do these two operations belong to the same operator? I know both features need the container collection, but other than that I'd say they're independent.
and we have two implementations (KubeManager + LocalManager).
Are those two operator really different? We're trying to close the gap between ig and ig-k8s (see #1403 & #2100), so probably those two operators are going to finish being a single one.
Also, we want to add the possibility to completely deactivate certain operators. I think having this contained in an operator helps keeping the code clean.
It's a good point. Two things:
- Perhaps instead of disabling the whole operator it could be an option inside the ebpf operator?
- I see a point on disabling some operators, but {Local,Kube}Manager are so important that I don't see a point in disabling them. Running the gadgets without providing any enrichment nor filtering capabilities?
It's probably going to be only eBPF accessing it in this specific case - but for other features, I think other operators might be interested in eBPF maps as well. Weird example: imagine an operator that exposes itself using a JSON/REST interface over HTTP that accepts data on a url like /api/:gadgetID/:mapName/set?key=abc&val=def where it would edit map contents that way.
How would this work? The ebpf operator puts all maps into the gadget context, and then this "ebpf-map-updater" uses this information to update them? Besides map, which other things should the ebpf operator expose and how will other operators understand this information is available on the context? In other words, what's the contract between different operators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then you would introduce a hard dependency on that from the eBPF operator
What is the issue with this? The eBPF operator takes care of loading and attaching eBPF programs to the kernel, why can't it also take care of creating the filtering map that is used by those programs to discard events from the kernel?
I see that the {Local,Kube}Manager operators are (1) creating the ebpf map for filtering purposes (2) enriching data, do these two operations belong to the same operator? I know both features need the container collection, but other than that I'd say they're independent.
Hmm, I think I see it this way: the eBPF operator (or other layer operators) provides the framework / tooling, while data operators provide the additional content / features. IMHO it's best to split that as much as possible. What advantages do you see other than removing this particular abstraction?
and we have two implementations (KubeManager + LocalManager).
Are those two operator really different? We're trying to close the gap between ig and ig-k8s (see #1403 & #2100), so probably those two operators are going to finish being a single one.
Yeah, it would be great if we could avoid duplicate code, but IMHO the combined operator should still be handled independently of the eBPF operator.
Also, we want to add the possibility to completely deactivate certain operators. I think having this contained in an operator helps keeping the code clean.
It's a good point. Two things:
- Perhaps instead of disabling the whole operator it could be an option inside the ebpf operator?
- I see a point on disabling some operators, but {Local,Kube}Manager are so important that I don't see a point in disabling them. Running the gadgets without providing any enrichment nor filtering capabilities?
If we can reduce direct dependencies, I think we should. This is especially the case for the Go API that we want to provide - if using IG as a lib, it would be better to import the least amount of code to achieve what you want. Importing for example the KubeManager brings a lot of dependencies on its own (basically the whole k8s ecosystem with tons of other packages) - if you only want to use IG as a tool to load eBPF programs and leverage the tooling / exporters etc, that might be overkill. I know the enrichments are probably our KSP right now, but that doesn't mean that the rest of IG isn't worth being used on its own 😉
I think the price of having things independent of each other after this refactoring is really low and that's why I'm all for going this way. 😄 But I'm happy to discuss further.
It's probably going to be only eBPF accessing it in this specific case - but for other features, I think other operators might be interested in eBPF maps as well. Weird example: imagine an operator that exposes itself using a JSON/REST interface over HTTP that accepts data on a url like /api/:gadgetID/:mapName/set?key=abc&val=def where it would edit map contents that way.
How would this work? The ebpf operator puts all maps into the gadget context, and then this "ebpf-map-updater" uses this information to update them? Besides map, which other things should the ebpf operator expose and how will other operators understand this information is available on the context? In other words, what's the contract between different operators?
OTTOMH, this could be achieved by exposing a map as a datasource through the eBPF Operator, tagging it using metadata with something like "exported" or "public" and then having the JSON/REST operator that would expose that datasource - this would mean we'd have to extend the DataSource to be able to fetch specific data, which we probably need sooner or later anyway. (A more hacky solution for a quick PoC could probably be done somehow using the Vars
on the context)
Thank you for the mermaid graph, can you please explain why the following:
is shared by both
Shouldn't we only have component which takes care of this rather than two? |
Both, KubeManager & Formatter are operator examples in the flow and they just do what operators are intended to do: look at what is there and decide whether they can be helpful by adding fields. So they're iterating over DataSources to see what's there and getting accessors for existing fields and fields they add - which later on are used to read from the existing fields and writing to their newly added fields. And as they're both (KubeManager & Formatter) adding different features in isolation, they both need to do that (this is where the initialization phase and getting the accessors is the "expensive" part while using the accessors later on when the data is flowing is pretty cheap). I hope that answers the question. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more few comments. I jumping through the code trying to understand the different pieces.
pkg/datasource/data.go
Outdated
func (f *field) ReflectType() reflect.Type { | ||
switch Kind(f.Type) { | ||
default: | ||
return nil | ||
case Int8: | ||
return reflect.TypeOf(int8(0)) | ||
case Int16: | ||
return reflect.TypeOf(int16(0)) | ||
case Int32: | ||
return reflect.TypeOf(int32(0)) | ||
case Int64: | ||
return reflect.TypeOf(int64(0)) | ||
case Uint8: | ||
return reflect.TypeOf(uint8(0)) | ||
case Uint16: | ||
return reflect.TypeOf(uint16(0)) | ||
case Uint32: | ||
return reflect.TypeOf(uint32(0)) | ||
case Uint64: | ||
return reflect.TypeOf(uint64(0)) | ||
case Float32: | ||
return reflect.TypeOf(float32(0)) | ||
case Float64: | ||
return reflect.TypeOf(float64(0)) | ||
case Bool: | ||
return reflect.TypeOf(false) | ||
case String: | ||
return reflect.TypeOf("") | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why Array and Slice aren't considered here? I tried to add them but ig exploded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they (and String) need special handling because it's no simple conversion - if there's no reflect type, a field will be handled as binary blob / CString for now. A gadget author should be able to specify in more detail in the metadata file how the field should be encoded/decoded (so that JSON would have hex/base64 data instead of a string representation in some cases, for example). That's not a part of the PR right now - do you think it should be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi!
Some comments with regard to some commits:
Best regards.
return true | ||
ds.lock.RLock() | ||
defer ds.lock.RUnlock() | ||
return ds.requestedFields[fieldName] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Euh... I am just surprised the compiler does not catch this 😮.
So, please remove this return true
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, check this one!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still in progress, will resolve once the implementation is finalized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is still not addressed!
This definitely makes sense! Thank you for the precision! |
32b0b7a
to
bc29d4f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. Perhaps we can move these changes to a new file so it's easier in the future to remove the things we don't need. (same for other places where it could apply)
pkg/operators/oci/oci.go
Outdated
func (o *OciHandler) ParamDescs() params.ParamDescs { | ||
return params.ParamDescs{ | ||
// Hardcoded for now | ||
{ | ||
Key: authfileParam, | ||
Title: "Auth file", | ||
DefaultValue: oci.DefaultAuthFile, | ||
TypeHint: params.TypeString, | ||
}, | ||
{ | ||
Key: validateMetadataParam, | ||
Title: "Validate metadata", | ||
Description: "Validate the gadget metadata before running the gadget", | ||
DefaultValue: "true", | ||
TypeHint: params.TypeBool, | ||
}, | ||
{ | ||
Key: insecureParam, | ||
Title: "Insecure connection", | ||
Description: "Allow connections to HTTP only registries", | ||
DefaultValue: "false", | ||
TypeHint: params.TypeBool, | ||
}, | ||
{ | ||
Key: pullParam, | ||
Title: "Pull policy", | ||
Description: "Specify when the gadget image should be pulled", | ||
DefaultValue: oci.PullImageMissing, | ||
PossibleValues: []string{ | ||
oci.PullImageAlways, | ||
oci.PullImageMissing, | ||
oci.PullImageNever, | ||
}, | ||
TypeHint: params.TypeString, | ||
}, | ||
{ | ||
Key: pullSecret, | ||
Title: "Pull secret", | ||
Description: "Secret to use when pulling the gadget image", | ||
TypeHint: params.TypeString, | ||
}, | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering how this will be handled. We can't rely on GetOCIGadgetInfo
because these parameters are needed to pull the image, so I suppose somehow we'll have to hardcode them in the client so they're already available when calling GetOCIGadgetInfo
.
Also, the commit message introducing this file says it's not an operator anymore, so how will be these parameters added to the CLI?
I gave it a try on 4f754cf, is this aligned with the ideas you had?
Adds support to serialize Data units from DataSources to JSON. Signed-off-by: Michael Friese <mfriese@microsoft.com>
Signed-off-by: Michael Friese <mfriese@microsoft.com>
This implements the new gadget lifecycle used by the image based gadgets. Signed-off-by: Michael Friese <mfriese@microsoft.com>
This adds a new generic handler for image based gadgets for all our cobra based binaries. Signed-off-by: Michael Friese <mfriese@microsoft.com>
This adds a generic operator that can have multiple callbacks installed without building a dedicated operator. This is useful to for example only extract data from the DataSources. Signed-off-by: Michael Friese <mfriese@microsoft.com>
Signed-off-by: Michael Friese <mfriese@microsoft.com>
Signed-off-by: Michael Friese <mfriese@microsoft.com>
Signed-off-by: Michael Friese <mfriese@microsoft.com>
Signed-off-by: Michael Friese <mfriese@microsoft.com>
Signed-off-by: Michael Friese <mfriese@microsoft.com>
Signed-off-by: Michael Friese <mfriese@microsoft.com>
Signed-off-by: Michael Friese <mfriese@microsoft.com>
Signed-off-by: Mauricio Vásquez <mauriciov@microsoft.com> Signed-off-by: Michael Friese <mfriese@microsoft.com>
Signed-off-by: Michael Friese <mfriese@microsoft.com>
Signed-off-by: Michael Friese <mfriese@microsoft.com>
6c65fb0
to
259799a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thank you all so much for all the great and super in-depth reviews on all levels! I know this was again tough, but I'm so glad that we managed to get this through! I'll compile a list of open issues and link them in a new issue so we can follow up. |
In our teams' efforts to quickly move forward with the "image based gadgets" feature/architecture (which is really a great achievement!), many things have been built into an interface that was previously used for all other single gadgets - basically all of the functionality has landed in a gadget simply called "run". Yet, plans for the image based gadgets have outgrown the capabilities of that interface: in the future, those gadget images should for example be able to extract multiple event streams at the same time (which previously would have required two separate gadget instances) that then can be consumed in different ways.
Also, while we previously knew which gadget expected which kind of enrichment, it was easy to just actively extend our Go data structures and wire the gadgets up in a way that they could get the information needed. We use operators for that, which check for certain interface implementations on the gadgets itself and then activate themselves.
With the image based gadgets, however, we don't have those clear Go types anymore - instead, we try to mostly infer data layouts from BTF and then we manually leverage operators depending on the outcome of the inference and simply reserve space for all the information that could be added. This has added quite a lot of direct dependencies to the run gadget that are not necessarily required all the time.
This refactoring addresses a couple of things:
This is still a draft and still needs a bit of polishing, especially around documentation & testing.
I will add more information and the open todo items over the next couple of days.
Data flow
Todos
Changes to the run gadget implemented while this PR is/was open and should be addressed here or in a follow-up PR. Feel free to add.
top file
#2132