
Supporting remote plugin for tasks etc. #5241

Open
nyanpassu opened this issue Mar 22, 2021 · 10 comments


nyanpassu commented Mar 22, 2021

[Problem encountered]
My teammate and I are implementing a container runtime that starts containers via systemd. To make the runtime cooperate with containerd (so that we can use it from a container orchestration platform), we implemented our own containerd-shim binary. However, we found that when containerd loses its connection to the shim (and also when the system reboots), it removes the container's resources, which is not what we want. So we also need to implement a task plugin for our runtime, and for now I'm approaching this with a Go binary plugin, since remote plugins only support content and snapshot. Building the binary plugin ran into binary compatibility issues (same as golang/go#31354). I've worked around the problem for now, but I think it would be better to support every type of plugin as a remote plugin.

[Solution suggested]
Add remote plugin support for tasks and the other plugin types.
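
For context, the existing remote plugins are declared as proxy plugins in the daemon config; a task equivalent would presumably be wired up the same way. A sketch of the current config.toml syntax for a snapshotter (the plugin name and socket path are just examples):

```toml
# /etc/containerd/config.toml
[proxy_plugins]
  [proxy_plugins.customsnapshot]
    type = "snapshot"                        # only "snapshot" and "content" are supported today
    address = "/run/custom-snapshotter.sock" # unix socket serving the snapshots gRPC API
```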


fuweid commented Mar 23, 2021

> we found that when containerd loses its connection to the shim (and also when the system reboots), it removes the container's resources, which is not what we want.

@nyanpassu Would you mind sharing more details? containerd doesn't actually restart the container, so I'd like to understand the situation better. Thanks!

nyanpassu commented:

> @nyanpassu Would you mind sharing more details? containerd doesn't actually restart the container.

OK, our work is implementing a low-level OCI runtime (like runc), except that it starts the container via systemd (a systemd unit file is created) when OCI start is called. We want the runtime to cooperate with containerd (like kata-containers does) so we can reuse Docker and containerd as the endpoint for our container orchestration platform (for k8s it is the containerd CRI). A shim runtime was created as well to integrate with containerd.

During development I noticed that when I stop containerd, or restart it after force-killing it (and also after a system reset), the bundle/rootfs is removed, which breaks the systemd unit. I figured out that this is done by the task manager service, which itself takes the form of a task plugin: when a task is created, containerd looks up a task manager by the container's runtime ID (falling back to the default task service if none is found). The bundle/rootfs is removed because the task manager fails to connect to the shim process (there may be bugs in my shim runtime implementation), so it regards the shim as dead and cleans up its resources.

This recycling of container resources is not what we want; we want the containerd task to stop only when an explicit stop/kill/delete is performed on it. If the bundle is removed, the systemd unit fails. So I'm trying to implement a task manager service for our runtime to keep the task from failing. Since remote plugins are not supported for tasks, I'm approaching this with a Go binary plugin. However, Go plugins have several problems: loading the plugin fails when it is built with modules enabled; packages are reported as different versions when built from different GOPATHs; internal packages conflict when built with the -trimpath flag; and vendored code is compiled multiple times. In general, Go plugins are not reliable in practice. I'm forking containerd to add support for a remote task plugin, and I hope this will eventually be supported by containerd officially.

Regards!

mikebrow commented:

The problem: "...failed to connect with shim" on restart of containerd.
Observation: the container bundle is recycled because of the problem above.

Desire: "The recycling of container resources is not what we want", even if you can't re-establish a shim to manage the container?

Solution: "fork containerd to add support for a remote task plugin" and make it a binary plugin "because golang plugins are not reliable in practice"?

Why not just fix the problem?


nyanpassu commented Mar 24, 2021

> Why not just fix the problem?

Because re-establishing the connection to the shim will certainly fail after a system reset/reboot: there is no shim process running, so our task service implementation will try to restart the shim (faking that the task kept running across the reset/reboot). Starting the shim via systemd would be awkward, and there's no guarantee the shim process would be up before containerd starts and the task service initializes.


fuweid commented Mar 27, 2021

@nyanpassu Sorry for the late reply. Let me try to understand this feature request.

> During development I noticed that when I stop containerd, or restart it after force-killing it (and also after a system reset), the bundle/rootfs is removed, which breaks the systemd unit.

Let me first explain how containerd manages containers. :)

When we use containerd to manage a container's lifecycle, we first need to configure how to run the container: what the container's rootfs is, the entrypoint, CPU resources, and so on. The configuration is stored in the metadata plugin, which is managed by the container service. But after calling the container service we don't have a running process (container) yet.

We then call the task service to start a shim for the configured container, and the shim handles the container's lifecycle: create/start/kill/exec. containerd communicates with the shim over a socket using the ttrpc protocol. We have a running container only after calling the task service's Start API.
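
For example, a minimal sketch of those steps with the containerd Go client (the socket path, namespace, image, and IDs are just examples):

```go
package main

import (
	"context"
	"log"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/cio"
	"github.com/containerd/containerd/namespaces"
	"github.com/containerd/containerd/oci"
)

func main() {
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
	ctx := namespaces.WithNamespace(context.Background(), "example")

	image, err := client.Pull(ctx, "docker.io/library/alpine:latest", containerd.WithPullUnpack)
	if err != nil {
		log.Fatal(err)
	}

	// 1. Container service: stores configuration only; nothing runs yet.
	container, err := client.NewContainer(ctx, "demo",
		containerd.WithNewSnapshot("demo-snapshot", image),
		containerd.WithNewSpec(oci.WithImageConfig(image), oci.WithProcessArgs("sleep", "100")),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer container.Delete(ctx, containerd.WithSnapshotCleanup)

	// 2. Task service: creates the bundle and starts the shim.
	task, err := container.NewTask(ctx, cio.NewCreator(cio.WithStdio))
	if err != nil {
		log.Fatal(err)
	}
	defer task.Delete(ctx)

	// 3. Only after Start is there a running container process.
	if err := task.Start(ctx); err != nil {
		log.Fatal(err)
	}
}
```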

When containerd restarts, it retries connecting to the existing shim. The socket address is stored in the bundle path (for example, /var/run/containerd/io.containerd.runtime.v2.task/${NAMESPACE}/${CONTAINER_ID}/address). Before reconnecting to the shim, containerd ensures the container ID is available in the container service; if it isn't, containerd deletes the bundle. And if the container ID is available but the shim is down, containerd also removes the bundle.
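
To illustrate, a toy sketch of what that reconnect amounts to (the namespace and container ID are made up, and real containerd speaks ttrpc over this socket rather than raw-dialing it):

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

func main() {
	// The bundle directory holds an "address" file written by the shim.
	addrFile := "/var/run/containerd/io.containerd.runtime.v2.task/default/demo/address"

	data, err := os.ReadFile(addrFile)
	if err != nil {
		fmt.Println("bundle is gone (or the shim never started):", err)
		return
	}
	// Typically something like "unix:///run/containerd/s/<hash>".
	addr := strings.TrimSpace(string(data))

	conn, err := net.Dial("unix", strings.TrimPrefix(addr, "unix://"))
	if err != nil {
		// This is the failure discussed here: shim unreachable, so
		// containerd treats it as dead and cleans up the bundle.
		fmt.Println("shim unreachable:", err)
		return
	}
	defer conn.Close()
	fmt.Println("shim alive at", addr)
}
```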

The container bundle is ephemeral, but the configuration in the container service is persistent, and the lifecycle of the shim is aligned with the container. The bundle is stored under /var/run/, which is basically tmpfs and will be empty after the host (system) reboots.

So back to your case: if I understand correctly, it looks like you put the systemd unit file in the bundle.

If the shim is down, the reconnection will fail after containerd restarts. In the current design of containerd, it is reasonable to remove the bundle because the container process is ephemeral. containerd also provides the shim delete binary API to let runtime authors handle resource cleanup; I think you can handle the systemd unit issue in that hook.
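
For instance, a rough sketch of doing that cleanup in a runtime v2 shim's Cleanup method (which backs the delete binary call); the service struct and the unit-file naming here are hypothetical:

```go
package shim

import (
	"context"
	"os"
	"path/filepath"
	"time"

	taskAPI "github.com/containerd/containerd/runtime/v2/task"
)

type service struct {
	id string // container ID this shim manages
}

// Cleanup is invoked via the shim's "delete" binary command when containerd
// decides the shim is gone and tears the bundle down.
func (s *service) Cleanup(ctx context.Context) (*taskAPI.DeleteResponse, error) {
	// Remove the systemd unit our runtime created (hypothetical naming).
	unit := filepath.Join("/etc/systemd/system", "ctr-"+s.id+".service")
	if err := os.Remove(unit); err != nil && !os.IsNotExist(err) {
		return nil, err
	}
	return &taskAPI.DeleteResponse{
		ExitedAt:   time.Now(),
		ExitStatus: 0,
	}, nil
}
```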

If the host restarts, the bundle content will be lost since bundles are stored on tmpfs. I think you can put the systemd unit in a persistent folder, like /var/lib.

> Starting the shim via systemd would be awkward, and there's no guarantee the shim process would be up before containerd starts and the task service initializes.

containerd provides the restart plugin to restart containers under certain conditions.
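
Usage is label-driven; continuing the client sketch above (and assuming the restart monitor is enabled in the daemon), something like:

```go
import (
	"github.com/containerd/containerd"
	"github.com/containerd/containerd/oci"
	"github.com/containerd/containerd/runtime/restart"
)

// restart.WithStatus sets the "containerd.io/restart.status" label; the
// monitor reconciles it, (re)starting the task if it is not in the
// desired state. client, ctx, and image come from the earlier sketch.
container, err := client.NewContainer(ctx, "demo",
	containerd.WithNewSnapshot("demo-snapshot", image),
	containerd.WithNewSpec(oci.WithImageConfig(image)),
	restart.WithStatus(containerd.Running),
)
```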

Hope this information helps!


nyanpassu commented Mar 29, 2021


@fuweid Thank you for your reply.

I've gone through the code and understand how containerd works. In our case we need to make the containerd task persistent, because the real task process is managed by systemd: as long as the systemd unit exists, the task exists (we will report a paused status when the unit has failed). We can store the bundle files in a persistent folder, but we still need both the unit and the task to be started and running after the system reboots, and that goes against the v2 runtime task manager's design.

We could get a persistent task manager by writing a runtime plugin for our runtime, but this is not practicable due to the Go plugin binary compatibility issue. As long as the Go issue is not fixed, the only easy way is a remote task plugin, just like content and snapshot. The restart plugin doesn't seem applicable in our case, because our lifecycle is aligned with the task, not with the container; aligning with the container would be a different design.

I've gone through the code of the restart plugin and have a question. Since the plugin is part of containerd, why does it use a remote gRPC service to communicate with the container/task services? Is that because it can't get a reference from the plugin context to the local container/task services?

Regards!


fuweid commented Mar 29, 2021

hi @nyanpassu

> In our case we need to make the containerd task persistent, because the real task process is managed by systemd: as long as the systemd unit exists, the task exists (we will report a paused status when the unit has failed).

In your design, if the process dies, will you restart the task instead of recreating the shim?

> We could get a persistent task manager by writing a runtime plugin for our runtime, but this is not practicable due to the Go plugin binary compatibility issue.

containerd allows you to build your own binary, like https://github.com/AkihiroSuda/containerd-example-custom-daemon. You can build your own task plugin with the containerd core vendored in. But I am still trying to fit your design into the current containerd framework, which would make integration easier. :)
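
A minimal sketch of that approach, in the spirit of the linked example (the plugin import path is hypothetical; your plugin registers itself via plugin.Register in an init function):

```go
package main

import (
	"fmt"
	"os"

	"github.com/containerd/containerd/cmd/containerd/command"

	// Built-in plugins must be linked in explicitly, e.g.:
	_ "github.com/containerd/containerd/services/containers"
	_ "github.com/containerd/containerd/services/tasks"
	// ...plus the rest of the stock plugins you need, and finally your own:
	_ "example.com/myorg/mytaskplugin" // hypothetical; calls plugin.Register in init()
)

func main() {
	app := command.App()
	if err := app.Run(os.Args); err != nil {
		fmt.Fprintf(os.Stderr, "containerd: %s\n", err)
		os.Exit(1)
	}
}
```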

> Since the plugin is part of containerd, why does it use a remote gRPC service to communicate with the container/task services? Is that because it can't get a reference from the plugin context to the local container/task services?

In containerd, service-type plugins are internal, while grpc-type plugins export an API as an endpoint. The restart plugin uses the service-type plugins rather than gRPC:

```go
// https://github.com/containerd/containerd/blob/master/runtime/restart/monitor/monitor.go#L66
func init() {
	plugin.Register(&plugin.Registration{
		Type: plugin.InternalPlugin,
		Requires: []plugin.Type{
			plugin.ServicePlugin,
		},
		ID: "restart",
		Config: &Config{
			Interval: duration{
				Duration: 10 * time.Second,
			},
		},
		InitFn: func(ic *plugin.InitContext) (interface{}, error) {
			opts, err := getServicesOpts(ic)
			// ....
		},
	})
}
```

getServicesOpts is the helper that obtains the client-facing interfaces from the service-type plugins. Back to your design: you can build your task plugin with that helper. :)
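
A condensed view of that helper, abridged from monitor.go (only two of the mapped services are shown):

```go
package monitorsketch

import (
	"fmt"

	"github.com/containerd/containerd"
	containersapi "github.com/containerd/containerd/api/services/containers/v1"
	tasks "github.com/containerd/containerd/api/services/tasks/v1"
	"github.com/containerd/containerd/plugin"
	"github.com/containerd/containerd/services"
)

// getServicesOpts maps service-type plugin instances to client options,
// so the plugin can use the normal containerd client against the
// in-process services instead of dialing the daemon over gRPC.
func getServicesOpts(ic *plugin.InitContext) ([]containerd.ServicesOpt, error) {
	plugins, err := ic.GetByType(plugin.ServicePlugin)
	if err != nil {
		return nil, err
	}
	var opts []containerd.ServicesOpt
	for s, fn := range map[services.ID]func(interface{}) containerd.ServicesOpt{
		services.ContainersService: func(s interface{}) containerd.ServicesOpt {
			return containerd.WithContainerClient(s.(containersapi.ContainersClient))
		},
		services.TasksService: func(s interface{}) containerd.ServicesOpt {
			return containerd.WithTaskClient(s.(tasks.TasksClient))
		},
		// ...content, images, snapshots, etc. elided...
	} {
		p := plugins[string(s)]
		if p == nil {
			return nil, fmt.Errorf("service %q not found", s)
		}
		instance, err := p.Instance()
		if err != nil {
			return nil, err
		}
		opts = append(opts, fn(instance))
	}
	return opts, nil
}
```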


nyanpassu commented Mar 29, 2021

Hi @fuweid.

In our design there is a shim runtime and an OCI runtime of our own: our oci.create creates a systemd unit, and oci.start starts it (systemctl start). The real container process is a child of the systemd process, not of the shim; the shim is just a dummy. The container will be restarted by systemd when it encounters an error, and if the unit has failed, our oci.state reports the state as paused (or another custom state, which the OCI spec allows). So we plan to keep the task in a running status as long as we don't delete it; that means not deleting the bundle, and rebuilding the task in memory after the system reboots.
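
Concretely, the create/start path looks roughly like this (a hypothetical sketch; the unit naming and the run-container helper binary are illustrative, not our real code):

```go
package containersystemd

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// ociCreate writes a systemd unit for the container; nothing runs yet.
func ociCreate(id, bundle string) error {
	unit := fmt.Sprintf(`[Unit]
Description=container %s

[Service]
ExecStart=/usr/local/bin/run-container %s
Restart=on-failure
`, id, filepath.Join(bundle, "rootfs"))
	path := filepath.Join("/etc/systemd/system", "ctr-"+id+".service")
	return os.WriteFile(path, []byte(unit), 0o644)
}

// ociStart delegates to systemd, so the container process becomes a
// child of systemd rather than of the shim.
func ociStart(id string) error {
	return exec.Command("systemctl", "start", "ctr-"+id+".service").Run()
}
```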

Building the plugin from vendored code seems impossible at the moment (I'm working with Go 1.16): Go treats modules as different when they are compiled from different absolute paths (I think it hashes the module by absolute path). The only way to compile a compatible main binary and plugin binary is to put them under the same GOPATH and disable modules at compile time, and compiling with the -trimpath flag makes submodules of the Go core conflict. I've tried to make binary plugins work, but it's not practicable in a production environment, and compiling under GOPATH is nasty for a CI/CD system.
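
For the record, the only combination I found that produces a loadable plugin looks roughly like this (paths and the plugin package are examples):

```sh
# Host binary and plugin must be built from the same GOPATH with the same
# toolchain and modules off; otherwise plugin.Open fails with
# "plugin was built with a different version of package ...".
export GOPATH=/home/ci/go
export GO111MODULE=off
cd "$GOPATH/src/github.com/containerd/containerd"
go build -o bin/containerd ./cmd/containerd
go build -buildmode=plugin -o bin/mytaskplugin.so ./mytaskplugin
```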

Building our own containerd executable is a good point; I'll talk it over with my teammates. We can vendor the containerd core and build an executable with our plugin, but I'm afraid that won't be accepted by our SRE team.

I had assumed containerd would eventually adopt remote plugins for every plugin type, because a remote plugin runs in another process and has the best compatibility, and we wouldn't need to replace our containerd binary.

Thank you and best regards.


fuweid commented Mar 30, 2021

Hi @nyanpassu

> I had assumed containerd would eventually adopt remote plugins for every plugin type, because a remote plugin runs in another process and has the best compatibility, and we wouldn't need to replace our containerd binary.

Shim v2 is one of the external plugins mentioned in https://github.com/containerd/containerd/blob/master/docs/PLUGINS.md. Currently, the task service is a built-in plugin, and runtime authors can focus on implementing the shim v2 API. Though it isn't easy to adapt this to your design, I hope you can bring more input so we can see how to fix the problem. :)

> Building our own containerd executable is a good point; I'll talk it over with my teammates. We can vendor the containerd core and build an executable with our plugin, but I'm afraid that won't be accepted by our SRE team.

containerd is not just a binary or a daemon. It also provides tools (a smart client, plugins) to help developers build their own binaries. That is a reasonable approach. :p


Looking forward to more of your input on this thread! Thanks


nyanpassu commented Mar 30, 2021

@fuweid

Yes, the task service is a built-in plugin, but I see that it looks up a specific runtime platform for each container's shim runtime, and a remote runtime seems reasonable to me (although there are some difficulties with the method signatures of the current runtime platform interface). The best fix, of course, would be to solve the Go plugin problems so binary plugins can be loaded dynamically, but I don't see any action on that from the Go team.

Yes, containerd is a toolset rather than just a binary or a daemon, but we would prefer not to build our own containerd binary and instead extend its functionality with additional plugins; maintaining our own binary would cost more manpower. I've discussed this with my teammates and my lead, and they decided to use a remote plugin (I'm working on a fork), perhaps switching to a binary plugin in the future once Go fixes the compatibility issue.

Regards!
