Enhance exec operation to survive CRI-O restarts #7826

Open
hasan4791 opened this issue Feb 28, 2024 · 10 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.


@hasan4791
Contributor

hasan4791 commented Feb 28, 2024

What happened?

Currently, CRI-O hosts the streaming server that manages exec connections to and from running containers, so an exec session breaks whenever CRI-O is restarted. If we offload the exec operation to an intermediate process such as conmon (or something similar), the session could ideally survive CRI-O restarts. Open to discussion on this topic.

What did you expect to happen?

NA

How can we reproduce it (as minimally and precisely as possible)?

NA

Anything else we need to know?

NA

CRI-O and Kubernetes version

NA

OS version

NA

Additional environment details (AWS, VirtualBox, physical, etc.)

NA

@hasan4791 hasan4791 added the kind/bug Categorizes issue or PR as related to a bug. label Feb 28, 2024
@saschagrunert saschagrunert added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels Feb 28, 2024
@saschagrunert
Member

This is a feature we can elaborate on; maybe it makes sense to keep the connection open in conmon-rs.

@kwilczynski
Member

/retitle Enhance exec operation to survive CRI-O restarts

@openshift-ci openshift-ci bot changed the title [FEAT] Enhance exec operation to survive cri-o restarts Enhance exec operation to survive CRI-O restarts Feb 28, 2024
@kwilczynski
Member

@hasan4791, do you have anything that needs to run for a longer time?

Why would a momentary interruption while CRI-O restarts (which should hopefully be brief) be a problem? What use case do you have in mind?

I am asking because there is also a push to enable and enforce a timeout for any streaming operation from the container back to the user.

@haircommander
Member

I think this is an all-or-nothing thing. If CRI-O restarts, it loses the connection to the kubelet, which tears down the endpoint assuming the container stopped. I think delegating to conmon-rs is a good idea.

@hasan4791
Contributor Author

hasan4791 commented Feb 29, 2024

@kwilczynski Not in my case, but I was fiddling with the registry config on the cluster nodes and had to restart the runtime, which disconnected the session. I thought it would be good to have: keep the session open unless the user disconnects or it times out when idle.

@hasan4791
Contributor Author

@haircommander Is it possible to extend the same for log streaming as well?

@haircommander
Member

Yeah, the mechanism could be similar.

@saschagrunert
Member

saschagrunert commented Mar 21, 2024

So, exec and attach are pretty much the same; port-forward is slightly different but should work as well. Logs are tricky, because the log reader in the kubelet (also used by crictl) performs a container status check to ensure that the container is running.

See: https://github.com/kubernetes/kubernetes/blob/9c50b25/pkg/kubelet/kuberuntime/logs/logs.go#L418-L444

If the runtime is not available, the log reader assumes that the container is not running either (which is wrong in the case of CRI-O) and therefore aborts the log reading.

@saschagrunert
Member

The solution for the log streaming is available in kubernetes/kubernetes#124025

@kwilczynski
Member

/assign saschagrunert
/assign haircommander
/assign kwilczynski

Projects
Status: To do
Development

No branches or pull requests

4 participants