
Handle seccomp agent upgrades without disruption to running containers #8

Open
rata opened this issue Mar 12, 2021 · 2 comments
Labels
kind/enhancement New feature or request

Comments

@rata
Member

rata commented Mar 12, 2021

Ideal future situation
Roll a new version of the agent without impact on any running container on the nodes.

Implementation options
Some random ideas that come to mind:

  • Configure the daemonset to create the new pod before destroying the old one. The new pod can then talk to the old one and receive the fds from it. This of course doesn't help with crashes, though.
  • Have a small binary that holds the fds and passes them to the agent when it starts. This copes fine with the agent crashing or needing a restart. This binary should probably be another daemonset. We need to decide what the flow would look like: does this binary receive from the socket and pass the fds to the agent, which would mean it has to talk to the agent beyond startup? Or can the fds be received by both the agent and the small binary, so the agent only queries the small binary on startup?
  • Investigate if CRIU has any interesting idea we can use here: https://criu.org
  • Other resources that might be useful
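
To make the fd handover in the first two options concrete, here is a minimal sketch of passing open fds plus a bit of metadata over a Unix socket with SCM_RIGHTS. It is only an illustration, not the agent's code: the function names `hand_over_fds` and `receive_fds` and the `container-1234` metadata are made up, a socketpair stands in for the channel between the old and new agent (or between the fd holder and the agent), and a pipe stands in for a seccomp notify fd. It needs Linux and Python 3.9+ for `socket.send_fds` and `socket.recv_fds`.

```python
import os
import socket


def hand_over_fds(sock, fds, metadata):
    """Send open fds plus a metadata blob over a connected Unix socket."""
    # SCM_RIGHTS needs at least one byte of regular data alongside the fds,
    # so the metadata doubles as the payload. `metadata` must be non-empty.
    socket.send_fds(sock, [metadata], fds)


def receive_fds(sock, max_fds=16):
    """Receive the metadata blob and the fds sent by hand_over_fds()."""
    data, fds, _flags, _addr = socket.recv_fds(sock, 4096, max_fds)
    return data, fds


if __name__ == "__main__":
    # Hypothetical stand-ins: the socketpair is the old/new agent channel,
    # the pipe read end plays the role of a seccomp notify fd.
    old_agent, new_agent = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()

    hand_over_fds(old_agent, [read_end], b"container-1234")
    os.close(read_end)  # the sender can drop its copy once it has been sent

    meta, fds = receive_fds(new_agent)
    os.write(write_end, b"ping")
    print(meta, os.read(fds[0], 4))  # the received fd still refers to the pipe
```

The receiver gets a new fd number that points at the same open file description, so the sender closing its copy (or exiting) does not invalidate it. That is the property all the options above rely on.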
@rata rata added the kind/enhancement New feature or request label Mar 12, 2021
@alban
Member

alban commented Mar 12, 2021

We could get some inspiration from systemd and its "FD Store" facility that stores file descriptors from services when they restart (systemctl restart). See FDSTORE=1 in

We could have this second daemonset you mention (seccomp-fdstorage) to store the fds along with the related metadata.

But is the additional complexity worth it?

@rata
Member Author

rata commented Mar 12, 2021

Oooor, just rely on the host systemd to save the fds for us, using that functionality. No "inspiration", just use it!

We should explore that option further (security concerns, etc.), but it seems worth pursuing. Also, the Kubernetes graceful shutdown KEP works only with systemd hosts, so most hosts really should have systemd. Maybe we can't use it if we want to run on GKE Autopilot, but all in due time :)
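
For reference, a rough sketch of what using the systemd FD store could look like from the agent side. It makes several assumptions: the agent runs as (or can reach the notify socket of) a systemd unit with `FileDescriptorStoreMax=` set above zero, which is not how the daemonset runs today, and the helper names `store_fd` and `restored_fds` are invented for illustration. The protocol bits are the documented ones: a datagram containing `FDSTORE=1` and `FDNAME=...` sent to `$NOTIFY_SOCKET` with the fd attached via SCM_RIGHTS, and `LISTEN_PID`, `LISTEN_FDS` and `LISTEN_FDNAMES` on the next start, with inherited fds starting at 3.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited fd number, per sd_listen_fds(3)


def store_fd(fd, name, environ=None):
    """Ask the service manager to keep `fd` in the unit's FD store."""
    environ = os.environ if environ is None else environ
    path = environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # not started by systemd, or notifications not enabled
    # A leading "@" denotes a Linux abstract namespace socket.
    addr = b"\0" + path[1:].encode() if path.startswith("@") else path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(addr)
        msg = f"FDSTORE=1\nFDNAME={name}".encode()
        socket.send_fds(sock, [msg], [fd])  # fd travels as SCM_RIGHTS
    return True


def restored_fds(environ=None):
    """Return [(fd, name), ...] for fds handed back to us on (re)start."""
    environ = os.environ if environ is None else environ
    if environ.get("LISTEN_PID") != str(os.getpid()):
        return []  # the variables are absent or meant for another process
    count = int(environ.get("LISTEN_FDS", "0"))
    raw = environ.get("LISTEN_FDNAMES")
    names = raw.split(":") if raw else []
    names += ["unknown"] * (count - len(names))  # pad if names are missing
    return [(SD_LISTEN_FDS_START + i, names[i]) for i in range(count)]
```

In this sketch the agent would call `store_fd(notify_fd, container_id)` whenever it receives a new seccomp fd, and `restored_fds()` once at startup to pick up whatever survived the restart. Using the fd name to carry the container identity is just one possible way to keep the "related metadata"; anything richer would need to be stored elsewhere.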
