cached K8S CSI driver #89

Merged
merged 2 commits into from
May 8, 2024

Conversation

Contributor

@airhorns airhorns commented May 5, 2024

This makes cached able to operate in a new mode: as a Kubernetes Container Storage Interface (CSI) driver, where it can be the thing that powers K8S volumes directly! CSI drivers are usually used to implement strange and foreign storage interfaces, like GCP persistent disks or NFS or whatever weird mounts you can figure out in Linux.

In order for the hardlinks to be on the same device and thus actually improve performance, the cache daemon has to have privileged access to the volumes that pods mount. We can do that with ninja editing stuff in /var/lib/kubernetes in a privileged container, or we can do it the safer way that k8s endorses! It actually still ends up modifying stuff in /var/lib/kubernetes and has the same escalated privileges, but a big win is that the CSI daemon needs no network interface and only communicates with other trusted components (the kubelet). In our case, we don't need to implement very much of the CSI interface (and we report as such with the fancy capabilities RPCs); we just want an emptyDir on the same device as wherever the cache is stored. Turns out, this is pretty easy!
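To make the "same device" requirement concrete, here is a minimal sketch of how one could check whether two directories can be hardlinked between. It is not code from this PR: `sameDevice` and `checkTempDir` are hypothetical helpers, and the sketch assumes a Unix-like OS where `FileInfo.Sys()` returns a `*syscall.Stat_t`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// sameDevice reports whether two existing paths live on the same filesystem
// device. Hardlinks (os.Link) only work between paths on the same device.
// Assumption: Unix-like OS, where FileInfo.Sys() is a *syscall.Stat_t.
func sameDevice(a, b string) (bool, error) {
	statA, err := os.Stat(a)
	if err != nil {
		return false, err
	}
	statB, err := os.Stat(b)
	if err != nil {
		return false, err
	}
	sysA, okA := statA.Sys().(*syscall.Stat_t)
	sysB, okB := statB.Sys().(*syscall.Stat_t)
	if !okA || !okB {
		return false, fmt.Errorf("device information unavailable on this platform")
	}
	return sysA.Dev == sysB.Dev, nil
}

// checkTempDir compares a temporary directory with a child directory created
// inside it. The directory names are illustrative only.
func checkTempDir() (bool, error) {
	cacheDir, err := os.MkdirTemp("", "cached-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(cacheDir)

	volumeDir := filepath.Join(cacheDir, "volume")
	if err := os.Mkdir(volumeDir, 0o755); err != nil {
		return false, err
	}
	return sameDevice(cacheDir, volumeDir)
}

func main() {
	same, err := checkTempDir()
	fmt.Println("same device:", same, "error:", err)
}
```

If the pod volume and the cache sat on different devices, `os.Link` would fail and the cache would have to be copied instead, which is the cost this PR is trying to avoid.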

How this will work:

  • we'll deploy cached as a daemonset throughout the cluster, running in CSI mode talking to the kubelet
  • pods that want to use the DL cache will have volume declarations that look like this (instead of emptyDir):
      volumes:
       - name: gadget
         csi:
           driver: com.gadget.dateilager.cached
           volumeAttributes:
             placeCacheAtPath: "cache" # matches
           readOnly: false
  • when a new one of these volumes shows up, the kubelet will make gRPC calls to the CSI socket, which this PR makes cached implement
  • cached won't actually mount anything; instead it just drops the cache into an empty folder and sets the permissions up right, using the existing implementation from the previous PR (the kubelet then mounts the folder into the docker container as it always does)
  • when the pod terminates, cached gets another RPC, and removes the whole folder (but not the shared golden copy of the cache).
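The publish and unpublish steps above can be sketched with nothing but the standard library. This is an illustration under assumptions, not the implementation in this PR: `publishVolume`, `unpublishVolume`, and `demo` are hypothetical names, the real handlers would be the CSI node RPCs, and permission setup is left out.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// publishVolume hardlinks every file under goldenDir into
// targetDir/placeCacheAtPath, recreating the directory tree as it goes.
// Hypothetical stand-in for what a node publish handler could do.
func publishVolume(goldenDir, targetDir, placeCacheAtPath string) error {
	dest := filepath.Join(targetDir, placeCacheAtPath)
	if err := os.MkdirAll(dest, 0o755); err != nil {
		return err
	}
	return filepath.WalkDir(goldenDir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(goldenDir, path)
		if err != nil {
			return err
		}
		out := filepath.Join(dest, rel)
		if d.IsDir() {
			return os.MkdirAll(out, 0o755)
		}
		// os.Link fails if goldenDir and targetDir are on different devices.
		return os.Link(path, out)
	})
}

// unpublishVolume removes the pod's folder. Only the links are deleted, so
// the shared golden copy keeps its data.
func unpublishVolume(targetDir string) error {
	return os.RemoveAll(targetDir)
}

// demo runs one publish/unpublish cycle in a temp dir and checks the results.
func demo() error {
	root, err := os.MkdirTemp("", "cached-csi-demo")
	if err != nil {
		return err
	}
	defer os.RemoveAll(root)

	golden := filepath.Join(root, "golden")
	if err := os.MkdirAll(filepath.Join(golden, "node_modules"), 0o755); err != nil {
		return err
	}
	goldenFile := filepath.Join(golden, "node_modules", "pkg.json")
	if err := os.WriteFile(goldenFile, []byte("{}"), 0o644); err != nil {
		return err
	}

	target := filepath.Join(root, "pods", "volume-1")
	if err := publishVolume(golden, target, "cache"); err != nil {
		return err
	}
	linked := filepath.Join(target, "cache", "node_modules", "pkg.json")
	if _, err := os.Stat(linked); err != nil {
		return fmt.Errorf("cache not linked into volume: %w", err)
	}
	if err := unpublishVolume(target); err != nil {
		return err
	}
	if _, err := os.Stat(target); !errors.Is(err, fs.ErrNotExist) {
		return errors.New("volume folder still present after unpublish")
	}
	if _, err := os.Stat(goldenFile); err != nil {
		return fmt.Errorf("golden copy was removed: %w", err)
	}
	return nil
}

func main() {
	fmt.Println("demo error:", demo())
}
```

Because the volume only holds links, removing it on pod termination is cheap and cannot damage the golden copy.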

Woop woop!

@airhorns airhorns force-pushed the cached branch 2 times, most recently from 7651505 to b48e6ea Compare May 5, 2024 19:45
@airhorns airhorns force-pushed the cached branch 2 times, most recently from 6b4d7cc to 7ee43f7 Compare May 5, 2024 22:32
@airhorns airhorns changed the title from "Implement a CSI for kubernetes that prepopulates a mounted emptyDir with" to "cached K8S CSI driver" May 5, 2024
@airhorns airhorns force-pushed the harry/cached-csi branch 4 times, most recently from ca00e2b to 332fe4e Compare May 6, 2024 01:58
@@ -0,0 +1,203 @@
package api
Contributor

Why do these methods on Cached exist in another file?

Contributor Author

@airhorns airhorns May 8, 2024

I figured it was nice to separate them since it's the CSI stuff vs our stuff, but if that's against convention I will merge them?

pkg/api/cachedcsi.go: four outdated review threads, all resolved
if csiSocket != "" {
	go (func() {
		logger.Info(ctx, "start CSI server")
		err := s.ServeCSI(ctx, csiSocket)
Contributor

I don't think this will cleanly stop on graceful shutdown. You likely want a cancellable context passed in here that we can abort on <-osSignals
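For readers following along, a minimal sketch of the pattern suggested here, not the diff that was eventually pushed. `serveUntilCancelled` is a hypothetical stand-in for `ServeCSI` (whose body is not shown in this conversation), and `runWithSignals` and `demoShutdown` are likewise made-up names. A real gRPC server would call its own stop method rather than close a raw listener.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"os"
	"os/signal"
	"path/filepath"
	"syscall"
	"time"
)

// serveUntilCancelled accepts connections on a Unix socket until ctx is
// cancelled, then returns nil. Hypothetical stand-in for ServeCSI.
func serveUntilCancelled(ctx context.Context, socketPath string) error {
	listener, err := net.Listen("unix", socketPath)
	if err != nil {
		return err
	}
	defer listener.Close()

	// Closing the listener unblocks Accept once shutdown is requested.
	go func() {
		<-ctx.Done()
		listener.Close()
	}()

	for {
		conn, err := listener.Accept()
		if err != nil {
			if ctx.Err() != nil {
				return nil // shutdown was requested, not a real failure
			}
			return err
		}
		conn.Close() // a real server would hand the connection to gRPC
	}
}

// runWithSignals ties the server's lifetime to SIGINT/SIGTERM.
func runWithSignals(socketPath string) error {
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()
	return serveUntilCancelled(ctx, socketPath)
}

// demoShutdown cancels the context by hand (standing in for a signal) and
// checks that the server returns promptly.
func demoShutdown() error {
	dir, err := os.MkdirTemp("", "csi")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan error, 1)
	go func() { done <- serveUntilCancelled(ctx, filepath.Join(dir, "csi.sock")) }()

	cancel()
	select {
	case err := <-done:
		return err
	case <-time.After(2 * time.Second):
		return errors.New("server did not stop after cancellation")
	}
}

func main() {
	fmt.Println("shutdown error:", demoShutdown())
}
```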

Contributor Author

Would you mind terribly shooting me a diff? I feel like I could figure it out, but it would take you approximately 10 seconds to do.


select {
case err := <-errors:
	return err
Contributor

If the CSI server caused this to fail, you need to also stop the gRPC server.
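One way to get that behaviour, sketched with only the standard library: run both servers under a shared context and cancel it as soon as either one returns. This is an assumption about the shape of a fix, not the code that was merged; `runAll` and `demoFailure` are hypothetical names, and the two closures stand in for the CSI and gRPC servers.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// runAll starts every server with a shared context. As soon as any server
// returns, the context is cancelled so the others stop too. It waits for
// all of them and returns the first non-nil error.
func runAll(ctx context.Context, servers ...func(context.Context) error) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	results := make(chan error, len(servers))
	for _, serve := range servers {
		go func(serve func(context.Context) error) {
			results <- serve(ctx)
		}(serve)
	}

	var first error
	for range servers {
		err := <-results
		if err != nil && first == nil {
			first = err
		}
		cancel() // one server is gone, so ask the rest to stop
	}
	return first
}

// demoFailure simulates the CSI server failing and reports whether the
// other server noticed the cancellation and stopped.
func demoFailure() (stopped bool, err error) {
	csiServer := func(ctx context.Context) error {
		return errors.New("csi server failed")
	}
	grpcServer := func(ctx context.Context) error {
		<-ctx.Done() // a real server would stop gracefully here
		stopped = true
		return nil
	}
	err = runAll(context.Background(), csiServer, grpcServer)
	return stopped, err
}

func main() {
	stopped, err := demoFailure()
	fmt.Println("other server stopped:", stopped, "error:", err)
}
```

The `errgroup` package from golang.org/x/sync offers a similar pattern if an extra dependency is acceptable.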

Contributor

angelini commented May 8, 2024

FYI, I'm going to push commits to this branch

@angelini angelini marked this pull request as ready for review May 8, 2024 17:07
@angelini angelini merged commit 942e67d into main May 8, 2024
4 checks passed

2 participants