Allow PVC Model Mount in ReadWrite Mode #3687

Open
supertetelman opened this issue May 14, 2024 · 12 comments

Comments

@supertetelman

/kind feature

I would like to add a flag to specify whether a volume is mounted ReadWrite or ReadOnly.

Currently the implementation for PVC model stores here expects that a PV containing model files is always mounted to a ServingRuntime as ReadOnly. I have a use case where my server has internal logic to dynamically download model components and cache them to the local PV.

As an alternative, this could be a use case for the new model-car, but my download logic is somewhat complex and I need to run it in many locations outside of KServe, so keeping it inside the container works best for me. This may also be a use case for a custom StorageContainer, but I think the change is small enough and common enough that other users might find this feature useful.

Design proposal:
Add a readonly flag under storageUri:

storageUri: "pvc://pvc-name/model-store/" 
  readonly: true # default true

See Slack for more details: https://cloud-native.slack.com/archives/C06AH2C3K8B/p1714883766854429
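
For illustration, here is a minimal sketch of how the proposed flag could look in context on an InferenceService predictor. The readonly field and its placement are part of this proposal, not an existing KServe field; the model name and format are placeholders:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model                  # placeholder name
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn             # placeholder format
      storageUri: "pvc://pvc-name/model-store/"
      # Proposed, not yet implemented: mount the PVC ReadWrite so the
      # runtime can download and cache model components onto it.
      readonly: false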

@lizzzcai
Member

Do you think adding a readOnly configuration here would help, instead of adding a new parameter in the spec? That way you could overwrite the value to false.
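
For example, a rough sketch of what that could look like in the storageInitializer entry of the inferenceservice-config ConfigMap; the readOnly key is hypothetical, only the surrounding structure mirrors the existing ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: inferenceservice-config
  namespace: kserve
data:
  storageInitializer: |-
    {
      "image": "kserve/storage-initializer:latest",
      "memoryRequest": "100Mi",
      "memoryLimit": "1Gi",
      "readOnly": false
    }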

@lizzzcai
Member

In addition, KServe currently uses an emptyDir as the shared volume between the storage initializer and the main container. If we had an option to configure a PVC for it (and the storage initializer supported skipping the model download when the model already exists), the model would only be downloaded once. Do you think that would be helpful as well? What do you think? @yuzisun
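
For context, a simplified sketch of the volume wiring KServe generates today, and where a PVC could be swapped in; the volume and container names below are illustrative, not copied from the codebase:

# Simplified generated pod spec (names illustrative)
initContainers:
  - name: storage-initializer        # downloads the model to the shared volume
    volumeMounts:
      - name: model-provision
        mountPath: /mnt/models
containers:
  - name: kserve-container           # serves the model from /mnt/models
    volumeMounts:
      - name: model-provision
        mountPath: /mnt/models
        readOnly: true
volumes:
  - name: model-provision
    emptyDir: {}                     # today: ephemeral, re-downloaded per pod
    # Suggested alternative: a PVC so the download survives across pods
    # persistentVolumeClaim:
    #   claimName: model-cache-pvc   # hypothetical claim name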

@spolti
Contributor

spolti commented May 14, 2024

Do you think adding a readOnly configuration here would help, instead of adding a new parameter in the spec? That way you could overwrite the value to false.

This option sounds like you can provide your own PVC; in theory you can configure it as RWX, right? Or am I missing something?

In addition, KServe currently uses an emptyDir as the shared volume between the storage initializer and the main container. If we had an option to configure a PVC for it (and the storage initializer supported skipping the model download when the model already exists), the model would only be downloaded once. Do you think that would be helpful as well? What do you think? @yuzisun

Isn't it the purpose of the model-car as well?

@terrytangyuan
Member

In addition, KServe currently uses an emptyDir as the shared volume between the storage initializer and the main container. If we had an option to configure a PVC for it (and the storage initializer supported skipping the model download when the model already exists), the model would only be downloaded once. Do you think that would be helpful as well? What do you think?

I think this would be very useful to have. We recently had an OOM issue because of this.

@yuzisun
Member

yuzisun commented May 14, 2024

In addition, KServe currently uses an emptyDir as the shared volume between the storage initializer and the main container. If we had an option to configure a PVC for it (and the storage initializer supported skipping the model download when the model already exists), the model would only be downloaded once. Do you think that would be helpful as well? What do you think?

I think this would be very useful to have. We recently had an OOM issue because of this.

How did you get OOM?

@terrytangyuan
Member

How did you get OOM?

When we start running more replicas, it creates disk pressure on the node, and the pods are also OOM-killed since a finite amount of RAM is allocated to the emptyDir volume.

@yuzisun
Member

yuzisun commented May 15, 2024

How did you get OOM?

When we start running more replicas, it creates disk pressure on the node, and the pods are also OOM-killed since a finite amount of RAM is allocated to the emptyDir volume.

emptyDir uses ephemeral storage by default, so if you do not set a resource limit it can potentially create disk pressure issues.
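
For reference, the standard Kubernetes knobs for bounding this are an emptyDir sizeLimit and/or an ephemeral-storage limit on the container (values below are arbitrary examples). Note that an emptyDir with medium: Memory counts against the pod's memory limit, which is one way it can lead to OOM kills:

volumes:
  - name: model-volume               # illustrative name
    emptyDir:
      sizeLimit: 20Gi                # pod is evicted if the volume grows past this
containers:
  - name: kserve-container
    resources:
      limits:
        ephemeral-storage: 30Gi      # bounds node-disk usage attributed to the pod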

@yuzisun
Member

yuzisun commented May 15, 2024

In addition, KServe currently uses an emptyDir as the shared volume between the storage initializer and the main container. If we had an option to configure a PVC for it (and the storage initializer supported skipping the model download when the model already exists), the model would only be downloaded once. Do you think that would be helpful as well? What do you think? @yuzisun

Interesting idea, sounds like you want to do lazy model caching on a local PVC: if the model does not exist on the local PVC you download it from cloud storage, otherwise you load it directly from the PVC.

@supertetelman
Author

supertetelman commented May 15, 2024

In addition, KServe currently uses an emptyDir as the shared volume between the storage initializer and the main container. If we had an option to configure a PVC for it (and the storage initializer supported skipping the model download when the model already exists), the model would only be downloaded once. Do you think that would be helpful as well? What do you think? @yuzisun

Interesting idea, sounds like you want to do lazy model caching on a local PVC: if the model does not exist on the local PVC you download it from cloud storage, otherwise you load it directly from the PVC.

That's exactly what I am trying to do.

And yes, I have encountered disk pressure issues as well when downloading larger models from an S3 bucket; having that download go directly to a PVC would help even if it is done separately from this local PVC caching functionality.

In many of the clusters I am deploying to, I have high-speed shared storage backing this PVC. So I am able to download from the Internet (or a local in-cluster object store) once, cache the model onto the high-speed storage, and then mount that same PV to all similar pods that launch the model or are in the same auto-scaling group. Not having to download each time improves startup times and better supports some of the air-gapped environments I am working in.

In the workflow I imagine, there may be use cases in the same cluster where RW and where RO each make sense. Having the setting exist globally would work for me, but having it in the spec makes the most sense for the workflows I have in mind. Essentially, I expect that in some cases a user might want the lazy local cache described above and keep the volume RW, while in other cases they may be forced by security requirements to only use what has been validated and must mount an RO volume.

@lizzzcai
Member

Do you think adding a readOnly configuration here would help, instead of adding a new parameter in the spec? That way you could overwrite the value to false.

This option sounds like you can provide your own PVC; in theory you can configure it as RWX, right? Or am I missing something?

Hi @spolti, the PVC can be RWX, but when it is mounted to the main container it is read-only, so the application can only read data from /mnt/models.

In addition, KServe currently uses an emptyDir as the shared volume between the storage initializer and the main container. If we had an option to configure a PVC for it (and the storage initializer supported skipping the model download when the model already exists), the model would only be downloaded once. Do you think that would be helpful as well? What do you think? @yuzisun

Isn't it the purpose of the model-car as well?

From my understanding, model-car is mainly for models packaged in an OCI image, and that image can only be cached on the same node.
The idea here can reuse the storage initializer's ability to download models from different storages (S3, GCS, Azure, HDFS, etc.), and if the PVC is backed by NFS the model can be shared across multiple nodes as well.
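
As a concrete illustration of that last point, a PVC along these lines could be mounted ReadWriteMany across nodes; the claim name and storage class are placeholders for whatever RWX-capable (e.g. NFS) class the cluster provides:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-cache-pvc              # placeholder claim name
spec:
  accessModes:
    - ReadWriteMany                  # needed to share the volume across nodes
  storageClassName: nfs-client       # placeholder RWX-capable storage class
  resources:
    requests:
      storage: 100Gi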

@lizzzcai
Member

Interesting idea, sounds like you want to do lazy model caching on a local PVC: if the model does not exist on the local PVC you download it from cloud storage, otherwise you load it directly from the PVC.

Yes, this is something I want to achieve. We should provide this option beyond emptyDir, especially for large models.

If the option of using a PVC as the shared volume can solve the issue, I would prefer to keep the pattern where the storage initializer writes the data to the volume and the main container only reads it, as that is a good practice. However, I have no objection to providing an option as well; instead of having it in the spec, how about using an annotation like storage.kserve.io/readOnly: false? @supertetelman
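
For illustration, the proposed annotation (not an existing KServe annotation; the exact key is still up for discussion here) would sit on the InferenceService metadata, for example:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model                            # placeholder name
  annotations:
    storage.kserve.io/readOnly: "false"     # proposed in this thread, not implemented
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn                       # placeholder format
      storageUri: "pvc://pvc-name/model-store/"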

@supertetelman
Author

Having an annotation like that in the InferenceService would be perfect for me.
