-
Notifications
You must be signed in to change notification settings - Fork 977
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow PVC Model Mount in ReadWrite Mode #3687
Comments
do you think adding a configuration |
In addition, current KServe using emptyDir as share volume between Storage initializer and main container, if we have option to configure a PVC to it (and storage initializer support skip the model download if model already exists), in this case model will be only downloaded once. Do you think it will be helpful as well? what do you think? @yuzisun |
this option sounds like you can provide your own PVC, in theory you can configure it as RWX, right? Or am I missing something?
Isn't it the purpose of the model-car as well? |
I think this would be very useful to have. We recently just had OOM issue because of this. |
How did you get OOM ? |
When we start having more replicas, it creates disk pressure on the node and also the pods are killed due to OOM killed as finite amount of RAM is allocated to emptyDir volume. |
EmptyDir should by default use ephemeral storage so if you do not set the resource limit then it can potentially create the disk pressure issue. |
Interesting idea, sounds like you want to do lazy model caching on local PVC, if the model does not exist on the local PVC then you download from the cloud storage otherwise you directly load from PVC. |
That's exactly what I am trying to do. And yes, I have encountered Disk Pressure issues as well when downloading larger models from an S3 bucket, having that download go directly to a PVC would help even if done out-of-scope from this local PVC caching functionality. In many of the clusters I am deploying to, I have high-speed shared storage backing this PVC. So I am able to download from the Internet (or a local in-cluster object store) once, cache it onto the high-speed storage, and then mount that same PV to all similar pods launching the model or in the same auto-scaling group. Not having to download each time helps with the startup times and supports some of the air-gapped environments I am working in better. In the workflow I imagine, I anticipate there may be use cases in a cluster where RW and where RO make sense. Having the setting make exist globally would be better for me, but having it in the spec makes the most sense in the workflows I have in mind. Essentially I expect that in some cases a user might expect the lazy local cache as described above and keep it RW, but in some cases they may be forced by security to only use what has been validated and must mount a RO volume. |
Hi @spolti , The PVC can be RWX but when it is mounted to main container, it is ready-only, so application can only read data from
from my understanding model-car is mainly for models in oci image, and it can be cached in the same node only. |
Yes, this is something I want to achieve. And we should provide this option out of the emptyDir, especially for large model. If the option of using PVC as the share volume can solve the issue, I will prefer to keep the pattern of storage initializer to write the data to the volume and the main container only read the data, as it is a good pratice. However I have no objection to provide an option as well, instead of having it in spec, how about using annotation like |
Having an annotation like that in the |
/kind feature
I would like to add an additional flag to specify a volume to be ReadWrite or ReadOnly.
Currently the implementation for PVC model stores here expects that a PV containing model files is always mounted to a
ServingRuntime
asReadOnly
. I have a use case where my server has some internal logic that I would like to dynamically download model components and cache them to the local PV.As an alternative, this could be a use case for the new model-car, but my download logic is somewhat complex and I need to run the logic in many locations outside of KServe, so keeping the logic inside the container works best for me. This may also be a use case for a custom StorageContainer, but I think the change is small enough and common enough that other users might find this feature useful.
Design proposal:
Add a readonly flag under storageUri:
See Slack for more details: https://cloud-native.slack.com/archives/C06AH2C3K8B/p1714883766854429
The text was updated successfully, but these errors were encountered: