Describe the solution you'd like
Support the ollama server as a runtime (not sure if it has been asked elsewhere).
Anything else you would like to add:
Ollama seems quite efficient in terms of CPU usage (it also serves 4-bit quantized models out of the box), so it offers a good compromise between CPU and GPU. vLLM has some restrictions at the moment, e.g. the AVX-512 prerequisite for CPU inference. It would be good to have more runtime options, especially for running models locally (e.g. during the dev phase) or, more importantly, on non-GPU clusters. A rough sketch of what the integration surface looks like is below.
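To give a sense of how thin the adapter could be (this is purely an illustrative sketch, not a proposed design): the runtime would mostly forward inference requests to the Ollama HTTP API, which by default listens on port 11434 and exposes `POST /api/generate`. The model name `llama2` and the endpoint URL below are just example assumptions.

```python
# Minimal sketch: forward a prompt to a locally running Ollama server.
# Assumes Ollama is listening on its default address and that the
# example model "llama2" has already been pulled.
import requests

def generate(prompt: str, model: str = "llama2") -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    # With stream=False, Ollama returns a single JSON object whose
    # "response" field holds the full completion text.
    return resp.json()["response"]

if __name__ == "__main__":
    print(generate("Why might a CPU-only cluster prefer a 4-bit model?"))
```

A serving-runtime integration would wrap roughly this call path behind whatever inference protocol the platform expects, plus model pull/lifecycle handling.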
Links to the design documents:
N/A
/kind feature