Describe the solution you'd like
Support the ollama server as a runtime (not sure if it has been asked elsewhere).
Anything else you would like to add:
Ollama seems quite efficient in terms of CPU usage (it also serves 4-bit quantized models out of the box), so it offers a good compromise between CPU and GPU. vLLM has some restrictions at the moment, e.g. the AVX-512 prerequisite for CPU inference. It would be good to have more runtime options, especially for running models locally (e.g. during the dev phase) or, more importantly, on non-GPU clusters. A rough sketch of what the integration surface looks like is below.
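To give a sense of how thin the adapter could be (this is purely an illustrative sketch, not a proposed design): the runtime would mostly forward inference requests to the Ollama HTTP API, which by default listens on port 11434 and exposes `POST /api/generate`. The model name `llama2` and the endpoint URL below are just example assumptions.

```python
# Minimal sketch: forward a prompt to a locally running Ollama server.
# Assumes Ollama is listening on its default address and that the
# example model "llama2" has already been pulled.
import requests

def generate(prompt: str, model: str = "llama2") -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    # With stream=False, Ollama returns a single JSON object whose
    # "response" field holds the full completion text.
    return resp.json()["response"]

if __name__ == "__main__":
    print(generate("Why might a CPU-only cluster prefer a 4-bit model?"))
```

A serving-runtime integration would wrap roughly this call path behind whatever inference protocol the platform expects, plus model pull/lifecycle handling.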
Links to the design documents:
N/A
/kind feature