model-serving

Here are 129 public repositories matching this topic...

openvinotoolkit / model_server

A scalable inference server for models optimized with OpenVINO™

kubernetes machine-learning cloud ai deep-learning inference edge dag model-serving serving openvino

Updated May 31, 2024
C++

bentoml / BentoML

The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!

python machine-learning deep-learning model-serving multimodal mlops ml-engineering llm generative-ai llmops llm-serving model-inference-service llm-inference inference-platform

Updated May 31, 2024
Python

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

amd cuda inference pytorch transformer llama gpt rocm model-serving mlops llm inferentia llmops llm-serving trainium

Updated May 31, 2024
Python

intel / xFasterTransformer

intel inference transformer xeon llama model-serving llm chatglm qwen

Updated May 31, 2024
C++

instill-ai / console

⛅ Versatile Data Pipeline (VDP) console website

console ui computer-vision deep-learning frontend image-classification object-detection structured-data hacktoberfest data-pipeline no-code model-serving vdp unstructured-data data-connector vision-ai versatile-data-pipeline

Updated May 31, 2024
TypeScript

predibase / lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

transformers pytorch llama gpt lora model-serving fine-tuning llm llmops llm-serving llm-inference

Updated May 31, 2024
Python

instill-ai / .github

🏡 Instill AI organisation profile and default configuration

computer-vision deep-learning api-first low-code model-serving vdp unstructured-data mlops vision-ai versatile-data-pipeline

Updated May 31, 2024

basetenlabs / truss

The simplest way to serve AI/ML models in production

open-source machine-learning packaging artificial-intelligence falcon easy-to-use whisper inference-server model-serving inference-api stable-diffusion wizardlm

Updated May 30, 2024
Python

kserve / kserve

Standardized Serverless ML Inference Platform on Kubernetes

Updated May 31, 2024
Python

mlrun / mlrun

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

python kubernetes workflow data-science machine-learning data-engineering model-serving mlops experiment-tracking mlops-workflow

Updated May 31, 2024
Python

google / jetstream-pytorch

PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference

inference pytorch batching attention llama gemma model-serving tpu llm llm-inference llama2

Updated May 31, 2024
Python

jozu-ai / kitops

Tools for easing the handoff between AI/ML and App/SRE teams.

kubernetes devops ai code tensorflow sklearn models ml pytorch datasets devops-tools kubernetes-deployment model-serving mlops model-packer model-interpretability gguf mlops-tools

Updated May 30, 2024
Go

FedML-AI / FedML

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.

machine-learning deep-learning inference-engine model-deployment model-serving distributed-training federated-learning mlops edge-ai ai-agent on-device-training

Updated May 31, 2024
Python

bentoml / OneDiffusion

OneDiffusion: Run any Stable Diffusion models and fine-tuned weights with ease

kubernetes ai lora model-serving fine-tuning diffusion-models stable-diffusion

Updated May 30, 2024
Python

ModelTC / lightllm

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

nlp deep-learning llama gpt model-serving llm openai-triton