The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
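For illustration, a minimal sketch of a serving endpoint in the style of BentoML's 1.2+ service API (this blurb is BentoML's tagline; the service name and the trivial "inference" logic below are placeholders, not the library's own example):

```python
# Minimal BentoML-style service sketch (assumes BentoML >= 1.2).
# The class name and summarize() body are illustrative placeholders.
import bentoml

@bentoml.service
class Summarizer:
    @bentoml.api
    def summarize(self, text: str) -> str:
        # A real service would load a model and run inference here.
        return text[:100]
```

Running `bentoml serve` against a file like this exposes `summarize` as an HTTP endpoint.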
A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap
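A hedged sketch of AutoGen's two-agent pattern (assumes the `pyautogen` package; the model name and API key are placeholders):

```python
# Two-agent conversation sketch with Microsoft AutoGen (pyautogen).
from autogen import AssistantAgent, UserProxyAgent

config_list = [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]  # placeholder config
assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user = UserProxyAgent("user", human_input_mode="NEVER", code_execution_config=False)

# The user proxy drives the conversation; the assistant replies via the LLM.
user.initiate_chat(assistant, message="Outline a plan to benchmark an LLM server.")
```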
Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.
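A hedged sketch of loading and prompting one of these models through LitGPT's Python API, assuming it exposes `LLM.load`/`generate` as in its README (the model id is a placeholder):

```python
# LitGPT Python API sketch (assumption: litgpt provides LLM.load/generate).
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")  # placeholder model; weights download on first use
print(llm.generate("What do llamas eat?"))
```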
The open-source serverless GPU container runtime.
A high-performance inference system for large language models, designed for production environments.
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
Run generative AI models on the Sophgo BM1684X.
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
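A hedged sketch of querying a multi-LoRA server of this kind over a TGI-style REST API, assuming a server at localhost:8080 and a hypothetical adapter id:

```python
# REST request routing a prompt to one of many hosted LoRA adapters.
import requests

payload = {
    "inputs": "Why does multi-LoRA serving scale well?",
    "parameters": {"adapter_id": "my-org/my-lora-adapter", "max_new_tokens": 64},
}
resp = requests.post("http://localhost:8080/generate", json=payload, timeout=60)
print(resp.json()["generated_text"])
```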
Run any open-source LLM, such as Llama 2 or Mistral, as an OpenAI-compatible API endpoint in the cloud.
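Because the endpoint is OpenAI-compatible, the standard `openai` Python client can talk to it directly. A minimal sketch, assuming a server at localhost:3000 and a placeholder model id:

```python
# Query a locally hosted OpenAI-compatible endpoint with the openai client (>= 1.0).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")  # key unused locally
reply = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder model id
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(reply.choices[0].message.content)
```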
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
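A hedged OpenVINO sketch: compile an IR model and run one inference on CPU (assumes `openvino` >= 2023.1 and a placeholder `model.xml` with a single 1x3x224x224 float input):

```python
# Compile and run an OpenVINO IR model on CPU.
import numpy as np
import openvino as ov

core = ov.Core()
compiled = core.compile_model("model.xml", "CPU")  # placeholder IR file
result = compiled(np.zeros((1, 3, 224, 224), dtype=np.float32))  # dummy input
```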
llmon-py is a multimodal web UI for Llama 3 8B.
Semantic embedding-based system for question answering from PDFs with visual analysis tools.
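The retrieval step of such a system can be sketched with sentence embeddings. An illustrative example, assuming the `sentence-transformers` package; the chunks and question are placeholders standing in for extracted PDF text:

```python
# Rank PDF text chunks by semantic similarity to a question.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = ["Text extracted from PDF page 1...", "Text extracted from PDF page 2..."]
question = "What method does the document describe?"

chunk_emb = model.encode(chunks, convert_to_tensor=True)
q_emb = model.encode(question, convert_to_tensor=True)
best = int(util.cos_sim(q_emb, chunk_emb).argmax())
print(chunks[best])  # most relevant chunk, used to ground the answer
```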
App that generates MCQs from images and PDFs
An open-source agent project: supports 6 chat platforms, one-to-many connections via OneBot v11, streaming messages, generated quick-reply keyboard bubbles in agent conversations, and 6 LLM APIs (more being added). It can convert multiple LLM APIs into a unified format that carries conversation context.
Library to supercharge your use of large language models
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
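A hedged sketch of LMDeploy's high-level pipeline API (the model id is a placeholder, downloaded on first use):

```python
# One-shot inference through LMDeploy's pipeline API.
from lmdeploy import pipeline

pipe = pipeline("internlm/internlm2-chat-7b")  # placeholder model id
response = pipe(["Explain KV-cache quantization in one sentence."])
print(response)
```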
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.
Java utility library with many features: Large Language Model inference with LLaMA, face detection with OpenCV, face recognition with Python, and more.