The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
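For illustration, a minimal sketch of a serving endpoint in the style of BentoML's 1.2+ service API (this blurb is BentoML's tagline; the service name and the trivial "inference" logic below are placeholders, not the library's own example):

```python
# Minimal BentoML-style service sketch (assumes BentoML >= 1.2).
# The class name and summarize() body are illustrative placeholders.
import bentoml

@bentoml.service
class Summarizer:
    @bentoml.api
    def summarize(self, text: str) -> str:
        # A real service would load a model and run inference here.
        return text[:100]
```

Running `bentoml serve` against a file like this exposes `summarize` as an HTTP endpoint.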
A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap
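A hedged sketch of AutoGen's two-agent pattern (assumes the `pyautogen` package; the model name and API key are placeholders):

```python
# Two-agent conversation sketch with Microsoft AutoGen (pyautogen).
from autogen import AssistantAgent, UserProxyAgent

config_list = [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]  # placeholder config
assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user = UserProxyAgent("user", human_input_mode="NEVER", code_execution_config=False)

# The user proxy drives the conversation; the assistant replies via the LLM.
user.initiate_chat(assistant, message="Outline a plan to benchmark an LLM server.")
```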
Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.
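A hedged sketch of loading and prompting one of these models through LitGPT's Python API, assuming it exposes `LLM.load`/`generate` as in its README (the model id is a placeholder):

```python
# LitGPT Python API sketch (assumption: litgpt provides LLM.load/generate).
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")  # placeholder model; weights download on first use
print(llm.generate("What do llamas eat?"))
```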
The open-source serverless GPU container runtime.
A high-performance inference system for large language models, designed for production environments.
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
Run generative AI models on the Sophgo BM1684X.
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
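A hedged sketch of querying a multi-LoRA server of this kind over a TGI-style REST API, assuming a server at localhost:8080 and a hypothetical adapter id:

```python
# REST request routing a prompt to one of many hosted LoRA adapters.
import requests

payload = {
    "inputs": "Why does multi-LoRA serving scale well?",
    "parameters": {"adapter_id": "my-org/my-lora-adapter", "max_new_tokens": 64},
}
resp = requests.post("http://localhost:8080/generate", json=payload, timeout=60)
print(resp.json()["generated_text"])
```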
Run any open-source LLM, such as Llama 2 or Mistral, as an OpenAI-compatible API endpoint in the cloud.
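Because the endpoint is OpenAI-compatible, the standard `openai` Python client can talk to it directly. A minimal sketch, assuming a server at localhost:3000 and a placeholder model id:

```python
# Query a locally hosted OpenAI-compatible endpoint with the openai client (>= 1.0).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")  # key unused locally
reply = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder model id
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(reply.choices[0].message.content)
```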
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
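A hedged OpenVINO sketch: compile an IR model and run one inference on CPU (assumes `openvino` >= 2023.1 and a placeholder `model.xml` with a single 1x3x224x224 float input):

```python
# Compile and run an OpenVINO IR model on CPU.
import numpy as np
import openvino as ov

core = ov.Core()
compiled = core.compile_model("model.xml", "CPU")  # placeholder IR file
result = compiled(np.zeros((1, 3, 224, 224), dtype=np.float32))  # dummy input
```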
llmon-py is a multimodal web UI for Llama 3 8B.
Semantic embedding-based system for question answering from PDFs with visual analysis tools.
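The retrieval step of such a system can be sketched with sentence embeddings. An illustrative example, assuming the `sentence-transformers` package; the chunks and question are placeholders standing in for extracted PDF text:

```python
# Rank PDF text chunks by semantic similarity to a question.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = ["Text extracted from PDF page 1...", "Text extracted from PDF page 2..."]
question = "What method does the document describe?"

chunk_emb = model.encode(chunks, convert_to_tensor=True)
q_emb = model.encode(question, convert_to_tensor=True)
best = int(util.cos_sim(q_emb, chunk_emb).argmax())
print(chunks[best])  # most relevant chunk, used to ground the answer
```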
App that generates MCQs from images and PDFs
An open-source agent project: supports 6 chat platforms, one-to-many connections via OneBot v11, streaming messages, generated quick-reply keyboard bubbles in agent conversations, and 6 LLM APIs (more being added). It can convert multiple LLM APIs into a unified format that carries conversation context.
Library to supercharge your use of large language models
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
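A hedged sketch of LMDeploy's high-level pipeline API (the model id is a placeholder, downloaded on first use):

```python
# One-shot inference through LMDeploy's pipeline API.
from lmdeploy import pipeline

pipe = pipeline("internlm/internlm2-chat-7b")  # placeholder model id
response = pipe(["Explain KV-cache quantization in one sentence."])
print(response)
```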
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.
Java utility library with many features: Large Language Model inference with LLaMA, face detection with OpenCV, face recognition with Python, and more.