LeapfrogAI vLLM Backend

A LeapfrogAI API-compatible wrapper around vLLM for running inference on quantized and un-quantized models across GPU infrastructures.

Usage

Follow the Instructions section below to get the backend up and running. Then use the LeapfrogAI API server to interact with it.

Instructions

The instructions in this section assume the following:

Python 3.11.x is properly installed and configured, including its development tools
The LeapfrogAI API server is deployed and running

The following are additional assumptions for GPU inferencing:

You have properly installed one or more NVIDIA GPUs and GPU drivers
You have properly installed and configured the cuda-toolkit and nvidia-container-toolkit
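
The assumptions above can be spot-checked before going further. The following is a minimal sketch, not part of this repository: it assumes that `nvidia-smi`, `nvcc`, and `nvidia-ctk` being on the `PATH` is a reasonable proxy for the driver, the cuda-toolkit, and the nvidia-container-toolkit being installed, which may not hold on every system. The function name and result keys are illustrative.

```python
import shutil
import sys


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a mapping of prerequisite name -> whether it appears satisfied."""
    return {
        # The instructions assume Python 3.11.x
        "python_3_11": tuple(version_info[:2]) == (3, 11),
        # nvidia-smi is installed alongside the NVIDIA driver
        "nvidia_smi": which("nvidia-smi") is not None,
        # nvcc is the compiler shipped with the CUDA toolkit
        "nvcc": which("nvcc") is not None,
        # nvidia-ctk is the NVIDIA Container Toolkit CLI
        "nvidia_ctk": which("nvidia-ctk") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'ok' if ok else 'MISSING':<8}{name}")
```

A missing entry does not necessarily mean the tool is absent; it may simply not be on the `PATH` of the current shell.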

Model Selection

By default, this repository's officially released images for this backend ship with a 4-bit GPTQ quantization of the Synthia-7B model.

You can optionally specify different models or quantization types using the following Docker build arguments:

--build-arg HF_HUB_ENABLE_HF_TRANSFER="1": Enable (1) or disable (0) accelerated Hugging Face Hub downloads via hf_transfer (default: 1)
--build-arg REPO_ID="TheBloke/Synthia-7B-v2.0-GPTQ": Hugging Face repository ID of the model
--build-arg REVISION="gptq-4bit-32g-actorder_True": Revision (branch, tag, or commit hash) of the model repository
--build-arg QUANTIZATION="gptq": Quantization type (e.g., gptq, awq, or empty for un-quantized)
--build-arg TENSOR_PARALLEL_SIZE="1": Number of GPUs to split the model across using tensor parallelism
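
To show how these arguments combine, here is a small sketch that merges overrides into the values listed above and emits the matching `--build-arg` flags. The helper name `build_arg_flags`, the example repository ID, the image tag `vllm-backend:dev`, and the build context `.` are all hypothetical; substitute whatever this repository's build process actually expects.

```python
import shlex

# Values copied from the build arguments listed above
DEFAULT_BUILD_ARGS = {
    "HF_HUB_ENABLE_HF_TRANSFER": "1",
    "REPO_ID": "TheBloke/Synthia-7B-v2.0-GPTQ",
    "REVISION": "gptq-4bit-32g-actorder_True",
    "QUANTIZATION": "gptq",
    "TENSOR_PARALLEL_SIZE": "1",
}


def build_arg_flags(overrides=None):
    """Merge overrides into the defaults and return `--build-arg` flags."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_BUILD_ARGS)
    if unknown:
        raise KeyError(f"unknown build argument(s): {sorted(unknown)}")
    merged = {**DEFAULT_BUILD_ARGS, **{k: str(v) for k, v in overrides.items()}}
    flags = []
    for name, value in merged.items():
        flags += ["--build-arg", f"{name}={value}"]
    return flags


if __name__ == "__main__":
    # Hypothetical example: an un-quantized model split across two GPUs.
    # Repository ID, revision, image tag, and build context are placeholders.
    flags = build_arg_flags({
        "REPO_ID": "example-org/example-model",
        "REVISION": "main",
        "QUANTIZATION": "",
        "TENSOR_PARALLEL_SIZE": 2,
    })
    print(shlex.join(["docker", "build", *flags, "-t", "vllm-backend:dev", "."]))
```

Rejecting unknown names up front catches typos such as `REPO_Id`, which Docker would otherwise pass through with only a warning.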

Run Locally

From this directory:

# Setup Virtual Environment
python -m venv .venv
source .venv/bin/activate

python -m pip install ../../src/leapfrogai_sdk
python -m pip install .

# To support Hugging Face Hub model downloads
python -m pip install ".[dev]"

# Copy the example environment file; edit .env if different parameters are needed
cp .env.example .env

# Load the environment variables and export them so the Python processes below can read them
set -a; source .env; set +a

# Clone Model
# Supply a REPO_ID, FILENAME and REVISION if a different model is desired
python src/model_download.py

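# Only applies when the download contains a GGUF file; the default GPTQ model does not
if ls .model/*.gguf >/dev/null 2>&1; then mv .model/*.gguf .model/model.gguf; fi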

# Start Model Backend
python -m leapfrogai_sdk.cli --app-dir=src/ main:Model
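
For readers curious what the download step might involve, the sketch below pulls a model snapshot using `REPO_ID` and `REVISION` from the environment. It is an assumption-laden illustration, not the contents of `src/model_download.py`: the function names are hypothetical, the fallback values are the ones listed under Model Selection, and `.model` is assumed to be the target directory because the commands above reference it. It relies on `snapshot_download` from the `huggingface_hub` package.

```python
import os


def resolve_model_source(env=os.environ):
    """Read the model coordinates, falling back to the documented defaults."""
    repo_id = env.get("REPO_ID", "TheBloke/Synthia-7B-v2.0-GPTQ")
    revision = env.get("REVISION", "gptq-4bit-32g-actorder_True")
    return repo_id, revision


def download_model(local_dir=".model", env=os.environ):
    """Fetch a snapshot of the selected model repository into local_dir."""
    # Imported lazily so resolve_model_source works without huggingface_hub installed
    from huggingface_hub import snapshot_download

    repo_id, revision = resolve_model_source(env)
    # Returns the path of the directory that now holds the model files
    return snapshot_download(repo_id=repo_id, revision=revision, local_dir=local_dir)
```

With `REPO_ID` and `REVISION` exported, calling `download_model()` would place the repository files under `.model`; with neither set, it would fetch the default model.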
