
llm-api



A fast CPU-based API for OpenChat 3.6, hosted on Hugging Face Spaces. For faster inference, we use CTranslate2 as our inference engine.

Usage

Simply cURL the endpoint as follows.

curl -N 'https://winstxnhdw-llm-api.hf.space/api/v1/generate' \
     -H 'Content-Type: application/json' \
     -d \
     '{
         "instruction": "What is the capital of Japan?"
      }'
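If you prefer to call the API from code, the same request can be sketched in Python. This is a minimal sketch using only the standard library; the endpoint URL and the `instruction` field come from the cURL example above, while the `build_request` helper name is our own for illustration.

```python
import json
from urllib import request

API_URL = "https://winstxnhdw-llm-api.hf.space/api/v1/generate"


def build_request(instruction: str) -> request.Request:
    # Build the same POST request that the cURL example sends:
    # a JSON body with a single "instruction" field.
    payload = json.dumps({"instruction": instruction}).encode()
    return request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("What is the capital of Japan?")
# Send it with: request.urlopen(req), then read the response body.
```

Note that the cURL example passes `-N` to disable output buffering, which suggests the endpoint may stream its response; if so, read the response incrementally rather than all at once.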

Development

First, install the required dependencies for your editor with the following.

poetry install

Now, you can access the Swagger UI at localhost:7860/api/docs after spinning the server up locally with the following.

docker build -f Dockerfile.build -t llm-api .
docker run --rm -e APP_PORT=7860 -p 7860:7860 llm-api