TUM-Live-Voice-Service

Microservice that generates subtitles for TUM-Live.

Workflow

                     ┌──────────┐
      ┌──────────────┤  WORKER  │
      │              └──────────┘
      │
      │
      │ 1) .NotifyUploadFinished
      │
      │
      │
      │
┌─────▼──────┐       2) .Generate         ┌───────────┐
│            ├────────────────────────────►           │
│            │                            │   VOICE   │
│  TUM-LIVE  │                            │  SERVICE  │
│            │                            │           │
├────────────◄────────────────────────────┼───────────┤
│  RECEIVER  │        3) .Receive         │ GENERATOR │
└────────────┘                            └───────────┘
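
The three numbered calls in the diagram can be traced with a small in-process simulation. This is only a sketch of the call order: the class and method names below are illustrative and not taken from the project's code, no gRPC is involved, and the calls are made synchronously for simplicity.

```python
# In-process simulation of the three calls in the diagram; no gRPC involved.
# All class and method names are hypothetical, chosen only for illustration.


class VoiceService:
    def __init__(self, receiver):
        self.receiver = receiver

    def generate(self, stream_id, source_file):
        # 2) .Generate: a real service would transcribe source_file here.
        subtitles = f"(placeholder subtitles for {source_file})"
        # 3) .Receive: hand the result back to TUM-Live.
        self.receiver.receive(stream_id, subtitles)


class TumLive:
    def __init__(self):
        self.voice_service = VoiceService(receiver=self)
        self.subtitles = {}
        self.calls = []

    def notify_upload_finished(self, stream_id, source_file):
        # 1) .NotifyUploadFinished: the worker reports a finished upload.
        self.calls.append("NotifyUploadFinished")
        self.calls.append("Generate")
        self.voice_service.generate(stream_id, source_file)

    def receive(self, stream_id, subtitles):
        self.calls.append("Receive")
        self.subtitles[stream_id] = subtitles


tum_live = TumLive()
tum_live.notify_upload_finished(stream_id=1, source_file="/tmp/120.mp4")
print(tum_live.calls)
```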

API

$ grpcurl -plaintext localhost:50055 list live.voice.v1.SubtitleGenerator

live.voice.v1.SubtitleGenerator.Generate

$ grpcurl -plaintext \
  -d '{"stream_id":1, "source_file":"/tmp/120.mp4"}' \
  -import-path ./protobufs -proto subtitles.proto \
  localhost:50055 live.voice.v1.SubtitleGenerator.Generate
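
To issue the same request from a script, one option is to assemble the grpcurl invocation in Python. The sketch below only builds the argument list from the command shown above; the helper name `build_generate_command` is hypothetical, and it assumes grpcurl is installed and the `./protobufs` directory is reachable from the working directory.

```python
import json
import shlex


def build_generate_command(stream_id, source_file, host="localhost", port=50055):
    """Return the grpcurl argument list for a Generate request."""
    payload = json.dumps({"stream_id": stream_id, "source_file": source_file})
    return [
        "grpcurl", "-plaintext",
        "-d", payload,
        "-import-path", "./protobufs", "-proto", "subtitles.proto",
        f"{host}:{port}",
        "live.voice.v1.SubtitleGenerator.Generate",
    ]


command = build_generate_command(1, "/tmp/120.mp4")
# Print a shell-quoted version for inspection.
print(shlex.join(command))
```

Passing `command` to `subprocess.run(command, check=True)` would execute the request against a running service.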

Installation

Python virtual environment

$ git clone https://github.com/TUM-Dev/TUM-Live-Voice-Service.git
Cloning into 'TUM-Live-Voice-Service'...
$ cd TUM-Live-Voice-Service
$ python -m venv venv
$ source venv/bin/activate
(venv) $ pip install --no-cache-dir -r requirements.txt
(venv) $ DEBUG=. CONFIG_FILE=./config.yml python3.9 subtitles/subtitles.py
...

Or simply use an open source IDE like PyCharm CE.

Docker

$ docker run -p 50055:50055 \
  --name voice-service \
  -v /srv/static:/mass \
  -e CONFIG_FILE=./config.yml \
  -e DEBUG=. \
  -d \
  ghcr.io/tum-dev/tum-live-voice-service:latest

To use NVIDIA hardware acceleration, build the image from Dockerfile.nvidia:

$ docker build -t tum-live-voice-service-nvidia --file Dockerfile.nvidia .
$ docker run -p 50055:50055 \
  --name voice-service \
  --gpus all \
  -v /srv/static:/mass \
  -e CONFIG_FILE=./config.yml \
  -e DEBUG=. \
  -d \
  tum-live-voice-service-nvidia:latest

Configuration

You can configure the application with:

YAML file
.env file and environment variables

Configuration precedence (> = overrides): environment variables > .env file > YAML file
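
The precedence rule can be illustrated by layering three flat mappings, with later layers overriding earlier ones. This is a minimal sketch and not the service's actual loader: the helper names are hypothetical, and it assumes the YAML values have already been flattened to the environment-style key names (for example `api.port` to `API_PORT`).

```python
def parse_dotenv(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve_settings(yaml_values, dotenv_values, environ):
    """Merge the sources so that environment > .env > YAML."""
    return {**yaml_values, **dotenv_values, **environ}


# Assumed flattened YAML values (key names are an assumption of this sketch).
yaml_values = {"API_PORT": "50055", "REC_PORT": "50053", "WHISPER_MODEL": "tiny"}
dotenv_values = parse_dotenv("API_PORT=51000\nWHISPER_MODEL=medium\n")
# A plain dict stands in for the relevant subset of os.environ.
environ = {"API_PORT": "52000"}

settings = resolve_settings(yaml_values, dotenv_values, environ)
print(settings)
```

In this example `API_PORT` comes from the environment, `WHISPER_MODEL` from the .env file, and `REC_PORT` falls back to the YAML value.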

Example .env file

API_PORT=51000
REC_HOST=127.0.0.1
REC_PORT=51001
VOSK_MODEL_DIR=/data
VOSK_DWNLD_URLS=https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip,https://alphacephei.com/vosk/models/vosk-model-small-de-0.15.zip
VOSK_MODELS=model-fr:fr,model-en:en
WHISPER_MODEL=medium
MAX_THREADS=10
CNT_WORKERS=3
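
The `VOSK_MODELS` value above appears to pack `name:lang` pairs into a comma-separated string. Under that assumption, it could be unpacked into the same name/lang structure used in the YAML file roughly as follows; the function name `parse_vosk_models` is hypothetical and not taken from the project.

```python
def parse_vosk_models(value):
    """Split 'name:lang,name:lang' into a list of name/lang dicts."""
    models = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last colon so the language code is the final field.
        name, sep, lang = entry.rpartition(":")
        if not sep or not name or not lang:
            raise ValueError(f"expected 'name:lang', got {entry!r}")
        models.append({"name": name, "lang": lang})
    return models


models = parse_vosk_models("model-fr:fr,model-en:en")
print(models)
```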

Example YAML file

api:
  port: 50055
receiver:
  host: localhost
  port: 50053
transcriber: 'whisper'
vosk:
  model_dir: '/data'
  download_urls:
    - https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
    - https://alphacephei.com/vosk/models/vosk-model-small-de-0.15.zip
  models:
    - name: 'vosk-model-small-en-us-0.15'
      lang: 'en'
    - name: 'data/vosk-model-small-de-0.15'
      lang: 'de'
whisper:
  model: 'tiny'
max_threads: 12
cnt_workers: 3

Transcribers

The following transcribers are currently implemented and can be selected in the configuration:

whisper
vosk

Which transcriber to choose depends on the use case and the available computing power. We found that whisper produces much higher-quality results, especially for punctuation, but is much more compute-heavy.

License

MIT
