Transcribe, translate, diarize, annotate and subtitle audio and video files with Whisper ... fast!
whisply combines faster-whisper, insanely-fast-whisper and batch processing of files (with mixed languages). It also enables speaker detection and annotation via pyannote.
Supported output formats:
- .json
- .txt
- .srt
- .rttm
Requirements:
- FFmpeg
- python3.11
If you want to use a GPU:
- nvidia GPU (CUDA) (see the quick check after this list)
- Metal Performance Shaders (MPS) → Mac M1-M3
If you want to use speaker detection / diarization:
- HuggingFace access token
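If you plan to run on --device gpu, you can confirm that a CUDA-capable GPU is visible before installing. This sketch uses NVIDIA's standard driver utility, which is not part of whisply:
nvidia-smi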
1. Install ffmpeg
--- macOS ---
brew install ffmpeg
--- Linux ---
sudo apt-get update
sudo apt-get install ffmpeg
--- Windows ---
Download a build from https://ffmpeg.org/download.html
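Whichever route you take, you can verify that FFmpeg is on your PATH before continuing; it should print its version:
ffmpeg -version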
2. Clone this repository and change to project folder
git clone https://github.com/th-schmidt/whisply.git
cd whisply
3. Create a Python virtual environment
python3.11 -m venv venv
4. Activate the Python virtual environment
source venv/bin/activate
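Note that the activation script path differs on Windows; in cmd.exe the equivalent is:
venv\Scripts\activate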
5. Install whisply with pip
pip install .
>>> whisply --help
Usage: whisply [OPTIONS]
WHISPLY 🗿 Transcribe, translate, diarize, annotate and subtitle audio and
video files with Whisper ... fast!
Options:
--files PATH Path to file, folder, URL or .list to process.
--output_dir DIRECTORY Folder where transcripts should be saved. Default:
"./transcriptions"
--device [cpu|gpu|mps] Select the computation device: CPU, GPU (nvidia
CUDA), or MPS (Metal Performance Shaders).
--lang TEXT Specifies the language of the file you are
providing (en, de, fr ...). Default: auto-detection.
--detect_speakers Enable speaker diarization to identify and separate
different speakers. Creates .rttm file.
--hf_token TEXT HuggingFace Access token required for speaker
diarization.
--translate Translate transcription to English.
--srt Create .srt subtitles from the transcription.
--txt Create .txt with the transcription.
--config FILE Path to configuration file.
--list_formats List supported audio and video formats.
--verbose Print text chunks during transcription.
--help Show this message and exit.
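As a minimal example of a plain transcription run, the following sketch processes a single local file and writes .srt and .txt output (interview.mp4 is a hypothetical filename):
whisply --files interview.mp4 --lang en --srt --txt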
To use the --detect_speakers option, you need to provide a valid HuggingFace access token by using the --hf_token option. Additionally, you must accept the terms and conditions for both version 3.0 and version 3.1 of the pyannote segmentation model. For detailed instructions, refer to the Requirements section on the pyannote model page on HuggingFace.
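A diarization run might then look like this sketch, where interview.mp4 and the token value are placeholders for your own file and HuggingFace access token:
whisply --files interview.mp4 --detect_speakers --hf_token hf_your_token_here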
You can provide a .json config file by using the --config option, which makes processing more user-friendly. An example config looks like this:
{
"files": "path/to/files",
"output_dir": "./transcriptions",
"device": "cpu",
"lang": null,
"detect_speakers": false,
"hf_token": "Hugging Face Access Token",
"translate": true,
"txt": true,
"srt": false,
"verbose": true
}
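Assuming you saved the configuration above as config.json (a hypothetical filename), a run then only needs the one option:
whisply --config config.json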
Instead of providing a file, folder or URL by using the --files option, you can pass a .list file containing a mix of files, folders and URLs for processing. Example:
cat my_files.list
video_01.mp4
video_02.mp4
./my_files/
https://youtu.be/KtOayYXEsN4?si=-0MS6KXbEWXA7dqo
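The list is then passed like any other input:
whisply --files my_files.list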