autonompost/whisper-autotranscription

Autotranscription with Whisper

This project lets you bulk-transcribe audio files using a cloud provider of your choice. It uses Terraform to create a number of instances and Ansible to configure them and transcribe the files in parallel with Whisper.

You should really use a cloud provider that supports GPUs. Even on instances with 16 CPUs, the transcription process is painfully slow.

Of course, you can also use a service like Replicate; I still have to work out what a bulk transcription job would cost on Replicate and then compare it.

Also, some general remarks can be found under testing.

General Setup Steps

This project has been tested with the following versions:

  • Terraform 1.5.7
  • Ansible 2.16.0 (9.0.x)
  • Python 3.11.6
  • OpenStack client 6.0.0

In order to use this project, first create your config files as described in the section below.

Usage: ./whisper-autotranscription.sh [-f CONFIGFILE] [-n NUMVMS] [-m MODE] [-h]
  -f CONFIGFILE Specify a config file (optional; defaults to config/config.sh)
  -n NUMVMS     Specify the number of VMs to create (optional; defaults to 1)
  -m MODE       Specify the mode, whisper|whisperx (optional; defaults to whisper)
  -h            Display this help message
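
For example, a typical invocation (the config file path and VM count are illustrative) might look like:

```shell
# Create 4 VMs from a custom config and transcribe with plain whisper
./whisper-autotranscription.sh -f config/config.sh -n 4 -m whisper
```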

whisperX Usage (Not recommended yet)

There is also the possibility to use whisperX instead of whisper.

For this you need a huggingface account and a token.

You will also need to accept the Terms and Conditions for speaker-diarization and segmentation.

whisperX currently only works with .wav files because of a bug in python-soundfile.

Copy config/ansible_secrets.yaml_example to config/ansible_secrets.yaml and add your token to config/ansible_secrets.yaml.
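
Following the same copy-then-edit pattern used for the other config files:

```shell
cp config/ansible_secrets.yaml_example config/ansible_secrets.yaml
# then edit config/ansible_secrets.yaml and paste in your Hugging Face token;
# the exact key name can be found in the _example file
```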

Files

Files that need to be processed go into the files_upload directory ($SRC_DIR). After transcription, the result files are first downloaded to the files_download directory ($DST_DIR) and then copied back to the originating directories in files_upload ($SRC_DIR).
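
For example (the file name and subdirectory are illustrative):

```shell
# Stage a recording; subdirectories under files_upload are preserved
# when the transcripts are copied back
mkdir -p files_upload/interviews
cp /path/to/ep01.mp3 files_upload/interviews/
```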

If the variable CLEANUP is set to true, the files in files_download will be deleted.
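
The cleanup step can be pictured as a small sketch; CLEANUP itself comes from config/config.sh, and the guard shown here is an illustration, not the script's exact code:

```shell
# Illustration of the CLEANUP behaviour: when enabled, clear the local
# download directory after the results have been copied back
CLEANUP=true
if [ "$CLEANUP" = "true" ]; then
  rm -rf files_download/*
fi
```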

config.sh

The file config/config.sh_example needs to be copied over to config/config.sh

cp config/config.sh_example config/config.sh

Adjust the values according to your needs.
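
A minimal sketch of what the adjusted values could look like, assuming only the variables mentioned in this README (SRC_DIR, DST_DIR, CLEANUP); see config/config.sh_example for the full list:

```shell
SRC_DIR="/pathto/whisper-autotranscription/files_upload"
DST_DIR="/pathto/whisper-autotranscription/files_download"
CLEANUP=true
```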

Terraform Variables

The terraform tfvars file config/variables.tfvars_example needs to be copied over to config/variables.tfvars

cp config/variables.tfvars_example config/variables.tfvars

Adjust the values according to your needs.

Ansible Variables

The file templates/ansible_vars.yaml_example needs to be copied over to templates/ansible_vars.yaml.

cp templates/ansible_vars.yaml_example templates/ansible_vars.yaml

In the templates/ansible_vars.yaml file you can set the model size as well as the path the files are downloaded to. This path needs to match DST_DIR from config/config.sh.

DO NOT CHANGE the THREADS variable; it is set by the whisper-autotranscription.sh script, which derives the value from instance_type.

Change whisper_parameters if you want to optimize your whisper settings.

instance_threads: THREADS
whisper_model_size: "medium"
whisper_retry_count: 3
whisper_retry_delay: 10
file_directory: "/pathto/whisper-autotranscription/files_download"
whisper_parameters: "--language de --extend_duration 0.1"
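
As an illustration, the THREADS placeholder above might be filled in roughly like this; this is a hypothetical sketch (the real mapping lives in whisper-autotranscription.sh, and the instance type names and thread counts here are just examples):

```shell
# Hypothetical sketch: derive a thread count from the instance type,
# which would then be substituted for the THREADS placeholder
instance_type="cx41"   # example type; real values come from config/variables.tfvars
case "$instance_type" in
  cx21) threads=2 ;;
  cx41) threads=4 ;;
  cx51) threads=8 ;;
  *)    threads=1 ;;
esac
echo "instance_threads: ${threads}"
```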

If you are unsure which whisper parameters exist, install whisper on a system and run whisper --help.
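
For example (the PyPI package name is openai-whisper):

```shell
pip install -U openai-whisper
whisper --help
```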

secrets.sh

The file config/secrets.sh_example needs to be copied over to config/secrets.sh.

cp config/secrets.sh_example config/secrets.sh

Edit the file and add the API token(s) of your cloud provider:

DO_TOKEN=
HCLOUD_TOKEN=
LINODE_TOKEN=
OVH_APPLICATION_KEY=
OVH_APPLICATION_SECRET=
OVH_CONSUMER_KEY=

For GCP, run gcloud auth login so that Terraform can authenticate.

Cloud Provider Specific Instructions

Not fully tested cloud providers

Roadmap

Version 1

  • Provision multiple VMs for parallel processing
  • Supported Cloud Providers
    • Hetzner Cloud (mostly used for testing)
    • OVH (GPU)
    • GCP (GPU) (using spot instances)
  • Use OpenAI Whisper
  • Upload/Download files from/to local filesystem
  • Autodetect language
  • GPU instance support with Nvidia Cuda
  • Use whisperX instead of whisper

Version 1.1

  • more CLI script parameters to reduce the config file mess
  • option for maximum number of files to transcribe

Version 2

  • Obsidian audio-notes plugin support
  • automatic translation with DeepL to a specified language for transcripts
  • upload only files from files_upload that have not been transcribed
  • use rclone directly on the remote system without any local files
  • automatically create summaries for transcripts
  • whisperX diarization (currently not so great)
  • Speaker Identification
  • Supported Cloud Providers
    • Azure (GPU) (won't implement - feel free to fork)
    • Linode (GPU) (not yet fully tested since I did not get any GPU instance access) (won't implement - feel free to fork)
    • AWS (GPU) (won't implement - feel free to fork)

Version 3

  • Use DeepL Write API to automatically correct grammar
  • Create Cloud Images with Packer for faster deployment

Contributing

Feel free to fork and open up a pull request either to fix errors or add functionality.