atyenoria/livekit-whisper-transcribe

Description

This is a sample implementation of a Python WebSocket ASR server that uses LiveKit (WebRTC) Egress. It transcribes speech from an audio track (microphone) published to LiveKit in 30-second segments, and it saves both the original audio file and the resampled audio file that is fed into faster_whisper.
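The 30-second segmentation can be sketched as a small byte buffer that emits a chunk whenever enough PCM data has arrived. This is a minimal sketch, not the code in "asr-server.py": the class name `PcmChunker` and its parameters are hypothetical, and it assumes the incoming stream is 16-bit PCM at 48 kHz with 2 channels, as described in the debugging notes.

```python
from typing import List


class PcmChunker:
    """Hypothetical helper: groups a raw PCM byte stream into fixed-length chunks."""

    def __init__(self, sample_rate: int = 48000, channels: int = 2,
                 sample_width: int = 2, seconds: int = 30) -> None:
        # Bytes needed for one chunk: rate * channels * bytes per sample * duration.
        self.chunk_bytes = sample_rate * channels * sample_width * seconds
        self._buf = bytearray()

    def feed(self, data: bytes) -> List[bytes]:
        """Append incoming bytes and return every complete chunk now available."""
        self._buf.extend(data)
        chunks = []
        while len(self._buf) >= self.chunk_bytes:
            chunks.append(bytes(self._buf[:self.chunk_bytes]))
            del self._buf[:self.chunk_bytes]
        return chunks

    def pending(self) -> int:
        """Number of buffered bytes that do not yet form a full chunk."""
        return len(self._buf)
```

Each WebSocket message would be passed to `feed()`, and every returned chunk could then be saved, resampled, and transcribed. Audio left in the buffer when the stream ends would need to be flushed separately.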

Demo Video (Japanese Transcription)

Japanese.whisper.ASR.PoC.mp4

Preparation

Setup

  1. After creating config.yaml (in ~/egress-test, which is mounted at /out), run the egress service with "docker run --rm -e EGRESS_CONFIG_FILE=/out/config.yaml --net=host -v ~/egress-test:/out livekit/egress"
  2. Publish your audio track, then find the audio track ID and room ID in the LiveKit console log
  3. Update request.json according to those values
  4. Start "asr-server.py", then run "livekit-cli start-track-egress --api-key your-livekit-api-key --api-secret your-livekit-secret-key --request request.json"
  5. The transcription appears in the terminal. Check the original and resampled audio files to detect audio issues
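As an illustration of step 3, the sketch below builds the contents of request.json from the values found in step 2. It assumes the track egress request uses the `room_name`, `track_id`, and `websocket_url` fields, which you should verify against the request.json in this repository and the LiveKit Egress documentation. The function name and all example values (room name, track ID, WebSocket address) are placeholders, not values from this project.

```python
import json


def build_track_egress_request(room_name: str, track_id: str, websocket_url: str) -> str:
    """Return request.json contents as a string (field names are an assumption)."""
    request = {
        "room_name": room_name,          # room ID from the LiveKit console log
        "track_id": track_id,            # audio track ID from the LiveKit console log
        "websocket_url": websocket_url,  # address where asr-server.py is listening
    }
    return json.dumps(request, indent=2)


# Placeholder values; replace them with the ones from your own setup.
request_text = build_track_egress_request("my-room", "TR_XXXXXXXXXXXX", "ws://localhost:8765")
```

Writing `request_text` to request.json gives a file that can be passed to the livekit-cli command in step 4.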

Issues I faced while debugging

  • The beginning of the audio was cut off when receiving the WebSocket audio stream from egress
  • CUDA error on a g5.xlarge instance ("sudo apt install nvidia-driver-470" fixed it)
  • Resampling from 16-bit PCM, 48 kHz, 2 channels to 16-bit PCM, 16 kHz, 1 channel
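The resampling step in the last item can be sketched with the standard library alone. This is a deliberately crude illustration, not the method used in "asr-server.py": it assumes little-endian interleaved stereo samples, downmixes by averaging the two channels, and reduces 48 kHz to 16 kHz by averaging each group of three frames (a simple box filter). A dedicated resampling library would give better audio quality. The function name is hypothetical.

```python
import struct


def downmix_and_decimate(pcm: bytes) -> bytes:
    """Convert 16-bit 48 kHz stereo PCM to 16-bit 16 kHz mono PCM (crude sketch)."""
    # Three stereo frames (3 frames * 2 channels * 2 bytes = 12 bytes) become one
    # output sample. Trailing bytes that do not fill a group are dropped here;
    # a streaming implementation would carry them over to the next call.
    usable = len(pcm) - len(pcm) % 12
    samples = struct.unpack("<%dh" % (usable // 2), pcm[:usable])
    out = []
    for i in range(0, len(samples), 6):
        # Averaging 6 interleaved samples mixes both channels and 3 frames at once.
        out.append(sum(samples[i:i + 6]) // 6)
    return struct.pack("<%dh" % len(out), *out)
```

One second of input (192,000 bytes) yields 32,000 bytes of output, which matches 16,000 mono samples at 2 bytes each.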
