jmaczan/asr-dysarthria

😺 ASR Dysarthria

PyTorch · Lightning · Config: Hydra · Template

Description

Automatic speech recognition for people with dysarthria

The model is hosted on Hugging Face: https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria
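A minimal inference sketch for the hosted checkpoint, using the Hugging Face transformers library. It assumes the Hub repository also ships the processor files (feature extractor and tokenizer) and that the input is mono audio at 16 kHz, the rate wav2vec 2.0 / XLS-R checkpoints are trained on. The `transcribe` helper is a hypothetical name for illustration and is not part of this repository.

```python
MODEL_ID = "jmaczan/wav2vec2-large-xls-r-300m-dysarthria"
EXPECTED_SAMPLING_RATE = 16_000  # wav2vec 2.0 / XLS-R checkpoints expect 16 kHz audio


def transcribe(waveform, sampling_rate=EXPECTED_SAMPLING_RATE, model_id=MODEL_ID):
    """Transcribe a mono waveform (1-D array of floats) with greedy CTC decoding."""
    if sampling_rate != EXPECTED_SAMPLING_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLING_RATE} Hz audio, got {sampling_rate} Hz; resample first"
        )

    # Imported lazily so the sampling-rate check works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCTC, AutoProcessor

    # Assumption: the Hub repo contains processor files next to the model weights.
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForCTC.from_pretrained(model_id)
    model.eval()

    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy decoding: pick the most likely token per frame;
    # batch_decode collapses repeated tokens and removes CTC blanks.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Call `transcribe(waveform)` with a 1-D float array, for example one loaded with torchaudio or soundfile and resampled to 16 kHz. The first call downloads the weights from the Hub.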

Installation

Pip

# clone project
git clone https://github.com/jmaczan/asr-dysarthria
cd asr-dysarthria

# [OPTIONAL] create conda environment
conda create -n asr-dysarthria python=3.9
conda activate asr-dysarthria

# install pytorch according to instructions
# https://pytorch.org/get-started/

# install requirements
pip install -r requirements.txt

Conda

# clone project
git clone https://github.com/jmaczan/asr-dysarthria
cd asr-dysarthria

# create conda environment and install dependencies
conda env create -f environment.yaml -n asr-dysarthria

# activate conda environment
conda activate asr-dysarthria

How to run

Train the model with the default configuration

# train on CPU
python src/train.py trainer=cpu

# train on GPU
python src/train.py trainer=gpu

Train the model with an experiment configuration chosen from configs/experiment/

python src/train.py experiment=experiment_name.yaml

You can override any parameter from the command line:

python src/train.py trainer.max_epochs=20 data.batch_size=64
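To illustrate what such dotted overrides mean, here is a small pure-Python sketch of how `key.subkey=value` arguments map onto a nested configuration. It is not Hydra's implementation; `parse_value`, `apply_overrides`, and the default values are hypothetical and exist only for this example.

```python
from copy import deepcopy


def parse_value(raw):
    """Best-effort conversion of a command-line string to bool, int, float, or str."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def apply_overrides(config, overrides):
    """Return a copy of `config` with each 'a.b.c=value' override applied."""
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Walk down the nested dicts, creating missing levels on the way.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return result


# Hypothetical defaults; the real values live in the repo's configs/ directory.
defaults = {"trainer": {"max_epochs": 10}, "data": {"batch_size": 32}}
merged = apply_overrides(defaults, ["trainer.max_epochs=20", "data.batch_size=64"])
print(merged)  # {'trainer': {'max_epochs': 20}, 'data': {'batch_size': 64}}
```

Each dot descends one level in the configuration tree, so `trainer.max_epochs=20` changes only the `max_epochs` entry under `trainer` and leaves every other default untouched.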

Building the TORGO dataset

Build the dataset from the original TORGO database (http://www.cs.toronto.edu/~complingweb/data/TORGO/torgo.html):

python oldies/src/torgo_dataset_builder.py

The dataset is hosted on Hugging Face: https://huggingface.co/datasets/jmaczan/TORGO
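A minimal sketch for pulling the hosted dataset with the Hugging Face datasets library. The two dataset IDs are the ones listed in this README (the full set and the smaller one under Resources). The `load_torgo` helper is a hypothetical name, and the split and column names are not documented here, so the sketch only loads the dataset and leaves inspection to the caller.

```python
DATASET_IDS = {
    "big": "jmaczan/TORGO",
    "small": "jmaczan/TORGO-very-small",
}


def load_torgo(size="small"):
    """Load one of the hosted TORGO variants from the Hugging Face Hub."""
    if size not in DATASET_IDS:
        raise ValueError(f"size must be one of {sorted(DATASET_IDS)}, got {size!r}")

    # Imported lazily so the argument check works without the dependency installed.
    from datasets import load_dataset

    return load_dataset(DATASET_IDS[size])
```

Running `print(load_torgo("small"))` lists the available splits and columns, which is a quick way to check the layout before wiring the data into training.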

Open questions

Resources

Papers

https://ar5iv.labs.arxiv.org/html/2204.00770 (https://arxiv.org/abs/2204.00770)

https://www.isca-speech.org/archive/pdfs/interspeech_2022/baskar22b_interspeech.pdf

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10225595

https://www.sciencedirect.com/science/article/pii/S2405959521000874

https://www.isca-speech.org/archive/pdfs/interspeech_2021/green21_interspeech.pdf

https://arxiv.org/pdf/2006.11477.pdf

https://arxiv.org/pdf/2211.00089.pdf

https://www.sciencedirect.com/science/article/abs/pii/S0957417423002981

Code

https://huggingface.co/blog/fine-tune-wav2vec2-english

Data

http://www.cs.toronto.edu/~complingweb/data/TORGO/torgo.html

Dataset

Big

https://huggingface.co/datasets/jmaczan/TORGO

Small

https://huggingface.co/datasets/jmaczan/TORGO-very-small

Others

https://ai.meta.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/

https://pytorch.org/audio/stable/tutorials/speech_recognition_pipeline_tutorial.html

https://huggingface.co/docs/datasets/v2.16.1/audio_dataset

https://distill.pub/2017/ctc/

https://ai.meta.com/blog/self-supervision-and-building-more-robust-speech-recognition-systems/

License

MIT License

Author

Made in Kaszëbë, Poland 🇵🇱 by Jędrzej Paweł Maczan, on the shoulders of giants in 2024