😺 ASR Dysarthria

Description

Automatic speech recognition for people with dysarthria

The model is hosted on Hugging Face: https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria
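
As a minimal sketch of how the hosted checkpoint might be used for inference, the snippet below wraps the Hugging Face `transformers` speech recognition pipeline. It is not taken from this repository's code: the names `load_asr` and `transcribe` are hypothetical, and it assumes that `transformers`, a PyTorch backend, and ffmpeg (for decoding audio files) are installed.

```python
MODEL_ID = "jmaczan/wav2vec2-large-xls-r-300m-dysarthria"


def load_asr(model_id=MODEL_ID):
    """Build a speech recognition pipeline for the hosted checkpoint."""
    # Imported lazily: transformers is heavy, and the first call downloads the model.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe(audio_path, asr=None):
    """Return the transcription of one audio file as a string."""
    if asr is None:
        asr = load_asr()
    # The ASR pipeline returns a dict with the decoded text under "text".
    return asr(audio_path)["text"]
```

Calling `transcribe("sample.wav")` (a hypothetical file path) would download the model on first use and return the recognized text. Passing an existing pipeline through `asr` avoids reloading the model for every file.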

Installation

Pip

# clone project
git clone https://github.com/jmaczan/asr-dysarthria
cd asr-dysarthria

# [OPTIONAL] create conda environment
conda create -n asr-dysarthria python=3.9
conda activate asr-dysarthria

# install pytorch according to instructions
# https://pytorch.org/get-started/

# install requirements
pip install -r requirements.txt

Conda

# clone project
git clone https://github.com/jmaczan/asr-dysarthria
cd asr-dysarthria

# create conda environment and install dependencies
conda env create -f environment.yaml -n asr-dysarthria

# activate conda environment
conda activate asr-dysarthria

How to run

Train the model with the default configuration

# train on CPU
python src/train.py trainer=cpu

# train on GPU
python src/train.py trainer=gpu

Train the model with a chosen experiment configuration from configs/experiment/

python src/train.py experiment=experiment_name.yaml

You can override any parameter from the command line, for example:

python src/train.py trainer.max_epochs=20 data.batch_size=64
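
The `key=value` arguments above look like Hydra-style dotted overrides. That is an assumption based on the command syntax, since this README does not name the configuration library. The sketch below is not Hydra's implementation; it only illustrates how dotted keys can map onto a nested configuration. The function `apply_overrides` and the default values are hypothetical.

```python
import ast
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted `key=value` overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        try:
            # Interpret numbers, booleans, lists, etc. as Python literals.
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            # Anything else (e.g. "gpu") stays a plain string.
            value = raw_value
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Create intermediate sections that do not exist yet.
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


# Hypothetical defaults, for illustration only.
defaults = {"trainer": {"max_epochs": 10}, "data": {"batch_size": 32}}
config = apply_overrides(defaults, ["trainer.max_epochs=20", "data.batch_size=64"])
print(config)  # {'trainer': {'max_epochs': 20}, 'data': {'batch_size': 64}}
```

The defaults are deep-copied first, so the original dictionary is left unchanged and can be reused for other runs.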

Building the TORGO dataset

Run the dataset builder script:

python oldies/src/torgo_dataset_builder.py

TORGO database homepage: http://www.cs.toronto.edu/~complingweb/data/TORGO/torgo.html

The built dataset is hosted on Hugging Face: https://huggingface.co/datasets/jmaczan/TORGO
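
A hedged sketch of loading the hosted dataset with the Hugging Face `datasets` library follows. The helper `load_torgo` and the `DATASET_IDS` mapping are hypothetical; the smaller repository ID is the one listed under Resources. Split and column names are not stated in this README, so the sketch returns the loaded object as-is for inspection.

```python
DATASET_IDS = {
    "big": "jmaczan/TORGO",
    "small": "jmaczan/TORGO-very-small",
}


def load_torgo(size="big"):
    """Download (or reuse the cached copy of) one of the hosted TORGO datasets."""
    if size not in DATASET_IDS:
        raise ValueError(f"size must be one of {sorted(DATASET_IDS)}, got {size!r}")
    # Imported lazily so the argument check works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_IDS[size])
```

For example, `print(load_torgo("small"))` shows the available splits and columns before committing to the larger download.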

Open questions

How to do audio data augmentation for TORGO dataset?
How to obtain Nemours database of dysarthric speech? https://ieeexplore.ieee.org/document/608020

Resources

Dataset

Big

https://huggingface.co/datasets/jmaczan/TORGO

Small

https://huggingface.co/datasets/jmaczan/TORGO-very-small

Others

https://ai.meta.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/

https://pytorch.org/audio/stable/tutorials/speech_recognition_pipeline_tutorial.html

https://huggingface.co/docs/datasets/v2.16.1/audio_dataset

https://distill.pub/2017/ctc/

https://ai.meta.com/blog/self-supervision-and-building-more-robust-speech-recognition-systems/

License

MIT License

Author

Made in Kaszëbë, Poland 🇵🇱 by Jędrzej Paweł Maczan, on the shoulders of giants in 2024
