Spoken_language_identification

SpeechFlow is an advanced speech-to-text API that offers exceptional accuracy for businesses of all sizes and industries. With SpeechFlow, users can transcribe audio and video content into text with high precision, making it an ideal solution for companies that need to quickly and accurately convert speech into text for various purposes, such as captioning, transcription, and analysis. With support for multiple languages and dialects, SpeechFlow is a versatile tool that can cater to a wide range of businesses and industries.

Objective
Technology
Available models and languages
Environment Setup
Code Implementation
LICENSE

Objective

Spoken Language Identification (LID) is the task of detecting the language spoken in an audio clip by an unknown speaker, regardless of the speaker's gender, age, or manner of speaking. It has numerous applications in speech recognition, multilingual machine translation, and speech-to-speech translation.

Our model currently supports 13 languages: English, Spanish, Italian, French, German, Portuguese, Russian, Turkish, Vietnamese, Indonesian, Chinese, Japanese, and Korean.

Technology

The model uses convolutional and recurrent neural networks trained on about two thousand hours of private speech data, with approximately 150 hours of supervised speech per language.

Available models and languages

The figure (fleurs.jpg in the repository root) shows an accuracy (ACC) breakdown by language on the FLEURS test set using the pretrained model.
The FLEURS dataset can be found here: Downloads
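
A per-language accuracy breakdown like the one described above can be reproduced from a list of (true language, predicted language) pairs. The sketch below is only an illustration: the function name `accuracy_by_language` and the example labels are assumptions, not part of this repository, and the numbers are made up rather than taken from the model.

```python
from collections import defaultdict


def accuracy_by_language(pairs):
    """Return {language: accuracy} for (true_language, predicted_language) pairs."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for true_lang, predicted_lang in pairs:
        total[true_lang] += 1
        if predicted_lang == true_lang:
            correct[true_lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


# Made-up labels for illustration only; these are not results from the model.
example_pairs = [
    ("English", "English"),
    ("English", "German"),
    ("Korean", "Korean"),
    ("Korean", "Korean"),
]

per_language = accuracy_by_language(example_pairs)
# Unweighted mean over languages, so every language counts equally.
average = sum(per_language.values()) / len(per_language)
print(per_language, average)
```

Averaging over languages rather than over clips keeps a language with many test clips from dominating the score; whether the repository averages this way is not stated in this README.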

Environment Setup

The models are implemented in TensorFlow. To use all of the functionality of the library, you should have:
tensorflow==2.4.1
tensorflow-gpu==2.4.1
tensorflow-addons==0.15.0
matplotlib==3.5.0
numpy==1.19.5
scikit-learn==1.0.1
librosa==0.8.1
SoundFile==0.10.3.post1
PyYAML==6.0

Download the codebase and open a terminal in the root directory. Make sure Python 3.7 is installed in the current environment, then run:

pip install -r requirements.txt

Code Implementation

Audio Format

The WAV files must have a 16 kHz sampling rate, a single channel, and 16-bit signed integer PCM encoding.
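
To check a file against these requirements before running inference, a small script using Python's standard `wave` module can help. This is a minimal sketch; the function name `check_wav_format` is an assumption and is not part of this repository. The `wave` module reads only uncompressed PCM, where 16-bit samples are signed.

```python
import wave

EXPECTED_SAMPLE_RATE = 16000  # 16 kHz
EXPECTED_CHANNELS = 1         # single channel (mono)
EXPECTED_SAMPLE_WIDTH = 2     # bytes per sample, i.e. 16-bit


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_SAMPLE_RATE:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {EXPECTED_SAMPLE_RATE}"
            )
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(
                f"{wav.getnchannels()} channels, expected {EXPECTED_CHANNELS}"
            )
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(
                f"sample width is {wav.getsampwidth()} bytes, expected {EXPECTED_SAMPLE_WIDTH}"
            )
    return problems
```

For example, `check_wav_format("test_audios/chinese.wav")` should return an empty list if the bundled sample matches the format above.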

Features

The speech features are 80-dimensional log mel-filterbank outputs, computed from a 25 ms window every 10 ms. These features are then normalized to have zero mean and unit variance over the training partition of the dataset.
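
The arithmetic behind these settings can be sketched in plain Python. The helper names below are assumptions for illustration and do not come from the `featurizers` code. The sketch also assumes that frames are taken without padding and that normalization statistics are computed per feature dimension; the README does not state either detail.

```python
import math

SAMPLE_RATE = 16000  # Hz, from the Audio Format section
WINDOW_MS = 25       # analysis window length
HOP_MS = 10          # step between consecutive windows


def frame_sizes(sample_rate=SAMPLE_RATE, window_ms=WINDOW_MS, hop_ms=HOP_MS):
    """Convert window and hop durations from milliseconds to samples."""
    window = sample_rate * window_ms // 1000
    hop = sample_rate * hop_ms // 1000
    return window, hop


def count_frames(num_samples, window, hop):
    """Number of full windows that fit in the signal (assumes no padding)."""
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // hop


def fit_mean_std(frames):
    """Per-dimension mean and standard deviation over training frames (assumed)."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dims)
    ]
    return means, stds


def normalize(frames, means, stds, eps=1e-8):
    """Shift and scale each dimension using statistics from the training set."""
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(len(f))]
        for f in frames
    ]


window, hop = frame_sizes()                       # 400 and 160 samples
frames_per_second = count_frames(SAMPLE_RATE, window, hop)
```

Under these assumptions, one second of 16 kHz audio yields 98 frames, each of which would carry 80 log mel-filterbank values. The statistics returned by `fit_mean_std` should come from the training partition only and then be reused unchanged at inference time.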

Prepare your input data

Prepare your own data before training the model. Refer to the 'data/demo_txt/demo_train.txt' file for the expected format.

Train model

To get started, edit the 'configs/config.yml' file, then simply run this command in the console:

python train.py

This trains the Spoken_language_identification model on the data listed in 'data/demo_txt/demo_train.txt', stores the model in the saved_weights folder, performs inference on 'demo_txt/demo_test.txt', prints the inference results, and saves the average accuracy to a text file.

Inference

A pretrained model is provided in this project. Simply run this command:

python predict_by_pb.py test_audios/chinese.wav

or

python predict_by_weights.py test_audios/chinese.wav

Input audio must meet the requirements listed under Audio Format, as the provided chinese.wav does. If your audio file is not in WAV format (e.g. MP3), you can convert it with ffmpeg. Run the following command in your audio directory:

ffmpeg -i audio.mp3 -ab 256k -ar 16000 -ac 1 -f wav audio.wav

If ffmpeg is not installed, install it first (the commands below are for Debian or Ubuntu):

sudo apt-get update
sudo apt-get install ffmpeg
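
To convert a whole directory of files, the ffmpeg command above can be wrapped in a short Python script. This is a sketch under stated assumptions: the helper names `build_ffmpeg_command` and `convert_directory` are hypothetical and not part of this repository, and ffmpeg must already be available on the PATH.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(source, target):
    """Build the same ffmpeg invocation as above: 16 kHz, mono, WAV output."""
    return [
        "ffmpeg", "-i", str(source),
        "-ab", "256k",
        "-ar", "16000",  # 16 kHz sampling rate
        "-ac", "1",      # single channel
        "-f", "wav",
        str(target),
    ]


def convert_directory(directory, pattern="*.mp3"):
    """Convert every matching file in the directory; return the new WAV paths."""
    converted = []
    for source in sorted(Path(directory).glob(pattern)):
        target = source.with_suffix(".wav")
        # check=True raises CalledProcessError if ffmpeg exits with an error.
        subprocess.run(build_ffmpeg_command(source, target), check=True)
        converted.append(target)
    return converted
```

Calling `convert_directory("my_audio")` (a hypothetical folder name) writes a .wav file next to each .mp3 file. Note that ffmpeg asks for confirmation before overwriting an existing output file.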

LICENSE

Spoken_language_identification is released under the Apache License, version 2.0. The Apache license is a popular BSD-like license. Spoken_language_identification can be redistributed for free, even for commercial purposes, although you cannot remove the license headers (and under some circumstances, you may have to distribute a license document).
