
problem with CTCBeamDecoder.decode() when using a big (.arpa / .binary) file #205

Open
aybberrada opened this issue May 11, 2022 · 1 comment

aybberrada commented May 11, 2022

I'm interested in using a KenLM language model to decode/score the outputs of my speech recognition model.

When I initialize my CTCBeamDecoder with model_path='./test.arpa', a small .arpa file (~4 KB) used just for testing, everything works and CTCBeamDecoder.decode() returns output with no issue.
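For reference, here is roughly how the decoder is set up (a minimal sketch only: the label set, alpha/beta values, and the dummy probability tensor below are placeholders, not my real configuration):

```python
import torch
from ctcdecode import CTCBeamDecoder

# Placeholder label set; the real one matches the acoustic model's output layer.
labels = ["_", " ", "a", "b", "c"]  # "_" = CTC blank at index 0

decoder = CTCBeamDecoder(
    labels,
    model_path="./test.arpa",  # the small ~4 KB ARPA file that works
    alpha=0.5,                 # LM weight (placeholder value)
    beta=1.0,                  # word insertion bonus (placeholder value)
    beam_width=100,
    num_processes=4,
    blank_id=0,
    log_probs_input=False,
)

# probs: (batch, seq_len, num_labels) softmax output of the acoustic model
probs = torch.rand(1, 50, len(labels)).softmax(dim=-1)
beam_results, beam_scores, timesteps, out_lens = decoder.decode(probs)

# Best hypothesis for the first utterance in the batch
best = beam_results[0][0][: int(out_lens[0][0])]
print("".join(labels[int(i)] for i in best))
```

With the tiny test LM this runs fine; the problem only appears when model_path points at the large LM.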

But when I use the correct .arpa file for my project (3-gram.pruned.1e-7.arpa.gz), which is ~90 MB, it either crashes instantly or takes forever and never produces any output. I built a .binary file from this .arpa and tried that instead, but I hit the same problem.

I tracked the problem down to ctc_decode.paddle_beam_decode_lm.

Is it simply because decoding with a big .arpa file requires a lot of RAM? (I have 8 GB.)
If that is the case, how much RAM do I need to run inference with a file of this size?
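One way to narrow this down (just a sketch, using the kenlm Python bindings rather than ctcdecode itself, and with a placeholder model path) would be to load the LM on its own and look at peak memory; if loading the .arpa/.binary alone already exhausts the 8 GB, the decoder cannot work either:

```python
import resource

import kenlm  # pip install kenlm (or build from https://github.com/kpu/kenlm)

# Placeholder path: the decompressed .arpa, or the .binary built from it.
model = kenlm.Model("3-gram.pruned.1e-7.binary")

# Peak resident memory after loading (ru_maxrss is in kilobytes on Linux).
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"peak RSS after loading the LM: {peak_kb / 1024:.1f} MB")

# Sanity check that the model scores text at all.
print(model.score("this is a test", bos=True, eos=True))
```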

afmsaif commented Feb 25, 2023

I am facing the same problem. Have you solved it?
