
problem with CTCBeamDecoder.decode() when using a big (.arpa / .binary) file #205

Open
aybberrada opened this issue May 11, 2022 · 1 comment

aybberrada commented May 11, 2022

I'm interested in using a KenLM language model to decode/score the outputs of my speech recognition model.

When I initialize my CTCBeamDecoder with model_path='./test.arpa', a small .arpa file (~4 KB) used just for testing, everything works and CTCBeamDecoder.decode() returns output with no issue.
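For reference, here is roughly how the decoder is set up (a minimal sketch only: the label set, alpha/beta values, and the dummy probability tensor below are placeholders, not my real configuration):

```python
import torch
from ctcdecode import CTCBeamDecoder

# Placeholder label set; the real one matches the acoustic model's output layer.
labels = ["_", " ", "a", "b", "c"]  # "_" = CTC blank at index 0

decoder = CTCBeamDecoder(
    labels,
    model_path="./test.arpa",  # the small ~4 KB ARPA file that works
    alpha=0.5,                 # LM weight (placeholder value)
    beta=1.0,                  # word insertion bonus (placeholder value)
    beam_width=100,
    num_processes=4,
    blank_id=0,
    log_probs_input=False,
)

# probs: (batch, seq_len, num_labels) softmax output of the acoustic model
probs = torch.rand(1, 50, len(labels)).softmax(dim=-1)
beam_results, beam_scores, timesteps, out_lens = decoder.decode(probs)

# Best hypothesis for the first utterance in the batch
best = beam_results[0][0][: int(out_lens[0][0])]
print("".join(labels[int(i)] for i in best))
```

With the tiny test LM this runs fine; the problem only appears when model_path points at the large LM.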

But when I use the correct .arpa file for my project (3-gram.pruned.1e-7.arpa.gz), which is ~90 MB, it either crashes instantly or takes forever and never produces any output. I built a .binary file from this .arpa and tried that instead, but I hit the same problem.

I tracked the problem down to ctc_decode.paddle_beam_decode_lm.

Is it simply because decoding with a big .arpa file requires a lot of RAM? (I have 8 GB.)
If that is the case, how much RAM do I need to run inference with a file of this size?
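One way to narrow this down (just a sketch, using the kenlm Python bindings rather than ctcdecode itself, and with a placeholder model path) would be to load the LM on its own and look at peak memory; if loading the .arpa/.binary alone already exhausts the 8 GB, the decoder cannot work either:

```python
import resource

import kenlm  # pip install kenlm (or build from https://github.com/kpu/kenlm)

# Placeholder path: the decompressed .arpa, or the .binary built from it.
model = kenlm.Model("3-gram.pruned.1e-7.binary")

# Peak resident memory after loading (ru_maxrss is in kilobytes on Linux).
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"peak RSS after loading the LM: {peak_kb / 1024:.1f} MB")

# Sanity check that the model scores text at all.
print(model.score("this is a test", bos=True, eos=True))
```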

afmsaif commented Feb 25, 2023

I am facing the same problem. Have you solved it?
