Feature request: Multiple Parallel/Concatenatable Models #2306

Open
mo-g opened this issue Oct 31, 2022 · 0 comments
Labels
enhancement New feature or request

mo-g commented Oct 31, 2022

Honestly, this software is a black box to me, so this may be inherently unachievable, but I couldn't find an open or closed issue covering anything similar, so I thought it was worth asking.

Is your feature request related to a problem? Please describe.
My project will need to recognize given and last names from a very wide linguistic base, something that would be impractically large to retrain every time a new one needs to be added.

Describe the solution you'd like
I'd like to be able to train multiple separate models, e.g. "generic terms", "team names", "English names", "Polish names", "Spanish names", "Mandarin names", and load them in parallel so that the STT can transcribe complete phrases containing words from multiple sources. This would allow training to be split into chunks, and models to be loaded selectively, potentially reducing computational load by running inference only with the scope needed for where the software is installed.

Describe alternatives you've considered
I guess the alternative would be to load several models in parallel and run inference with all of them at once, but I don't know how I would select which words to use from each output. I'm not massively experienced with this software yet. It would also presumably be less efficient than a single instance with multiple language models, since the acoustic model and "boilerplate" would then be duplicated as well.
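
To make the selection problem concrete, here is a minimal sketch of the simplest possible strategy: keep the whole transcript with the highest confidence. Everything in it is an assumption for illustration. `Candidate`, `pick_best`, the source labels, and the scores are hypothetical and are not part of this project's API, and it assumes confidence values are comparable across models, which may not hold in practice.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Candidate:
    source: str        # hypothetical label for the model that produced the text
    text: str          # transcript returned by that model
    confidence: float  # higher is better; ASSUMED comparable across models


def pick_best(candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """Return the candidate with the highest confidence, or None if there are none."""
    return max(candidates, key=lambda c: c.confidence, default=None)


# Made-up outputs standing in for three parallel inference runs.
candidates = [
    Candidate("english names", "call john smith", -12.4),
    Candidate("polish names", "call jan kowalski", -7.9),
    Candidate("generic terms", "call on smith", -15.1),
]

best = pick_best(candidates)
print(best.source, "->", best.text)
```

This only chooses between complete transcripts. It does not mix words from different outputs, which is the part I don't know how to do.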

Additional context
This is for an open source system that allows voice-controlled calling between users through statements such as "call [recipient]" or "[initiator] to [recipient]". I can see that something similar was planned for the original DeepSpeech, but it never made it into code before the project ended.
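
As a rough illustration of the two statement shapes, the sketch below extracts the names from an already transcribed phrase. The patterns and the `parse_command` helper are hypothetical and only show where the recognized names would end up. It assumes names never contain the word "to" and that a phrase starting with "call" is always the first form.

```python
import re
from typing import Optional

# "call [recipient]"
CALL_PATTERN = re.compile(r"^call (?P<recipient>.+)$")
# "[initiator] to [recipient]"; the initiator is matched lazily so the
# first " to " in the phrase is treated as the separator.
RELAY_PATTERN = re.compile(r"^(?P<initiator>.+?) to (?P<recipient>.+)$")


def parse_command(transcript: str) -> Optional[dict]:
    """Map a transcript to initiator/recipient names, or None if it fits neither form."""
    text = transcript.strip().lower()
    match = CALL_PATTERN.match(text)
    if match:
        return {"initiator": None, "recipient": match.group("recipient")}
    match = RELAY_PATTERN.match(text)
    if match:
        return {
            "initiator": match.group("initiator"),
            "recipient": match.group("recipient"),
        }
    return None


print(parse_command("call Jan Kowalski"))
print(parse_command("alpha team to bravo team"))
```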

@mo-g mo-g added the enhancement New feature or request label Oct 31, 2022