
Support hotword boosting feature and lexicon based decoding #222

Open: wants to merge 7 commits into base: master


@Sushmitha-Deva commented Nov 16, 2023

This pull request adds hotword boosting and lexicon-based decoding to the ctcdecode library and updates its build configuration.

Hotword boosting:

  • Supported for both character-based and WPE-based ASR labels (allowed characters: a-z and the apostrophe).
  • The hotword scoring logic is inspired by the pyctcdecode package: a partial weight is added to the score of a path whose tokens form part of a hotword, and if the path does not go on to form the complete hotword, its score is reset to the original value.
  • Tests for this feature are included in this PR.
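The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in this PR: `HotwordBooster`, `bonus`, and the default weight of 10.0 are hypothetical names and values, the partial weight scales with progress toward the shortest matching hotword (similar in spirit to pyctcdecode's partial-token scoring), and a finished word that is not a hotword receives no bonus, which restores the path's original score.

```python
import re
from typing import Iterable

# Assumed label alphabet, mirroring the a-z plus apostrophe restriction above.
_ALLOWED = re.compile(r"[a-z']+")


class HotwordBooster:
    """Hypothetical hotword scorer for illustration; not the code in this PR."""

    def __init__(self, hotwords: Iterable[str], weight: float = 10.0) -> None:
        self.hotwords = {w.lower() for w in hotwords}
        for w in self.hotwords:
            if not _ALLOWED.fullmatch(w):
                raise ValueError(f"unsupported characters in hotword: {w!r}")
        self.weight = weight

    def bonus(self, partial_word: str, word_finished: bool) -> float:
        """Return the bonus to add to a beam's original score for its current word."""
        word = partial_word.lower()
        if word_finished:
            # A completed hotword keeps the full weight; any other word gets 0,
            # i.e. the path falls back to its original score.
            return self.weight if word in self.hotwords else 0.0
        if not word:
            return 0.0
        # Lengths of all hotwords that the partial word is a prefix of.
        lengths = [len(h) for h in self.hotwords if h.startswith(word)]
        if not lengths:
            return 0.0
        # Partial weight: scale by progress toward the shortest matching hotword.
        return self.weight * len(word) / min(lengths)


booster = HotwordBooster(["ctcdecode", "hotword"], weight=10.0)
base = -12.0  # original score of some beam (illustrative value)
print(base + booster.bonus("hot", word_finished=False))     # partially boosted
print(base + booster.bonus("hotword", word_finished=True))  # -2.0
print(base + booster.bonus("hots", word_finished=True))     # -12.0 (reset)
```

Computing the bonus from the current partial word, rather than accumulating it step by step, makes the reset trivial: as soon as the path stops matching every hotword, the bonus drops back to zero.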

Lexicon-based decoding:

  • Lexicon-based decoding penalizes paths that are about to form an invalid word, giving priority to paths with valid spellings.
  • During decoding, each beam path is compared against the lexicon FST, and a penalty is applied when the path forms an unknown (out-of-lexicon) word.
  • The penalty is a constant negative value, unk_score.
  • Results indicate a 90% reduction in spelling mistakes with this decoding; increasing the magnitude of the unk_score penalty leads to a 100% reduction.
  • Create the lexicon FST with the build_fst tool; see tools/README.md for usage.
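The penalty mechanism can be sketched as follows. This is a simplified illustration, not the PR's implementation: it swaps the lexicon FST for a plain character trie (which accepts the same set of word prefixes), and `LexiconTrie`, `penalty`, the default unk_score of -5.0, and the beam scores are all hypothetical. It also assumes the penalty is applied once per word, both when a path leaves the lexicon and when a word ends on a prefix that is not a complete lexicon entry.

```python
from typing import Dict, Iterable, Optional

_END = "<end>"  # marker key for "a lexicon word ends here" (never a single character)


class LexiconTrie:
    """Hypothetical character trie standing in for the lexicon FST."""

    def __init__(self, words: Iterable[str], unk_score: float = -5.0) -> None:
        if unk_score >= 0:
            raise ValueError("unk_score must be negative")
        self.unk_score = unk_score
        self.root: Dict[str, dict] = {}
        for word in words:
            node = self.root
            for ch in word:
                node = node.setdefault(ch, {})
            node[_END] = {}

    def _walk(self, word: str) -> Optional[dict]:
        """Follow the word's characters from the root; None if it leaves the lexicon."""
        node = self.root
        for ch in word:
            node = node.get(ch)
            if node is None:
                return None
        return node

    def penalty(self, word: str, finished: bool) -> float:
        """Return 0.0 for a path inside the lexicon, unk_score otherwise."""
        node = self._walk(word)
        if node is None:
            return self.unk_score  # no transition: the path forms an unknown word
        if finished and _END not in node:
            return self.unk_score  # valid prefix, but not a complete lexicon word
        return 0.0


lexicon = LexiconTrie(["spell", "spelling", "word"], unk_score=-5.0)
beams = {"spelling": -4.0, "speling": -3.5}  # made-up beam scores
rescored = {w: s + lexicon.penalty(w, finished=True) for w, s in beams.items()}
print(max(rescored, key=rescored.get))  # spelling
```

Here the misspelled "speling" starts with the better raw score, but once the penalty is added it falls below "spelling"; a more negative unk_score widens that gap.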

@Sushmitha-Deva changed the title from "Support hotword boosting feature" to "Support hotword boosting feature and lexicon based decoding" on Nov 18, 2023