How to handle completely wrong sentence word? #65

Open
AnandDev8 opened this issue May 3, 2019 · 1 comment
Comments

@AnandDev8

Hi,
First of all, SymSpell is damn fast and mostly does my job for spell correction. The issue I am facing is that when a user of my application intentionally types a completely wrong word or sentence, SymSpell still comes up with valid words for it, which I would like to avoid.
Example:
User types: avedoamlkejuike...
SymSpell: a video am like juice keen...
Output like this is totally irrelevant for my use case.
So how can I solve this using just SymSpell?
Thanks in advance

@wolfgarbe
Owner

wolfgarbe commented May 3, 2019

This is a common problem. It is not only that users intentionally type something wrong; input text always contains unknown words that are not in the dictionary.

So how can we distinguish a non-existing or unknown word from a misspelled word?

Solution 1: Restrict the maximum edit distance (the number of splits + the number of spelling corrections) within a sliding text window. If a string needs to be split into too many small words, and almost all of the sub-words need an additional spelling correction, we can assume that this is an unknown/non-existing word.
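
A minimal sketch of the sliding-window idea above, written as standalone Python rather than SymSpell code. It assumes you already have, for each suggested sub-word, the edit distance used and whether the word boundary before it was inserted by the segmenter. The names `Part` and `looks_like_known_text`, the window size, and the cost limit are all hypothetical and would need tuning on real data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Part:
    word: str             # suggested word after segmentation/correction
    distance: int         # spelling edits needed to reach `word`
    inserted_split: bool  # True if the boundary before `word` was not in the input


def looks_like_known_text(parts: List[Part], window: int = 3, max_cost: int = 3) -> bool:
    """Return False if any window of consecutive parts needs too many edits."""
    last_start = max(0, len(parts) - window)
    for start in range(last_start + 1):
        chunk = parts[start:start + window]
        # cost = inserted splits + spelling corrections inside the window
        cost = sum(p.distance + int(p.inserted_split) for p in chunk)
        if cost > max_cost:
            return False
    return True


# Illustrative values only: the distances below are made up, not SymSpell output.
gibberish = [Part("a", 0, False), Part("video", 2, True), Part("am", 0, True),
             Part("like", 1, True), Part("juice", 2, True)]
typo = [Part("the", 0, False), Part("quick", 0, True),
        Part("brown", 1, False), Part("fox", 0, False)]

print(looks_like_known_text(gibberish))  # False: the first window costs 4 > 3
print(looks_like_known_text(typo))       # True: no window costs more than 2
```

When the check fails, one option is to keep the user's original string instead of the suggested correction.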

Solution 2: Use n-gram probabilities or Markov chains. The co-occurrence of words is not random. Some words are more likely to occur together in a sentence than others; some words frequently follow each other, others never do. If the n-gram probabilities of the split and corrected words are below a certain threshold, we can assume that this is not a genuine correction, but an unknown/non-existing word.
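
A rough sketch of the n-gram idea above, using a bigram model with add-one smoothing built from a toy corpus. This is not part of SymSpell; `BigramModel`, `is_genuine_correction`, the corpus, and the threshold of -1.8 are assumptions chosen only for illustration. A real setup would train on a large corpus and tune the threshold.

```python
import math
from collections import Counter
from typing import List


class BigramModel:
    def __init__(self, sentences: List[List[str]]):
        self.contexts = Counter()  # how often a token appears as the previous word
        self.bigrams = Counter()   # how often (previous, current) occurs
        vocab = set()
        for sentence in sentences:
            tokens = ["<s>"] + sentence
            vocab.update(tokens)
            self.contexts.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocab)

    def avg_log_prob(self, words: List[str]) -> float:
        """Average add-one-smoothed bigram log-probability per word."""
        tokens = ["<s>"] + words
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.contexts[prev] + self.vocab_size
            total += math.log(num / den)
        return total / max(1, len(words))


def is_genuine_correction(model: BigramModel, words: List[str],
                          threshold: float = -1.8) -> bool:
    # Below the threshold, treat the suggestion as an unknown/non-existing word.
    return model.avg_log_prob(words) >= threshold


# Toy corpus, for illustration only.
corpus = [["i", "like", "apple", "juice"], ["i", "like", "a", "video"],
          ["i", "am", "keen"], ["a", "video", "i", "like"]]
model = BigramModel(corpus)

print(is_genuine_correction(model, ["i", "like", "a", "video"]))
print(is_genuine_correction(model, ["a", "video", "am", "like", "juice", "keen"]))
```

On this toy corpus the first sequence scores above the threshold and the second below it, because word pairs such as "video am" and "juice keen" never occur together in the training sentences.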

Both solutions are currently not part of SymSpell and need to be implemented as an extension or by modifying the SymSpell code.
