_set_token_ratio now keeps tokenization. #300

Open

wants to merge 2 commits into base: master

Commits on Feb 21, 2021

  1. Modified Set Scoring

    Previous issue: partial_token_set_ratio matched strings across token boundaries.
    
    Fix: Preserve the tokenization of the comparison sets and use Levenshtein's setratio/seqratio instead of ratio.
    
    Detail:
    Previously, token_set_ratio used Python's strip to remove whitespace. Because the whitespace was removed, the set comparisons were no longer tokenized, so partial_token_set_ratio could match strings across word boundaries. This is generally unexpected behavior. This change allows a more bag-of-words-style comparison.
    MWLever committed Feb 21, 2021
    f006dfa
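    The cross-boundary matching this commit addresses can be illustrated with a small sketch. Levenshtein's setratio/seqratio operate on sequences of tokens, whereas plain ratio operates on one joined string, where a partial match can span a word boundary. The sketch below uses `difflib.SequenceMatcher` as a stand-in for those ratio functions; the tokenizer and function names are illustrative, not taken from the PR:

    ```python
    from difflib import SequenceMatcher

    def tokens(s):
        # simple whitespace tokenizer (illustrative)
        return sorted(s.lower().split())

    def joined_ratio(a, b):
        # Tokens are re-joined into a single string before comparison,
        # so a partial match can cross a word boundary -- roughly the
        # behavior the commit describes as unexpected.
        return SequenceMatcher(None, " ".join(tokens(a)), " ".join(tokens(b))).ratio()

    def tokenized_ratio(a, b):
        # Compare the token sequences element-wise instead: word
        # boundaries are preserved, in the spirit of seqratio/setratio.
        return SequenceMatcher(None, tokens(a), tokens(b)).ratio()
    ```

    With the joined strings, "ab cd" scores highly against "abcd" even though they share no whole token; with tokenized comparison, the score is zero because no token matches exactly.
    
    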
  2. make pep8 compliant

    MWLever committed Feb 21, 2021
    ee940a7