_set_token_ratio now keeps tokenization. #300

Open

wants to merge 2 commits into base: master

Commits on Feb 21, 2021

  1. Modified Set Scoring

    Previous issue: partial_token_set_ratio matched strings across token boundaries.
    
    Fix: Preserve the tokenization of the comparison sets and use Levenshtein's setratio/seqratio instead of ratio.
    
    Detail:
    Previously, token_set_ratio used Python's strip to remove whitespace. Because the whitespace was removed, the set comparisons were no longer tokenized, so partial_token_set_ratio could match strings across word boundaries. This is generally unexpected behavior. This change allows a more bag-of-words-style comparison.
    MWLever committed Feb 21, 2021
    f006dfa
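    The cross-boundary matching this commit addresses can be illustrated with a small sketch. Levenshtein's setratio/seqratio operate on sequences of tokens, whereas plain ratio operates on one joined string, where a partial match can span a word boundary. The sketch below uses `difflib.SequenceMatcher` as a stand-in for those ratio functions; the tokenizer and function names are illustrative, not taken from the PR:

    ```python
    from difflib import SequenceMatcher

    def tokens(s):
        # simple whitespace tokenizer (illustrative)
        return sorted(s.lower().split())

    def joined_ratio(a, b):
        # Tokens are re-joined into a single string before comparison,
        # so a partial match can cross a word boundary -- roughly the
        # behavior the commit describes as unexpected.
        return SequenceMatcher(None, " ".join(tokens(a)), " ".join(tokens(b))).ratio()

    def tokenized_ratio(a, b):
        # Compare the token sequences element-wise instead: word
        # boundaries are preserved, in the spirit of seqratio/setratio.
        return SequenceMatcher(None, tokens(a), tokens(b)).ratio()
    ```

    With the joined strings, "ab cd" scores highly against "abcd" even though they share no whole token; with tokenized comparison, the score is zero because no token matches exactly.
    
    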
  2. make pep8 compliant

    MWLever committed Feb 21, 2021
    ee940a7