Installing python-Levenshtein as suggested by the warnings gives different results. #318

Open
JeremyThiesen opened this issue Jul 22, 2021 · 1 comment


JeremyThiesen commented Jul 22, 2021

I was running this code:

from fuzzywuzzy import fuzz
partial_ratio = fuzz.partial_ratio('more than fifty', 'i know that because a lion run fifty mile per hour and a cheetah run about eighty mile per hour and sixty-five be more than fifty and be slow than eighty')
print(partial_ratio)

With fuzzywuzzy 0.18.0, this prints 100. It also emits the following UserWarning:

UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
  warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')

After installing python-Levenshtein 0.12.2, the same code prints 87, which is incorrect: the first string appears verbatim in the second, so there is an exact match.
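To illustrate why 100 is the expected answer, here is a minimal sketch that uses only the standard library's difflib. It is not fuzzywuzzy's actual algorithm: `sliding_partial_ratio` is a hypothetical helper that simply compares the shorter string against every window of the same length in the longer string and keeps the best score.

```python
from difflib import SequenceMatcher


def sliding_partial_ratio(shorter, longer):
    """Best difflib ratio of `shorter` against any same-length window of
    `longer`, scaled to 0-100. Hypothetical helper, not fuzzywuzzy's code."""
    if len(shorter) > len(longer):
        shorter, longer = longer, shorter
    if not shorter:
        return 0
    window = len(shorter)
    best = 0.0
    for start in range(len(longer) - window + 1):
        candidate = longer[start:start + window]
        best = max(best, SequenceMatcher(None, shorter, candidate).ratio())
        if best == 1.0:
            break  # an exact window match cannot be improved on
    return int(round(100 * best))


needle = 'more than fifty'
haystack = ('i know that because a lion run fifty mile per hour and a cheetah '
            'run about eighty mile per hour and sixty-five be more than fifty '
            'and be slow than eighty')

print(needle in haystack)                       # True
print(sliding_partial_ratio(needle, haystack))  # 100
```

Because the shorter string occurs verbatim in the longer one, any partial-match score that considers the right window should reach 100.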

@maxbachmann

This issue has already been reported: #79
The implementation in python-Levenshtein returns incorrect results in some cases, so you have three options:

  1. use the slower difflib-based version (and possibly suppress the warning)
  2. use the python-Levenshtein version, which can return incorrect results for any ratio that uses partial_ratio
  3. use RapidFuzz (I am the author), which is a fast implementation that gives results similar to the difflib-based implementation
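For option 1, one possible way to suppress the warning is to filter it while the module is imported. This is a sketch under two assumptions: that the warning is raised when `fuzzywuzzy.fuzz` is first imported, and that its message starts with the text quoted in the report above. `import_without_slow_matcher_warning` is a hypothetical helper name.

```python
import importlib
import warnings

# Start of the warning message quoted in the report above.
SLOW_MATCHER_WARNING = "Using slow pure-python SequenceMatcher"


def import_without_slow_matcher_warning(module_name):
    """Import `module_name` while ignoring only the slow-SequenceMatcher
    UserWarning; the previous warning filters are restored afterwards."""
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message=SLOW_MATCHER_WARNING,
            category=UserWarning,
        )
        return importlib.import_module(module_name)


# Usage (assumes fuzzywuzzy is installed and python-Levenshtein is not):
# fuzz = import_without_slow_matcher_warning("fuzzywuzzy.fuzz")
# print(fuzz.partial_ratio("more than fifty", "be more than fifty and"))
```

The filter is scoped to the import, so other UserWarnings in the program are still shown.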

It would be possible to fix this behavior in fuzzywuzzy/python-Levenshtein. However, since neither project is really maintained anymore, it is unclear if or when this will be fixed.
