how ratio in fuzzy-wuzzy calculated? #289

Open
fatimamb opened this issue Nov 9, 2020 · 1 comment

Comments

fatimamb commented Nov 9, 2020

I am trying to understand how the score in fuzzy-wuzzy is calculated.
So far I know it depends on SequenceMatcher from the difflib package,
and the difflib documentation describes the score as follows:

Return a measure of the sequences’ similarity as a float in the range [0, 1].

Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T.
 Note that this is 1.0 if the sequences are identical, and 0.0 if they have nothing in common.
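
To make the quoted formula concrete, here is a small sketch that recomputes the ratio by hand from `get_matching_blocks()` and compares it with `SequenceMatcher.ratio()`. The helper name `manual_ratio` is made up for illustration. The factor 2.0 is there because T counts the elements of both sequences, so every matched element is counted once in each of them; without it, two identical sequences would score 0.5 rather than 1.0.

```python
from difflib import SequenceMatcher


def manual_ratio(a, b):
    """Recompute 2.0*M / T from the matching blocks (illustrative helper)."""
    matcher = SequenceMatcher(None, a, b)
    # M: total size of all matching blocks (the final block is a zero-size sentinel).
    matches = sum(block.size for block in matcher.get_matching_blocks())
    # T: total number of elements in both sequences.
    total = len(a) + len(b)
    return 2.0 * matches / total if total else 1.0


a, b = "private", "privateT"
print(manual_ratio(a, b))                   # 2.0 * 7 / 15
print(SequenceMatcher(None, a, b).ratio())  # same value
```

For "private" and "privateT", M is 7 and T is 7 + 8 = 15, so the ratio is 14/15.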

My first question: what does the 2.0 refer to?

Also, get_opcodes returns tags such as equal, replace and delete.

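from difflib import SequenceMatcher

# The first positional argument is isjunk, so pass None before the two strings.
s = SequenceMatcher(None, "private", "privateT")
for opcode in s.get_opcodes():
    print("%6s a[%d:%d] b[%d:%d]" % opcode)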

My second question: do any of these operations affect the ratio score?
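
One way to check this empirically is sketched below: it sums the lengths of the `equal` opcodes and compares 2.0*M/T computed from them with `ratio()`. In CPython's difflib the opcodes are derived from the same matching blocks that `ratio()` uses, so only the `equal` spans contribute to M, while `replace`, `delete` and `insert` spans only show up through the total length T. The variable names are illustrative.

```python
from difflib import SequenceMatcher

a, b = "private", "privateT"
matcher = SequenceMatcher(None, a, b)

equal_len = 0
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    print("%7s a[%d:%d] b[%d:%d]" % (tag, i1, i2, j1, j2))
    if tag == "equal":
        # Only matched spans count toward M.
        equal_len += i2 - i1

ratio_from_opcodes = 2.0 * equal_len / (len(a) + len(b))
print(ratio_from_opcodes, matcher.ratio())
```

For this pair the opcodes are one `equal` span covering "private" and one `insert` for the trailing "T", so both printed values agree.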

I have read some posts, such as the one here,
talking about the cost in edit distance.
Is that cost considered in the fuzzy-wuzzy or difflib score?

Thank you.

fatimamb changed the title from "how ratio in uzzy-wuzzy calculated?" to "how ratio in fuzzy-wuzzy calculated?" on Nov 9, 2020
@MahmoudAliEng

As far as I know, FW uses the Levenshtein similarity ratio. You can find a more detailed explanation of its logic in this amazing article.
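
For illustration, here is a rough pure-Python sketch of what a Levenshtein-style similarity ratio can look like. It assumes the ratio is defined as (|a| + |b| - d) / (|a| + |b|), where d is an edit distance in which insertions and deletions cost 1 and a substitution costs 2. That definition is an assumption about how the python-Levenshtein package behaves and is not confirmed in this thread; the function names `indel_distance` and `levenshtein_style_ratio` are hypothetical.

```python
def indel_distance(a, b):
    """Edit distance with insert/delete cost 1 and substitution cost 2 (assumed costs)."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            sub_cost = 0 if ca == cb else 2
            curr.append(min(prev[j] + 1,              # delete ca
                            curr[j - 1] + 1,          # insert cb
                            prev[j - 1] + sub_cost))  # match or substitute
        prev = curr
    return prev[-1]


def levenshtein_style_ratio(a, b):
    """Similarity in [0, 1] under the assumed definition above."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return (total - indel_distance(a, b)) / total


print(levenshtein_style_ratio("private", "privateT"))  # (15 - 1) / 15
```

Under these assumptions the pair "private" / "privateT" needs one insertion, which gives 14/15, the same value as the difflib formula for this example. The two measures can differ on other inputs, because difflib does not guarantee a minimal edit sequence.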
