how ratio in fuzzy-wuzzy calculated? #289

fatimamb · 2020-11-09T15:24:15Z

I am trying to understand how the score in fuzzy-wuzzy is calculated.
So far I know that it depends on SequenceMatcher from the difflib package,
and the difflib documentation describes the score as follows:

Return a measure of the sequences’ similarity as a float in the range [0, 1].

Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T. Note that this is 1.0 if the sequences are identical, and 0.0 if they have nothing in common.
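
To make the quoted formula concrete, here is a small sketch that recomputes the ratio by hand for the two strings used in this issue. It assumes that M can be obtained by summing the sizes of the blocks returned by `get_matching_blocks()`; the variable names `M`, `T` and `manual` are only illustrative. Each matched element is present in both sequences but is counted once in M, while T counts both sequences in full, so the factor 2.0 is what lets two identical sequences reach exactly 1.0.

```python
from difflib import SequenceMatcher

a, b = "private", "privateT"
s = SequenceMatcher(None, a, b)  # first argument is isjunk; None disables it

# M: number of matched elements (sum of the sizes of all matching blocks)
M = sum(block.size for block in s.get_matching_blocks())

# T: total number of elements in both sequences
T = len(a) + len(b)

# The documented formula, computed by hand
manual = 2.0 * M / T

print(M, T, manual, s.ratio())
```

For this pair M is 7 and T is 15, so both values come out at about 0.933.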

My first question: what does the 2.0 refer to?

Also, get_opcodes returns tags such as equal, replace and delete.

from difflib import SequenceMatcher

s = SequenceMatcher(None, "private", "privateT")  # first argument is isjunk, not a string
for opcode in s.get_opcodes():
    print("%6s a[%d:%d] b[%d:%d]" % opcode)

My second question: does any of them affect the ratio score?
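
One way to explore this question is to total the lengths of the `equal` spans and compare the result with `ratio()`. The sketch below is only an illustration (the names `equal_total`, `tags` and `from_opcodes` are made up here). It suggests that the `equal` opcodes alone determine M, while `replace`, `delete` and `insert` lower the score only indirectly, because the elements they cover count toward T but not toward M.

```python
from difflib import SequenceMatcher

a, b = "private", "privateT"
s = SequenceMatcher(None, a, b)

equal_total = 0
tags = []
for tag, i1, i2, j1, j2 in s.get_opcodes():
    tags.append(tag)
    if tag == "equal":
        # Only 'equal' spans contribute matched elements
        equal_total += i2 - i1

# Ratio rebuilt from the 'equal' opcodes only
from_opcodes = 2.0 * equal_total / (len(a) + len(b))

print(tags, equal_total, from_opcodes, s.ratio())
```

For these two strings the opcodes are one `equal` span of length 7 followed by one `insert`, so no edit cost is applied per operation.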

I have read some posts, such as the one here,
talking about the cost in edit distance.
Is that considered in the fuzzy-wuzzy or difflib score?

thank you


MahmoudAliEng · 2020-12-13T14:37:13Z

As far as I know, FuzzyWuzzy (FW) uses the Levenshtein similarity ratio. You can find a fuller explanation of its logic in this amazing article.
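
For readers who want to see what a Levenshtein-style similarity ratio can look like, here is a pure-Python sketch. It rests on an assumption not confirmed in this thread: that the ratio is (len(a) + len(b) - distance) / (len(a) + len(b)), where the distance charges 1 for an insertion or deletion and 2 for a substitution. The function names `indel_distance` and `levenshtein_style_ratio` are hypothetical and are not part of FuzzyWuzzy or python-Levenshtein. As far as I know, FuzzyWuzzy falls back to difflib's SequenceMatcher when python-Levenshtein is not installed, so the two scores can differ on some inputs.

```python
def indel_distance(a, b):
    """Edit distance with insert/delete cost 1 and substitution cost 2 (assumed costs)."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            sub_cost = 0 if ca == cb else 2
            curr.append(min(
                prev[j] + 1,             # delete ca
                curr[j - 1] + 1,         # insert cb
                prev[j - 1] + sub_cost,  # keep or substitute
            ))
        prev = curr
    return prev[-1]


def levenshtein_style_ratio(a, b):
    """Similarity in [0, 1] derived from the distance above (hypothetical helper)."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # two empty strings are treated as identical
    return (total - indel_distance(a, b)) / total


print(levenshtein_style_ratio("private", "privateT"))
```

For "private" and "privateT" the distance is 1 (a single insertion) and the total length is 15, giving 14/15, the same value as 2.0*M / T from difflib for this pair.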

fatimamb changed the title ~~how ratio in uzzy-wuzzy calculated?~~ how ratio in fuzzy-wuzzy calculated? Nov 9, 2020
