How to get fuzzy index? #293

mridu-enigma · 2020-12-13T04:50:56Z

I am using fuzzywuzzy to look for key-phrase-like terms in corpora.

FWIW, when there's a tie I'd like it to be broken by the earliest match, so: is there a way to get the fuzzy index of a match? I tried all the functions in fuzz and process (using dir() to discover funcs like QRatio, WRatio, etc.)

For instance, I want some mechanism that ranks fuzz.partial_ratio('alex', 'alexa not') higher than fuzz.partial_ratio('alex', 'not alexa'), but for fuzzy matches (that's a simplistic example). How can I achieve this?

acslater00 · 2020-12-13T06:00:07Z

I haven't thought about it deeply, but I think it's an ambiguous problem definition. You probably need a minimum threshold that you would consider a match before being able to implement. If you did that, a naive solution would be to just iterate the string and return the first index with a fuzzy-threshold matching your search key. There might be other more optimized solutions that would scale to longer strings. Hope this helps.
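
A minimal sketch of that naive approach, under a few assumptions: the window is fixed at the length of the key, the default threshold of 80 is arbitrary, and the default scorer is a standard-library stand-in built on `difflib` so the snippet runs without extra dependencies. The names `similarity` and `first_fuzzy_index` are made up for this illustration and are not part of fuzzywuzzy; `fuzz.ratio` could be passed as `scorer` instead.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Score two strings from 0 to 100 (a stand-in for fuzz.ratio)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def first_fuzzy_index(key, text, threshold=80, scorer=similarity):
    """Return the first index where a len(key)-wide window of text
    scores at least threshold against key, or -1 if none does."""
    width = len(key)
    # Slide a window of the key's length over the text, left to right.
    for start in range(max(len(text) - width, 0) + 1):
        if scorer(key, text[start:start + width]) >= threshold:
            return start
    return -1
```

With these defaults, `first_fuzzy_index('alex', 'alexa not')` finds the key at index 0 and `first_fuzzy_index('alex', 'not alexa')` at index 4. The result depends on the threshold: a lower one can fire earlier on a partial overlap.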

s-c-p · 2020-12-14T11:20:20Z

@mridu-enigma I don't understand it.

Do you want the index of the first token match, or the index where the maximum similarity (density) is observed?

Say you are searching for o ordin in strings like co-ordinate, what output do you seek? 1 or 3?

@acslater00 I haven't grokked the entire code, perhaps you could shed some light on the correct behavior.

maxbachmann · 2020-12-16T15:32:16Z

fuzz.partial_ratio searches for the best alignment of the shorter string within the longer string. The order in which you pass them does not matter unless they have a similar length (for similar lengths the results can differ).
In the code this is achieved by:

    if len(s1) <= len(s2):
        shorter = s1
        longer = s2
    else:
        shorter = s2
        longer = s1

Since you mention that you match a key against a phrase, I assume the key is always shorter than the phrase, so you might be able to implement it the following way:

    from fuzzywuzzy import fuzz

    def your_scorer(s1, s2):
        # Treat s1 as the key: a key longer than the phrase cannot match.
        if len(s1) > len(s2):
            return 0
        return fuzz.partial_ratio(s1, s2)
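
For the tie-breaking part of the original question, one possible sketch is to compute both a score and a position per candidate, then sort by score first and position second. This is not how fuzz.partial_ratio works internally: it is a simplified fixed-width sliding window built on the standard library's `difflib`, and the names `best_window` and `rank` are hypothetical helpers for this illustration only.

```python
from difflib import SequenceMatcher


def best_window(key, text):
    """Return (score, index) of the best len(key)-wide window in text.

    Ties go to the earliest window because only a strictly higher
    score replaces the current best.
    """
    width = len(key)
    best_score, best_index = -1, -1
    for start in range(max(len(text) - width, 0) + 1):
        window = text[start:start + width]
        score = round(100 * SequenceMatcher(None, key, window).ratio())
        if score > best_score:
            best_score, best_index = score, start
    return best_score, best_index


def rank(key, candidates):
    """Sort candidates by score (high first), then by match index (low first)."""
    scored = [(best_window(key, c), c) for c in candidates]
    scored.sort(key=lambda item: (-item[0][0], item[0][1]))
    return [c for _, c in scored]
```

Here `rank('alex', ['not alexa', 'alexa not'])` puts `'alexa not'` first: both candidates score 100, but the match starts at index 0 rather than 4.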

s-c-p mentioned this issue Dec 13, 2020

Find term position #294

Closed
