utils.full_process executed when processor=None #319

Open
sdennler opened this issue Aug 1, 2021 · 1 comment

sdennler commented Aug 1, 2021

Great and very helpful tool! Thank you!

One thing I noticed is that even when process.extractOne (and the other process functions) is called with processor=None, utils.full_process is still executed several times. This is probably because of:

pre_processor = partial(utils.full_process, force_ascii=True)

As a result, the following two calls produce the same output:

from fuzzywuzzy import process

query = "123   ....  "
choices = ["123", query]

print(process.extract(query, choices))
print(process.extract(query, choices, processor=None))

Output:

[('123', 100), ('123   ....  ', 100)]
[('123', 100), ('123   ....  ', 100)]

I would expect the exact 1:1 match to score higher when no processor is used, so something like this:

[('123', 100), ('123   ....  ', 100)]
[('123   ....  ', 100), ('123', 90)]
@maxbachmann

In FuzzyWuzzy, the processor argument only lets you add preprocessing. It does not provide a way to disable the preprocessing done inside the scorer. So when calling

process.extract(query, choices, processor=None)

the string is still preprocessed, because the default scorer, fuzz.WRatio, preprocesses its inputs by default. To disable this, you would have to use:

process.extract(query, choices, processor=None, scorer=partial(fuzz.WRatio, full_process=False))

I agree that this is very counter-intuitive, which is why I use the behavior you expected in RapidFuzz.
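
To make the mechanism described above concrete, here is a minimal standard-library sketch. It is not FuzzyWuzzy's code: toy_full_process, toy_scorer, and toy_extract are hypothetical stand-ins that only mimic the structure discussed in this thread (a scorer that preprocesses on its own unless told not to), and difflib is used in place of the real scoring logic, so the exact scores will differ from FuzzyWuzzy's.

```python
import re
from difflib import SequenceMatcher
from functools import partial


def toy_full_process(s):
    """Hypothetical stand-in for utils.full_process:
    replace non-alphanumeric runs with a space, lowercase, strip."""
    return re.sub(r"[^0-9a-zA-Z]+", " ", s).lower().strip()


def toy_scorer(s1, s2, full_process=True):
    """Hypothetical scorer that preprocesses its inputs by default,
    independently of any processor passed to the extract function."""
    if full_process:
        s1, s2 = toy_full_process(s1), toy_full_process(s2)
    return round(100 * SequenceMatcher(None, s1, s2).ratio())


def toy_extract(query, choices, processor=None, scorer=toy_scorer):
    """Hypothetical extract: the processor is only an extra step
    applied before the scorer; it cannot switch the scorer's own step off."""
    if processor is not None:
        query = processor(query)
    results = []
    for choice in choices:
        processed = processor(choice) if processor is not None else choice
        results.append((choice, scorer(query, processed)))
    # Highest score first; ties keep their original order.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


query = "123   ....  "
choices = ["123", query]

# processor=None alone: the scorer still preprocesses, so both choices tie.
with_scorer_preprocessing = toy_extract(query, choices, processor=None)

# Disabling preprocessing inside the scorer lets the exact match win.
raw_scorer = partial(toy_scorer, full_process=False)
without_preprocessing = toy_extract(query, choices, processor=None, scorer=raw_scorer)

print(with_scorer_preprocessing)
print(without_preprocessing)
```

In this sketch, passing processor=None changes nothing because the scorer normalizes both strings itself. Only the partial that sets full_process=False on the scorer makes the untouched query rank above "123", which mirrors the workaround given above.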
