How to compare each row with every other row in the same column and delete matching rows with ratio > 90 #316

Open
nithinreddyy opened this issue Jun 29, 2021 · 0 comments

For example, I have a DataFrame like this:

Pdf                         Content             Page no
July 20, 2017.PDF           Hello               24.0
July 20, 2017.PDF           Hi                  20.0
July 2, 2018.PDF            Hey                 21.0
July 2, 2018.PDF            Helloo              10.0
July 2, 2018.PDF            Hii                 11.0

I'm expecting output where, if a row's Content matches another row's Content with a ratio above 90, the later row is removed. The expected output is:

Pdf                         Content             Page no
July 20, 2017.PDF           Hello               24.0
July 20, 2017.PDF           Hi                  20.0
July 2, 2018.PDF            Hey                 21.0
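
A minimal sketch of the rule described above, assuming rows are scanned top to bottom and a row is kept only if it scores at most the threshold against every row already kept, so the first row of each near-duplicate group survives. To stay dependency-free it uses difflib.SequenceMatcher as a stand-in for fuzz.ratio; the names similarity and dedupe_fuzzy are hypothetical. One thing worth checking: under this measure "Hello" vs "Helloo" scores about 90.9, but "Hi" vs "Hii" scores only 80, so a strict > 90 cutoff removes "Helloo" but keeps "Hii". Reproducing the expected output needs a cutoff below 80.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity on a 0-100 scale (stand-in for fuzz.ratio), via difflib."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def dedupe_fuzzy(values, threshold=90.0):
    """Return the positions of values to keep.

    Values are scanned in order; a value is kept only if its similarity to
    every previously kept value is at most `threshold`, i.e. it is dropped
    when it scores > threshold against any kept value.
    """
    kept = []
    for i, value in enumerate(values):
        if all(similarity(value, values[j]) <= threshold for j in kept):
            kept.append(i)
    return kept


contents = ["Hello", "Hi", "Hey", "Helloo", "Hii"]
print(dedupe_fuzzy(contents, threshold=90))  # [0, 1, 2, 4]
print(dedupe_fuzzy(contents, threshold=75))  # [0, 1, 2]
```

With the DataFrame from the question, `data.iloc[dedupe_fuzzy(data['Content'].tolist(), threshold=75)]` should return the three expected rows. The pass is greedy and order-dependent, and it makes O(n²) comparisons, so it is best suited to small or medium-sized columns.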

I'm trying the code below, but it only returns the matching ratios; it doesn't remove any rows:

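import pandas as pd
from fuzzywuzzy import fuzz  # thefuzz and rapidfuzz also provide fuzz.ratio / fuzz.token_sort_ratio

# data is the DataFrame shown above; pair every Content value with every
# Content value (the same column on both sides, each value also meets itself)
compare = pd.MultiIndex.from_product([data['Content'],
                                      data['Content']]).to_series()

def metrics(tup):
    # Score one (left, right) pair with two fuzzy-matching measures
    return pd.Series([fuzz.ratio(*tup),
                      fuzz.token_sort_ratio(*tup)],
                     ['ratio', 'token'])

compare = compare.apply(metrics)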
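
The `compare` approach above only produces scores. One way to turn pairwise scores into rows to delete is sketched here under a few assumptions: pairs are built from row positions instead of values, so each score can be traced back to a row; each row is compared only with the rows above it; and a row is dropped if it scores above `threshold` against any earlier row. difflib.SequenceMatcher stands in for fuzz.ratio so the sketch runs without fuzzywuzzy, and `ratio`, `threshold`, and `to_drop` are hypothetical names. The threshold is set to 75 because "Hi" vs "Hii" scores 80 under this measure; with > 90 only "Helloo" would be dropped.

```python
from difflib import SequenceMatcher

import pandas as pd

# The example DataFrame from the question
data = pd.DataFrame({
    "Pdf": ["July 20, 2017.PDF", "July 20, 2017.PDF", "July 2, 2018.PDF",
            "July 2, 2018.PDF", "July 2, 2018.PDF"],
    "Content": ["Hello", "Hi", "Hey", "Helloo", "Hii"],
    "Page no": [24.0, 20.0, 21.0, 10.0, 11.0],
})


def ratio(a, b):
    # Stand-in for fuzz.ratio: difflib similarity scaled to 0-100
    return 100 * SequenceMatcher(None, a, b).ratio()


threshold = 75  # hypothetical cutoff, below 80 so "Hii" (80 vs "Hi") is dropped too

n = len(data)
# Every (i, j) pair of row positions, mirroring MultiIndex.from_product
pairs = pd.MultiIndex.from_product([list(range(n)), list(range(n))],
                                   names=["i", "j"]).to_frame(index=False)
# Compare each row only with the rows above it
pairs = pairs[pairs["j"] < pairs["i"]].copy()

content = data["Content"].tolist()
pairs["ratio"] = [ratio(content[i], content[j])
                  for i, j in zip(pairs["i"], pairs["j"])]

# Positions of rows that closely match at least one earlier row
to_drop = pairs.loc[pairs["ratio"] > threshold, "i"].unique()
result = data.drop(index=data.index[to_drop])
print(result)
```

Unlike a greedy pass, this drops a row if it matches any earlier row, even one that is itself dropped, so in a chain where B resembles A and C resembles B (but not A), both B and C are removed.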
