Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(logical): fix scoring for logical OR #593

Closed

Conversation

adlerfaulkner
Copy link

Change match selection in logical OR to select the best
scoring (lowest value) match instead of the first match. Currently,
a logical OR which fuzzy searches across multiple fields may
score incorrectly because it returns the score for the first matching
field, not the best.

Change match selection in logical OR to select the best
scoring (lowest value) match instead of the first match. Currently,
a logical OR which fuzzy searches across multiple fields may
score incorrectly because it returns the score for the first matching
field, not the best.
@adlerfaulkner
Copy link
Author

adlerfaulkner commented Dec 14, 2021

Hello!
Loving Fuse's ease of use and maintainability, thanks @krisk!

I noticed this scoring problem when using the solution for searching for multiple tokens across multiple keys by @0xdevalias on #235.

When fuzzy searching for the query wood using a logical OR across title and author.lastName...

{
      $or: [
        { title: 'wood' },
        { 'author.lastName': 'wood' }
      ]
}

...in the data below, one would expect all three results to be equally scored because they all have woodhouse in author.lastName. However, the current scoring method for logical OR only uses the score of the FIRST fuzzy match found from the list of statements in the OR query, not the BEST match. This makes it so that the second item in the list below gets a worse score than the others because Wooster is the first fuzzy match found based on the order of the statements in the OR.

  {
     "title": "Right Ho Jeeves",
     "author": {
        "firstName": "P.D",
        "lastName": "Woodhouse"
     }
  },
  {
     "title": "The Code of the Wooster",
     "author": {
        "firstName": "P.D",
        "lastName": "Woodhouse"
     }
  },
  {
     "title": "Thank You Jeeves",
     "author": {
        "firstName": "P.D",
        "lastName": "Woodhouse"
     }
  }

I have fixed this by choosing the match with the best score from the statements of the logical OR.

Edit:
Until @krisk reviews, my fork with this fix in master can be found here https://github.com/comake/Fuse


describe('When searching for the term "wood"', () => {
test('we get the top three results all with an exact match from their author.lastName', () => {
expect(idx(result.slice(0,3)).sort()).toMatchObject([3,4,5])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the order be 4,3,5? I am imagining it would have to match the following:

{
    item: {
    title: 'The Code of the Wooster',
    author: { firstName: 'P.D', lastName: 'Woodhouse' }
    },
    refIndex: 4
},
{
    item: {
    title: 'Right Ho Jeeves',
    author: { firstName: 'P.D', lastName: 'Woodhouse' }
    },
    refIndex: 3
},
{
    item: {
    title: 'Thank You Jeeves',
    author: { firstName: 'P.D', lastName: 'Woodhouse' }
    },
    refIndex: 5
}

But, I think the reason the order is 3,4,5 is because of line 185: res = [...result]. Did you mean to do res.push(...result)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krisk I suppose that would be a design decision. Do we want to include all matches from all fields in the OR which have a match in the score, or do we want to only score based on the best match out of all the fields scanned by the OR.

Happy to switch to res.push(...result) as I feel that would be the better design decision (include scores from all matches in logical OR).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've wavered on this for a while. I do think including scores for all matches is ideal.

score results in a logical OR query by including
all matches from all terms of the OR instead
of the best match or first match
@adlerfaulkner
Copy link
Author

@krisk Updated to incorporate all matches in a logical OR into the score in 1e420fd

krisk added a commit that referenced this pull request Dec 23, 2021
@krisk krisk closed this in #604 Dec 23, 2021
krisk added a commit that referenced this pull request Dec 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants