Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML Search: Fix partial matches overwriting full matches #11958

Merged
merged 3 commits into from
Mar 3, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGES.rst
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,8 @@ Features added
Bugs fixed
----------

* #11958: HTML Search: Fix partial matches overwriting full matches.
Patch by William Lachance.
* #11944: Use anchor in search preview.
Patch by Will Lachance.
* #11668: Raise a useful error when ``theme.conf`` is missing.
Expand Down
20 changes: 12 additions & 8 deletions sphinx/themes/basic/static/searchtools.js
Original file line number Diff line number Diff line change
Expand Up @@ -477,14 +477,18 @@ const Search = {
// add support for partial matches
if (word.length > 2) {
const escapedWord = _escapeRegExp(word);
Object.keys(terms).forEach((term) => {
if (term.match(escapedWord) && !terms[word])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

terms[word] will be falsey (0) for the first document.

Copy link
Contributor

@jayaddison jayaddison Mar 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wlach based on index-format findings (and test coverage), should this be rephrased to:

terms[word] will be falsey (0) for a word that only exists in document zero

(kinda wordy, but captures the fact that [0, 1], for example, is not falsy)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(edited with a correction and for conciseness)

arr.push({ files: terms[term], score: Scorer.partialTerm });
});
Object.keys(titleTerms).forEach((term) => {
if (term.match(escapedWord) && !titleTerms[word])
arr.push({ files: titleTerms[word], score: Scorer.partialTitle });
});
if (!terms.hasOwnProperty(word)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is marginally more efficient to do this check outside the loop too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless the performance improvement is significant, I think it'd be better to leave the conditional check after the && clause.

(that's based on considering that if the first condition fails, then the clause after the && isn't evaluated. if moving the property-check eliminates a lot of regex matches, then fair enough, but this seems intended to be primarily a correctness fixup, so I'd lean towards making the logical diff as small as possible)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to push back on this a bit: I think this small improvement is in scope (since we're changing the logic anyway) and the diff here is relatively small and easy to understand. Moving the property check should indeed reduce the number of regex matches a fair bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, no problem. I think we'll have to agree to disagree on that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer if you could avoid the extraction of an outer if statement here. When viewing the diff, the larger set of line changes obscure what the bugfix is; and that's based on my assessment today, after having understood the fix a couple of weeks ago. It'd be trickier for someone who wasn't familiar with the original pull request, or looking at it again after a longer duration of time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understood your point, I just don't agree with it in this case. I have more to say on this, but I think we've gone out of scope of a review. If you want to talk about it more, feel free to email me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining. From my perspective, bug report, code review and related discussion are often useful information when trying to understand a commit/change, so I'd prefer to keep the discussion public; in this case I do think that the size of the line diff is relevant to code review.

Would you be OK with me drafting an alternative PR? I don't know if it'd be more likely to be accepted for merge, and it's OK if it isn't, but I'd like to present it as an alternative.

Copy link
Contributor Author

@wlach wlach Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining. From my perspective, bug report, code review and related discussion are often useful information when trying to understand a commit/change, so I'd prefer to keep the discussion public; in this case I do think that the size of the line diff is relevant to code review.

I don't think I have more to say on the technical front. But just to state it one more time so it's clear: I think this (small) PR as it stands fixes the issue in a more fundamental way and improves performance so it's my strong recommendation that it be the final state of the code. I don't feel the benefit of breaking it up into two PRs is worth the hassle and churn (leaving aside that this discussion has now occupied more effort than that would entail...).

Would you be OK with me drafting an alternative PR? I don't know if it'd be more likely to be accepted for merge, and it's OK if it isn't, but I'd like to present it as an alternative.

It's not clear to me when someone who can actually merge the code will look at these changes. I can't stop you from proposing an alternative, but I think the best thing is probably just to wait for a maintainer to break our impasse. I am willing to make the change you suggest if it will actually get this merged and my PR is open to updates from the owners of this repository.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, after finding more time to understand the code, I concede that the performance improvements here are important. I've opened #12045 with a description of what I believe the problem is.

Also, compared to my suggested alternative #12040, I think that the use of the JavaScript hasOwnProperty function is probably better/safer code style, making this pull request preferable.

@wlach could you edit the description of this pull request (#11958) to add 'Resolves #12045'?

(and thanks for your patience!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't bother much about this. Here, it's more important to use hasOwnProperty instead of !terms[word] which could lead to incorrect boolean evaluations (and personally if we can have an easy-to-write improvement because we can isolate a condition from outside a loop, it's fine for me).

Object.keys(terms).forEach((term) => {
if (term.match(escapedWord))
arr.push({ files: terms[term], score: Scorer.partialTerm });
});
}
if (!titleTerms.hasOwnProperty(word)) {
Object.keys(titleTerms).forEach((term) => {
if (term.match(escapedWord))
arr.push({ files: titleTerms[word], score: Scorer.partialTitle });

This comment was marked as resolved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the access to titleTerms[word] here a (separate) bug?

I think #11957 as described is broad enough to describe both. It's the same error: a partial match overwriting a full one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, I see what you mean now given https://github.com/sphinx-doc/sphinx/pull/11958/files#r1509999419. Yeah, this looks like a separate bug strictly speaking. This is such a rabbit hole. 😭

I'd be inclined to fix it in a new PR with its own test (maybe depending on this one).

Copy link
Contributor

@jayaddison jayaddison Mar 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, no problem, there's a lot going on here :) As mentioned though, I feel good about the effect on quality that all this will have.

Agreed; I'll create a separate bugreport for this, and then request review for #12037 after this one is merged.

Could you clarify in the description of #11957 that it relates solely to terms that only occur in the zeroth document in the index? (if my understanding of that is true?)

Edit: fix pr/issues references

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arg, sorry - those linked issues were wrong; I meant to request a change for the description of #11957 - to make the nature of the bug more precise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reported separately as #12040.

This comment was marked as outdated.

});
}
}

// no match but word was a required one
Expand Down
2 changes: 1 addition & 1 deletion tests/js/searchtools.js
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ describe('Basic html theme search', function() {
"<no title>",
"",
null,
2,
5,
Copy link
Contributor Author

@wlach wlach Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, the score gets boosted to 5 (what it should be).

"index.rst"
]];
expect(Search.performTermsSearch(searchterms, excluded, terms, titleterms)).toEqual(hits);
Expand Down