searchindex: non-main index entries scored too high #11578
Comments
This code fails to preserve whether an entry is 'main' or not. It looks like how index entries are stored needs to be reworked. @AA-Turner, maybe you can take a look?
+1 for this. One approach is as follows:
I'm not sure whether this would break search engines of large projects. This would also slow down the search index generation but not that much I think (it's just doing twice the freeze step).
Does someone have an example of a non-main index entry that is indeed relevant to a corresponding search and should be included? Search results that are only tangentially relevant can be a source of noise/distraction, so I'd like to check whether it's worthwhile to include non-main entries in the first place.
I'd be fine with leaving non-main index entries out. If someone is really looking for all the mentions of an indexed entity, they can look at the index instead of using search.
What I really want to be sure about is that it does not cause any regression for the projects involved in the original issue (especially the Python doc).
#11695 gives non-main index entries lower scores. This should order them much later, similar to the proposal in #11578 (comment), but without risk of dropping entries for existing projects.
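The lower-score idea can be illustrated with a minimal Python sketch. This is not the code from #11695: the function name `index_entry_score`, the tuple layout, and the values 5 and 15 are hypothetical, chosen only for illustration. The value 100 is the index-entry score mentioned in the issue description.

```python
# Illustrative sketch only; not the actual Sphinx implementation.
MAIN_INDEX_SCORE = 100      # score the issue reports for exact index matches
NON_MAIN_INDEX_SCORE = 5    # hypothetical lower score for non-main entries
TITLE_SCORE = 15            # hypothetical score for a document-title match


def index_entry_score(is_main: bool) -> int:
    """Return the score for an index entry, depending on its 'main' flag."""
    return MAIN_INDEX_SCORE if is_main else NON_MAIN_INDEX_SCORE


# (docname, is_main) pairs for one indexed term; names are made up.
index_entries = [("page-a", False), ("page-b", True), ("page-c", False)]

scored = [(doc, index_entry_score(is_main)) for doc, is_main in index_entries]
scored.append(("page-d", TITLE_SCORE))  # a title match from regular search

# Highest score first; sorted() is stable, so ties keep their input order.
ranked = sorted(scored, key=lambda item: item[1], reverse=True)
```

With these assumed values the main entry still ranks first, the title match comes next, and the non-main entries are kept but pushed to the end.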
#11696 splits non-main index entries into their own group that is placed after all other results.
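A rough Python sketch of the grouping idea follows. It is not the code from #11696: the tuple layout `(docname, score, is_non_main_index)` and the function name `order_results` are assumptions made for illustration.

```python
# Illustrative sketch only; not the actual Sphinx implementation.
def order_results(results):
    """Order results so non-main index entries form a trailing group.

    Each result is a hypothetical (docname, score, is_non_main_index) tuple.
    False sorts before True, so regular results come first; within each
    group, higher scores come first.
    """
    return sorted(results, key=lambda r: (r[2], -r[1]))


results = [
    ("page-a", 100, True),   # non-main index entry, score unchanged
    ("page-b", 15, False),   # e.g. a title match
    ("page-c", 100, False),  # main index entry
    ("page-d", 5, False),    # e.g. a body-text match
]
ordered = order_results(results)
```

Unlike lowering the score, grouping leaves the scores untouched: a non-main entry sorts last even when its score is higher than that of every regular result.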
Describe the bug
Since #10819, index entries are returned as search results. When the query string exactly matches an indexed term, all index entries become search results with score 100. This places them above most other search results.
Reporting "main" index entries early in the search results makes sense. However, non-main index entries are often just arbitrary cross-references to the indexed term. Currently these receive the same score as main index entries and are therefore ordered early in search results. This obscures the main entries and other more-important matches such as document titles.
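To make the ordering problem concrete, here is a small Python sketch. It is a simplified model, not Sphinx's search code: the `Result` type, the result titles, and the title score of 15 are hypothetical. Only the score of 100 for index entries comes from the description above.

```python
from typing import NamedTuple


class Result(NamedTuple):
    title: str
    score: int
    kind: str  # "index-main", "index-non-main", or "title"


INDEX_SCORE = 100  # every index entry gets this on an exact term match
TITLE_SCORE = 15   # hypothetical score for a document-title match


def rank(results):
    # Highest score first; sorted() is stable, so ties keep their input order.
    return sorted(results, key=lambda r: r.score, reverse=True)


results = [
    Result("Document titled after the term", TITLE_SCORE, "title"),
    Result("Page that cross-references the term", INDEX_SCORE, "index-non-main"),
    Result("Page that defines the term", INDEX_SCORE, "index-main"),
    Result("Another cross-referencing page", INDEX_SCORE, "index-non-main"),
]
ranked = rank(results)
```

Because main and non-main entries share the same score, nothing distinguishes them in the ranking, and the title match ends up below all of them.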
This problem affects CMake's documentation, and is tracked in CMake Issue 25175. Here are some examples showing a search in CMake 3.27.1's documentation.
Using Sphinx 5.3.0 (screenshot):
Using Sphinx 5.1.1 (screenshot):
Using Sphinx 5.3.0 plus a patch that removes non-main index entries (screenshot):
The patch is for demonstration in this issue and is not a proposed fix:
How to Reproduce
This is difficult to reproduce in a small example because it requires a large number of documents to demonstrate non-trivial search results.
One can reproduce the problem by building CMake 3.27.1's documentation like this:
This should show results similar to the description's screenshots.
Environment Information