
HTML search results: identical summaries displayed for different link targets #11943

Closed
jayaddison opened this issue Feb 4, 2024 · 4 comments · Fixed by #11944

Comments

@jayaddison
Contributor

Describe the bug

My understanding (which could be mistaken) of the html_show_search_summary feature (enabled by default) is that each search result displayed to the user should be accompanied by a relevant snippet of the target content, to help the user determine which result(s) are most relevant.

When the feature is enabled, the search page makes an HTTP request for each result's HTML page and uses JavaScript to extract and display a portion of its content.
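That per-result flow can be sketched roughly as follows. This is a hypothetical illustration with invented helper names (`displaySummaries`, `fetchText`, `fakeFetch`), not the actual code in Sphinx's `searchtools.js`, and it is synchronous for simplicity where the real code fetches pages asynchronously:

```javascript
// Hypothetical sketch of the per-result flow (synchronous for illustration;
// the real Sphinx code fetches each page asynchronously).
// fetchText is injected so the example runs without a network.
const displaySummaries = (resultUrls, fetchText) =>
  resultUrls.map((url) => {
    const html = fetchText(url);                    // one HTTP request per result
    const text = html.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
    return text.slice(0, 240);                      // crude snippet for illustration
  });

// Stand-in fetcher returning canned HTML:
const fakeFetch = (url) => `<p>Contents of ${url} mentioning test.</p>`;

console.log(displaySummaries(["a.html", "b.html"], fakeFetch));
// [ 'Contents of a.html mentioning test.', 'Contents of b.html mentioning test.' ]
```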

How to Reproduce

Build the Sphinx documentation locally and then run a basic webserver to host it:

sphinx.git $ sphinx-build -b html doc _build 
sphinx.git $ cd _build
sphinx.git $ python -m http.server -b 127.0.0.1

Open http://127.0.0.1:8000 in a web browser, and perform a search for the query term test.

Observe that multiple results with separate HTML anchor targets appear in the search results. However, they all display the same search snippet; this is not useful, and it is the bug.

(Screenshot omitted: search results showing the same snippet repeated for different anchor targets.)

Environment Information

Platform:              linux; (Linux-6.6.13-amd64-x86_64-with-glibc2.37)
Python version:        3.11.7 (main, Dec  8 2023, 14:22:46) [GCC 13.2.0]
Python implementation: CPython
Sphinx version:        7.2.6
Docutils version:      0.20.1
Jinja2 version:        3.1.3
Pygments version:      2.17.2

Sphinx extensions

No response

Additional context

Discovered during investigation of #11942.

@jayaddison
Contributor Author

Here's the code for the relevant helper method:

/**
* helper function to return a node containing the
* search summary for a given text. keywords is a list
* of stemmed words.
*/
makeSearchSummary: (htmlText, keywords) => {
const text = Search.htmlToText(htmlText);
if (text === "") return null;
const textLower = text.toLowerCase();
const actualStartPosition = [...keywords]
.map((k) => textLower.indexOf(k.toLowerCase()))
.filter((i) => i > -1)
.slice(-1)[0];
const startWithContext = Math.max(actualStartPosition - 120, 0);
const top = startWithContext === 0 ? "" : "...";
const tail = startWithContext + 240 < text.length ? "..." : "";
let summary = document.createElement("p");
summary.classList.add("context");
summary.textContent = top + text.substr(startWithContext, 240).trim() + tail;
return summary;
},
};
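To make the symptom concrete: the summary is computed from the page text and the keywords only, and the per-result anchor never enters the computation, so two hits that differ only by anchor display the identical snippet. A toy model of this (hypothetical code with invented names such as `summaryFor` and `snippetForHit`, not Sphinx's implementation):

```javascript
// Toy model of the current behaviour (hypothetical, not Sphinx code):
// the summary depends only on the page HTML and the keywords.
const summaryFor = (pageHtml, keywords) => {
  const text = pageHtml.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
  const hit = text.toLowerCase().indexOf(keywords[0].toLowerCase());
  const from = Math.max(hit - 120, 0);
  return text.slice(from, from + 240);
};

const pageHtml =
  "<h1>Testing</h1><p>Run the tests.</p><h2 id='advanced'>Advanced</h2><p>Advanced test setup.</p>";

// Two hits on the same page, differing only in their anchor:
const hitA = { url: "testing.html", anchor: "" };
const hitB = { url: "testing.html", anchor: "#advanced" };
const snippetForHit = (hit, keywords) => summaryFor(pageHtml, keywords); // hit.anchor is unused
console.log(snippetForHit(hitA, ["test"]) === snippetForHit(hitB, ["test"])); // true
```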

@jayaddison
Contributor Author

I had optimistically hoped that this issue might be a regression somewhere in the makeSearchSummary code (for example, that anchors were no longer identified correctly). That doesn't appear to be the case; we don't attempt to identify anchor elements in the returned result-page-HTML data currently.

That could make sense; parsing the HTML of multiple results pages client-side could be an expensive operation. But it does mean that locating the anchors -- and relevant text near them -- would be required, and could require some care.
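One possible shape for anchor-aware extraction is sketched below. This is purely illustrative (the helper name `textNearAnchor` is invented, and a browser implementation would more likely use DOMParser); it also assumes the anchor id contains no regex metacharacters:

```javascript
// Hypothetical sketch of anchor-aware snippet extraction (not Sphinx code).
// Locates the element carrying the anchor id and returns the text that
// follows it, so each result could get an anchor-specific snippet.
const textNearAnchor = (html, anchor, width = 240) => {
  const match = new RegExp(`id=["']${anchor}["']`).exec(html);
  if (match === null) return null;
  const afterTag = html.indexOf(">", match.index) + 1; // skip past the anchored tag
  const text = html
    .slice(afterTag)
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  return text.slice(0, width);
};

const page =
  "<h1 id='intro'>Intro</h1><p>First part.</p><h2 id='details'>Details</h2><p>Second part.</p>";
console.log(textNearAnchor(page, "details")); // "Details Second part."
```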

I think I'll pause at this point and ask for advice from more experienced Sphinx developers.

@wlach
Contributor

wlach commented Feb 4, 2024

> That could make sense; parsing the HTML of multiple results pages client-side could be an expensive operation. But it does mean that locating the anchors -- and relevant text near them -- would be required, and could require some care.

We do this already, so I think it's fine (the amount of computation/time needed to parse out the HTML is almost always going to be overwhelmed by the latency in requesting the HTML). See #11944 for what I think is the right solution.

@jayaddison
Contributor Author

> > That could make sense; parsing the HTML of multiple results pages client-side could be an expensive operation. But it does mean that locating the anchors -- and relevant text near them -- would be required, and could require some care.
>
> We do this already, so I think it's fine (the amount of computation/time needed to parse out the HTML is almost always going to be overwhelmed by the latency in requesting the HTML). See #11944 for what I think is the right solution.

Brilliant - I didn't realize that we already did that. I'll add a question about the content we retrieve relative to each anchor in that pull request.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 29, 2024