
Merge in upstream changes from mkdocs-material #338

Open · jbms wants to merge 19 commits into main from merge-upstream

Conversation

@jbms (Owner) commented Apr 11, 2024

This replaces the modified Sphinx search implementation with the lunr.js-based search implementation from the upstream mkdocs-material theme.

Maintaining the existing custom search was proving to be untenable. Using the upstream search implementation greatly reduces the size of the diff and should make it much easier to incorporate future changes.

The downsides are:

  • Search index is significantly bigger. However, individual pages no longer need to be fetched to display detailed search result info; consequently, the net effect may not be that large.
  • The Sphinx search implementation had special support for sphinx object descriptions which was especially helpful for API documentation.

Remaining work before merging this:

  • Update documentation to describe new features/changes
  • Fix search index for incremental builds
  • Better integration of Sphinx "objects" into the lunr.js-based search:
    • Typing the name of an API symbol often does not give that API symbol as the first result; we will need to fix that.
    • Search results for Sphinx objects were displayed specially (e.g. tagged with "Python function") in the old search implementation. We should add something similar to the new search implementation.

jbms requested a review from 2bndy5 on April 11, 2024 04:24
jbms force-pushed the merge-upstream branch 3 times, most recently from 7b53c8b to 593856d on April 11, 2024 05:49

Commit: This replaces the modified Sphinx search implementation with the lunr.js-based search implementation from the upstream mkdocs-material theme.

@2bndy5 (Collaborator) commented Apr 11, 2024

I hope you don't mind if I push some commits to your branch. I won't force push anything, so you can keep squashing the history as you see fit.

@jbms (Owner, Author) commented Apr 11, 2024

> I hope you don't mind if I push some commits to your branch. I won't force push anything, so you can keep squashing the history as you see fit.

Yes please go ahead!

Commit: Maybe there's a more elegant way to approach this, but I'm just hacking this into something passable for CI
@2bndy5 (Collaborator) commented Apr 11, 2024

> I won't force push anything

I lied. I screwed up the typing on a commit I didn't run through mypy first...


Commit: add info to doc and mention the `hide-edit-link` metadata for overriding per page
@jbms (Owner, Author) commented Apr 11, 2024

I noticed you fixed some lint issues in the search plugin (which I copied unmodified from upstream, except for replacing mkdocs with mkdocs_compat) --- unless they are necessary bug fixes, it would be better to revert those to make it easier to merge changes in the future.

Similarly we can just check for the HTML builder in our own search extension sphinx_immaterial/search.py rather than modifying the upstream plugin.


Commit: probably a relic from the merge script rebase

Commit: also ran `npm audit fix`, which bumped `node_modules/tar` and `tar` in package-lock
2bndy5 linked an issue on Apr 12, 2024 that may be closed by this pull request
@2bndy5 (Collaborator) commented Apr 12, 2024

I bumped mermaid to 10.7.0 (as is used in upstream). This fixed the mermaid problem and resolves #328.

I also ran `npm audit fix`, which bumped a couple of package versions in the lock file (both had tar in the name). I hope that's ok. I can revert the audit fix if that was unwanted.

@jbms (Owner, Author) commented Apr 12, 2024

Bumping dependencies is fine in general; the one tricky thing is that we have to be careful about cssnano/postcss-related deps, because newer versions result in OOM (this also applies to the upstream theme) and I don't yet know of a solution. The only way I got this merge to work was to copy the lockfile from the upstream theme and then run `npm install` to integrate our additional deps.

@2bndy5 (Collaborator) left a review comment

Honestly, this works well enough for me as it is now.

I'm going to focus on the feat: upstream issues, though.

I haven't fully wrapped my head around the ported search implementation. My main confusion comes from unfamiliarity with lunr.js (or any search implementation for that matter). But I can try to help if this PR gets too stale.

FWIW, I really don't use the search functionality on websites unless I'm in a hurry or completely lost for more than 3-5 minutes. I typically find the right page and use Ctrl+F instead.

@2bndy5 (Collaborator) commented Apr 13, 2024

A few other things I've noticed (none of which bother me):

  • Enhanced tooltips are not applied to graphviz diagrams. This likely isn't a simple CSS fix that could be added to our _graphviz.scss because the placement of enhanced tooltips seems to be calculated by JS.

  • The hlist directive throws enhanced tooltip placement off terribly.

  • The newer navigation.prune feature doesn't seem to have an effect, although I thought we were already doing this by default.

  • The toc object icons are aligned to the top of multi-line entries, whereas before they were vertically center-aligned.

    (before/after screenshots omitted)

Comment on lines +136 to +139

```jinja
{% if page.meta.git_revision_date_localized %}
{% set updated = page.meta.git_revision_date_localized %}
{% elif page.meta.revision_date %}
{% set updated = page.meta.revision_date %}
```
@2bndy5 (Collaborator) commented Apr 13, 2024

I just noticed that we've been feeding Sphinx's last_updated context into this template:

```python
"meta": {"hide": [], "revision_date": context.get("last_updated"), "meta": []},
```

But this doesn't actually describe the last revision date: every page gets the last build date, even if the doc wasn't actually altered on the date it was built. This inaccuracy is not caused by the shallow checkout used in CI builds; I'm also seeing the same behavior locally.

From yesterday's RTD build for this PR (in task_lists.html): (screenshot omitted)

This template feature would be better served by #341, but I guess this behavior would suffice, since it has been this way. We could instead get the info manually using

```python
time.strftime("%x", time.localtime(os.stat(doc2path(pagename)).st_mtime))
```

but again, the checkout date is not the same as the last modified date, since git checkout doesn't preserve file timestamps.

@jbms (Owner, Author) replied:

Git doesn't store any times at the per-file level, and git checkout / git clone sets the modification time to the time of the checkout (for an existing work tree, only the modified files get updated time stamps). Therefore we have to explicitly query git to find the most recent commit that modified the file as in #341, because the only timestamps stored by git are commit timestamps. For that we will need a non-shallow checkout but that probably isn't too big of a deal in practice.
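
For illustration, that git query could look something like this (a minimal sketch, not the #341 implementation; the helper name is hypothetical):

```python
import subprocess

def last_commit_date(path: str) -> str | None:
    # Ask git for the committer date (YYYY-MM-DD) of the most recent commit
    # touching `path`. Requires a non-shallow clone; returns None if the file
    # has no commits (e.g. untracked).
    out = subprocess.run(
        ["git", "log", "-1", "--format=%cs", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return out or None
```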

@2bndy5 (Collaborator) replied:

I ran across a reference in the upstream docs that recommends sparse checkouts, but I didn't see it get used in their CI (at least in the community edition docs build).

@jbms (Owner, Author) commented Apr 13, 2024

> Honestly, this works well enough for me as it is now.
>
> I'm going to focus on the feat: upstream issues, though.
>
> I haven't fully wrapped my head around the ported search implementation. My main confusion comes from unfamiliarity with lunr.js (or any search implementation for that matter). But I can try to help if this PR gets too stale.
>
> FWIW, I really don't use the search functionality on websites unless I'm in a hurry or completely lost for more than 3-5 minutes. I typically find the right page and use Ctrl+F instead.

I think the usefulness of search may depend on a lot of factors (including how good the search results are!) but for API documentation in particular a common use case is that you want to find the documentation for a specific symbol, and for that search is very useful. Suppose I want to find out about the tensorstore_demo.IndexDomain class. Compare our old search results:

https://jbms.github.io/sphinx-immaterial/?q=IndexDomain

with the new search results:

https://sphinx-immaterial--338.org.readthedocs.build/en/338/?q=IndexDomain

As for how search works, basically:

  1. The upstream search plugin parses the HTML of each page to split it into sections.
  2. Each section is then treated as a separate "document" for the purpose of searching. All HTML tags except a few that are supported for the "rich search results" are stripped out. Each document has a number of fields in addition to the main text/html content: the currently-used fields are title and tags, I believe. (Tags refers to the upstream tag feature that we don't currently support.) Documents can also have an additional numeric "boost" metadata property that affects how it is ranked. Each section with its fields is then serialized directly as JSON without further processing.
  3. The client JavaScript reads the JSON containing all of the sections and builds a lunr.js search index. When doing a search, the results are ranked based on which field the terms matched against --- tags are the highest, then title, then main body. Then the additional boost property is also factored into the rank.
  4. After performing the search, multiple documents from the same page are then grouped together when displaying results.
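
For illustration, one serialized section document might look like the following (hypothetical values, shaped like the real entries quoted later in this thread; tags and boost are optional):

```python
# Hypothetical section document as serialized into the search index JSON.
section_document = {
    "location": "demo_api.html#some-section",  # page URL plus anchor (anchor is illustrative)
    "title": "Some section title",
    "text": "Plain text of the section, with most HTML tags stripped.",
    "tags": ["example-tag"],  # upstream tags feature; not currently supported here
    "boost": 2,               # optional multiplier factored into the lunr.js ranking
}
```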

To improve results I think we should:

  • Modify the section parsing to ensure Sphinx object descriptions are treated as separate sections
  • Include some metadata in the search documents generated from sphinx object descriptions to indicate the object description type (e.g. "Python class"), as we did before, and find a way to display it nicely (e.g. as text or as some type of icon) in the results. (Kind of optional, but pretty helpful.)
  • Sphinx has a concept of search weighting for object description types. We should include that as the boost value for the sections corresponding to object descriptions. That allows, for example, object descriptions for individual function parameters to be downweighted in the results.

It is possible we may need other changes to make API documentation searching work well, but those things would be a start.

@2bndy5 (Collaborator) commented Apr 13, 2024

Superb explanation! 🚀 Doing another code dive...

@2bndy5 (Collaborator) commented Apr 13, 2024

> Modify the section parsing to ensure Sphinx object descriptions are treated as separate sections

Just a gut reaction here: To avoid any deviation from the inherited search plugin, we could instead derive from it and make any API changes that way.

```python
def _make_indexer(app: sphinx.application.Sphinx):
    return search_plugin.SearchIndex(**dict(_get_search_config(app)))
```

To compensate for this complexity, our search.py module could be organized as a sub-package. Also, support for other features (like tags) would then be contained/scoped to search-specific implementations.

@jbms (Owner, Author) commented Apr 13, 2024

>> Modify the section parsing to ensure Sphinx object descriptions are treated as separate sections
>
> Just a gut reaction here: To avoid any deviation from the inherited search plugin, we could instead derive from it and make any API changes that way.
>
>     def _make_indexer(app: sphinx.application.Sphinx):
>         return search_plugin.SearchIndex(**dict(_get_search_config(app)))
>
> To compensate for this complexity, our search.py module could be organized as a sub-package. Also, support for other features (like tags) would then be contained/scoped to search-specific implementations.

Yeah, potentially we could indeed monkey-patch some of the classes (e.g. SearchIndex and Parser) rather than modifying the file itself --- once we know what modifications are needed, we can figure out the best way to make them.

Note that the upstream search plugin is already used in a hacky way from sphinx_immaterial/search.py, since it is intended as a mkdocs plugin, not a Sphinx extension. In particular, the only portion of the SearchPlugin class itself that we are using is on_config, for filling in default values of the config. Then we use the SearchIndex class directly.
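
A minimal sketch of that subclass/monkey-patch route, assuming the vendored plugin module is importable as search_plugin (as in the snippet above; the import path and subclass name are hypothetical):

```python
from sphinx_immaterial import search_plugin  # hypothetical import path

class ApiAwareSearchIndex(search_plugin.SearchIndex):
    """Sketch: extend the vendored SearchIndex without editing the vendored file."""

    def create_entry_for_section(self, *args, **kwargs):
        # Defer to upstream's own signature, then adjust the entry for Sphinx
        # object descriptions, e.g. attach a "boost" value or object-type label.
        entry = super().create_entry_for_section(*args, **kwargs)
        return entry
```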

@2bndy5 (Collaborator) commented Apr 14, 2024

Still investigating an adequate approach, so this post is just an organization of my thoughts/findings.

> Sphinx has a concept of search weighting for object description types. We should include that as the boost value for the sections corresponding to object descriptions

I found that Sphinx aggregates a prio in search.IndexBuilder.get_objects(). Assuming prio means "priority", this prio is returned by <derived-Domain()>.get_objects() (i.e. here for python & here for cpp). Luckily, this prio is returned in a tuple that includes the object ID as used in the HTML elements' id attribute.

To supplement our SearchIndex.add_entry_from_context() results with this info, we actually need to augment upstream's SearchIndex.create_entry_for_section(). Each entry's "boost" field can be set according to the object's id; I'm thinking 1 + (0.5 * prio). However, fetching the object description's ID is not straightforward for our apigen outputs.

  • For autodoc (& manual invocation of sphinx domain directives), the id is appended to the entry's "location" field (as a URL hash location).
    [
      {
        "location": "demo_api.html#test-py-module",
        "title": "<code>test_py_module</code>",
        "text": "omitted for brevity"
      },
    ]
  • For apigen, the id may not be present in the "location" field, because each documented member gets its own page, resolving to just the page URL without a hash. This case is a stumper (at least as I'm writing this).
    [
      {
        "location": "python_apigen_generated/IndexDomain.size-aea1f878.html",
        "title": "tensorstore_demo.IndexDomain.size",
        "text": "omitted for brevity"
      },
    ]

It would be nicer to have a 1-to-1 map of HTML-to-doctree to properly & robustly set the boost appropriately for API descriptions.

Initially, I don't think we need to monkeypatch anything to achieve this. It could probably be done in a separate function that _html_page_context() calls. But here's where we have to consider performance: if we can extract the CSS classes of the API description signature, then we can avoid traversing all domains in the builder env and, for example, just focus on the py domain if "py" is in the list of classes.


I'm partial to opening another PR that merges into this branch, so we don't have to play around with this branch's git history to keep my proposal's reviews/changes "clean". The remaining goals blocking this PR are not simple objectives.

PS - I also thought about leveraging upstream's tag plugin with Sphinx domains, but I don't see that being very feasible since tags are really specific to an entire page (not sections of a page).

@jbms (Owner, Author) commented Apr 14, 2024

I think we can map back to the Sphinx object description info from the HTML, much as the old search_adapt.py did its own mapping:

We can first iterate over all objects to compute a map of (html_path, anchorname) -> (domain, object_type, any other info) and then use that when processing the HTML. The builder has a method to get the HTML path from the docname.
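
A rough sketch of building that map (the helper name and the stored tuple are illustrative; get_objects() and get_target_uri() are the standard Sphinx APIs mentioned above):

```python
import sphinx.application

def build_object_map(app: sphinx.application.Sphinx) -> dict:
    """Map (html_path, anchor) -> (domain, object_type, priority) for all objects."""
    object_map = {}
    for domain in app.env.domains.values():
        # Each Domain.get_objects() yields
        # (name, dispname, type, docname, anchor, priority), where priority is
        # Sphinx's search weight (0 important, 1 default, 2 unimportant,
        # -1 hidden from search).
        for _name, _disp, objtype, docname, anchor, prio in domain.get_objects():
            html_path = app.builder.get_target_uri(docname)
            object_map[(html_path, anchor)] = (domain.name, objtype, prio)
    return object_map
```

Since lower Sphinx priority values mean more important, a boost derived from prio would presumably need to decrease as prio increases.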

@jbms (Owner, Author) commented Apr 14, 2024

Note that most object descriptions do not result in a section based on the current section parsing logic in the search plugin --- we would have to change the parsing, or modify the HTML before feeding it to the parser, to fix that. With apigen we do get a section because there is a separate page, but I think otherwise we don't.

@2bndy5 (Collaborator) commented Apr 14, 2024

Just as a quick hack, I added "dl" to the Parser.keep attr, and I get much better API results. 😆 They still need to be boosted, though.
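
A sketch of that hack as a monkey-patch, assuming Parser.keep is a list attribute on the vendored parser class (an assumption; the actual hack presumably edited the vendored file directly):

```python
from sphinx_immaterial import search_plugin  # hypothetical import path, as above

# Keep <dl> elements (Sphinx object descriptions render as definition lists)
# among the tags the parser preserves for rich search results.
if "dl" not in search_plugin.Parser.keep:
    search_plugin.Parser.keep = [*search_plugin.Parser.keep, "dl"]
```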


Successfully merging this pull request may close these issues:

  • Broken link to fontawesome/brands icons
  • Feature request: Mermaid js 10.x