Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Break search plugin out into separate package #3698

Open
waylan opened this issue May 7, 2024 · 8 comments
Open

Break search plugin out into separate package #3698

waylan opened this issue May 7, 2024 · 8 comments
Labels

Comments

@waylan
Copy link
Member

waylan commented May 7, 2024

This is a proposal to break the search plugin out into a separate repo and package.

Why?

There are a few reasons for this proposal:

  1. Like with themes, this decouples the plugin from the core functionality. See Decouple themes from core functionality #3636.
  2. The plugin will no longer be tied to the core's release cycle.
  3. The plugin as it is today is not long for this world and this would be the first step in working toward a replacement. In short, the current plugin relies on lunr.js for the actual search functionality. However lunr.js appears to be stagnate with the last commit being 4 years old despite having many known unresolved issues and bugs. For a more complete analysis of the issues see Towards better documentation search  squidfunk/mkdocs-material#6307.
  4. Removing the "blessed" status of a built-in plugin will make room for and likely encourage third party plugins to provide alternate solutions. In fact, as a separate package, users could choose to uninstall the plugin which would remove potential conflicts with third-party plugins.

The Proposal

I propose moving the existing plugin as-is to a new repo mkdocs/lunr-search and distributing it as the Python package mkdocs-lunr-search. At least initially, MkDocs can list the plugin as a dependency and it will be installed by default.

I also propose registering the plugin under two names for use within the config: the existing search as well as a new name: lunr_search (in other words register 2 entry points which point at the same thing). In this way we can move users away from the generic search name and open up the way for a future replacement; while at the same time, maintaining backward compatibility for existing users.

The plugin documentation would move with the plugin. However, the default behavior (enabled when no plugins specified) would continue to be documented within MkDocs' documentation.

The few existing outstanding bug reports and unmerged PRs can be transferred to the new repo and handled there.

Questions

The fix which was applied in #2591 would no longer work as-is. There are a few possible workarounds:

  1. Instruct users to uninstall the search plugin when using a third-party plugin which conflicts.
  2. Alter the fix in Allow third-party plugins to override core plugins #2591 to specifically apply to the new search package instead (as there are no other plugins in mkdocs.contrib).
  3. Distribute the new package as a sub-package which installs in the old location (mkdocs.contrib.search).
  4. Ignore the issue and encourage third-party plugins to use unique names. Our new name (lunr_search) is setting the example here.

While option 3 is an attractive solution, I would like to see the contrib subdir go away completely. Perhaps option 2 could be a stop-gap until we deprecate the plugin as the default in the future. I personally like option 4, but it is probably too late for that as #2591 has already been released some time ago.

Do we have a preference here?

The Future

Starting with this comment @squidfunk has suggested that he intends to break out the search plugin from Material for MkDocs. Presumably, when that is done it could be considered as a new default plugin. However, having the current built-in plugin already available as a separate package will allow us to continue to support users' existing custom configs which may not work with the new plugin.

Perhaps at some point the search plugin name can be deprecated and only the lunr_search name will be available from the mkdocs.yml config file.

Finally, as the lunr.js library is stagnate, I would expect the MkDocs plugin to be in maintenance mode. In other words, we would not be adding any new features. But we would apply bug fixes and keep it working with new releases of MkDocs and Python, etc.

@trel
Copy link
Contributor

trel commented May 7, 2024

This all sounds like the right direction.

I propose considering search_lunr rather than lunr_search (and therefore, mkdocs_search_lunr) so that when additional plugins appear in the namespace... they all hang together.

@squidfunk
Copy link
Sponsor Contributor

@waylan Thanks for opening this discussion! It's really great to see progress on the core of MkDocs, and I second that this will further allow to keep the focus sharp and also allow for alternate implementations. However, I have some concerns on the proposed directions, with the major concern being:

The fix which was applied in #2591 would no longer work as-is.

While I understand Option 2 and 3 to be deprecation paths that buy us some time, Option 1 and 4 would mandate immediate action from authors. Once MkDocs moves out the search plugin, it would mean that authors would need to make sure that:

  1. Any search plugin that the user wants to use is explicitly listed, as it will not be enabled implicitly, since it's moved outside of core. This is probably how it should've been from the start, because there's regular confusion around why the search doesn't work once authors start adding other plugins. IIRC, this was done because search was once an integral part of core, and then factored out into a plugin, so it was implicitly enabled to reduce disruption of existing installations. It would be a breaking change, since sites that have no plugins setting in mkdocs.yml would need to be touched.

    So this minimal configuration will no longer enable search:

    site_name: My site
    theme:
      name: material
  2. Users of Material for MkDocs would need to replace search with material/search in the list of plugins. Other plugins like blog or social would not be impacted. This might increase confusion around configuration, since one plugin now needs prefixing, while the others don't. Additionally, IIUC, if MkDocs now registers its moved out search plugin under search to stay downward compatible as suggested, it would mean that without changing mkdocs.yml, MkDocs' search plugin will become active which is incompatible with Material for MkDocs and break the build:

    For better illustration, the following configuration will break:

    site_name: My site
    theme:
      name: material
    plugins:
      - search
      - blog

    Necessary changes to the plugins setting:

    site_name: My site
    theme:
      name: material
    plugins:
      - material/search
      - blog

IMHO, this mandates a major release of MkDocs, i.e. v2, unless there's a path that I'm currently not seeing. It would also mandate a major release of Material for MkDocs, which would then update to MkDocs 2.0 and require the aforementioned changes in configuration.

Starting with #3560 (comment) @squidfunk has suggested that he intends to break out the search plugin from Material for MkDocs. Presumably, when that is done it could be considered as a new default plugin. However, having the current built-in plugin already available as a separate package will allow us to continue to support users' existing custom configs which may not work with the new plugin.

Thanks for mentioning this. Here's the comment #3560 (comment) that is being referred:

Also note that we are considering moving the search plugin out of Material for MkDocs in the mid-term, and providing a unified search plugin for all themes with a nice UI and UX. This would however be a second step ☺️

As stated, moving the search out is a mid-term goal. It will require a lot more work to make it work with existing themes, and I'm interested in investing the time into this, but at first, we will keep the search plugin as part of Material for MkDocs. The reason for this is the same reason we distribute all plugins with the theme: we can iterate much faster, as we don't need to take other themes into account when making changes. While we try to make plugins compatible with other themes, and eventually move them out of core, the new search is so fundamentally different, that we need some time to be sure that we got it right. You know how it is: the cost of mistakes in architecture will pile up over time 😅 We want to get this right.

The time horizon for moving the search plugin out will be something like "time of arrival in Material for MkDocs + 12 months". If, when the search plugin arrives in Material for MkDocs, somebody would like to factor it out into a separate plugin and maintain it – they're very much invited. I hope that this is understandable from a maintainer's perspective.

@waylan
Copy link
Member Author

waylan commented May 8, 2024

I propose considering search_lunr rather than lunr_search (and therefore, mkdocs_search_lunr) so that when additional plugins appear in the namespace... they all hang together.

Interesting suggestion. I hadn't considered that.

While I understand Option 2 and 3 to be deprecation paths that buy us some time, Option 1 and 4 would mandate immediate action from authors.

Correct. I was trying to be thorough. I would expect that option 2 would be the right path forward here but wanted to get some feedback in case I was missing something.

@tomchristie
Copy link
Member

The related part of this that I'd like to finesse is the core data model.

Given some input...

# Heading

Some text.

```python
print("code code code")
```

## Sub

On with the chat.

Can we answer the question of exactly what data structure do we should be providing, in order to service a search index.

Sections([
    Section(
        title="Heading",
        level=1,
        href="heading",
        text=...,  # Just the markdown. I'd assume this is what we ought to feed to any search tokenizer.
        html=...   # I'd assume we also need to index the html to support rich formatted search results.
    ),
    Section(
        title="Sub",
        level=2,
        href="sub",
        text=...,
        html=...
    ),
])

I'd assume that the above is exactly sufficient(?)

I realize this probably seems like an overly simple question to ask, but I'm looking at the fundamentals, and want to make sure I'm not massively missing something.

This is a proposal to break the search plugin out into a separate repo and package.

@waylan Brilliant, thanks. I'm broadly in favor, although it also intersects with design work I'm doing on the core and it's not clear to me how this all fits together yet / what a smart migration path looks like.

Ideally I'd like mkdocs to get to a position where a default search implementation (non pre-indexed) could be the responsibility of the theme to provide for. (Eg. lunr.js is included & a templated search index page exists) With anything more complex requiring a plugin/secondary build stage.

(Aside: I wouldn't describe lunr as "stagnant". Stable and unchanging is also a positive quality.) ☺️

@squidfunk Seriously excited & interested in your search work, I'm expecting good things are going to happen here.

@waylan
Copy link
Member Author

waylan commented May 9, 2024

Ideally I'd like mkdocs to get to a position where a default search implementation (non pre-indexed) could be the responsibility of the theme to provide for. (Eg. lunr.js is included & a templated search index page exists) With anything more complex requiring a plugin/secondary build stage.

I do not see that as an ideal solution. Even with a pre-existing search lib (like lunr.js), there is significant amount of JS code to write (i.e., in-browser indexing needs to be implemented in a webworker so the page doesn't hang on load). To me it makes more sense for search to be implemented once and usable across different themes, which is what we have now. Of course, that is the approach I took, so I may be bias.

(Aside: I wouldn't describe lunr as "stagnant". Stable and unchanging is also a positive quality.)

Normally, I would agree, but given the number of known bugs which have not been fixed, I don't see it that way in this instance. An oversimplification if the issue is that a new major release was made which significantly changed the way the lib worked and then it was "abandoned" before all the bugs could be worked out.

Can we answer the question of exactly what data structure do we should be providing, in order to service a search index.

This feels like it is getting out-of-scope for this issue, but I'll indulge. 😉

text=..., # Just the markdown. I'd assume this is what we ought to feed to any search tokenizer.

No. Actually, we need the rendered HTML which is then stripped of its tags. As a reminder, via various plugins and Markdown extensions much of the rendered HTML is generated and doesn't exist in the Markdown content. Therefore, to ensure that the generated content is included in the index, we first render it and then strip tags, reducing it to plain text. There is also the issue of text existing in the Markdown which does not appear in the rendered HTML (such as attr_list etc). The search index doesn't understand how to differentiate. So by rendering and stripping tags, we also remove that from the text which is fed to the indexer.

html=... # I'd assume we also need to index the html to support rich formatted search results.

This is not necessary given the approach I describe above. Additionally, at least in the case of lunr.js, the indexer does not have the ability to properly parse and tokenize HTML markup for indexing. That said, by storing the HTML markup in the index (as un-indexed data associated with the indexed plain text), it could be useful for providing better looking search results. In other words, it could be used to provide formatted search results rather than the plain text results we return now. However, that would balloon the size of the data for the search index and I expect a lot of pushback on that.

I've described what lunr.js expects above. But that may not be the only way a search engine could work. I expect some others may have more to say about this.

@squidfunk
Copy link
Sponsor Contributor

@squidfunk Seriously excited & interested in your search work, I'm expecting good things are going to happen here.

Definitely. And not only for MkDocs, but basically for all other static site generators 😉

@tomchristie
Copy link
Member

As a reminder, via various plugins and Markdown extensions much of the rendered HTML is generated and doesn't exist in the Markdown content.

Ah okay. I'm wondering what contentful cases there are here, and coming up with "template inclusion" style extensions. Is that the sort of thing you're referring to here?

@waylan
Copy link
Member Author

waylan commented May 13, 2024

Yes, "template inclusion" style extensions. But also API documentation plugins. In fact, it is the latter that matters to me. I expect search to index the generated text my API documentation, not the pointer which gets parsed to determine what documentation to retrieve.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants