Can't override title search boost value for elasticsearch #11929

Open
krukas opened this issue May 6, 2024 · 5 comments

@krukas
Contributor

krukas commented May 6, 2024

Issue Summary

I created a page model whose search_fields include a SearchField and an AutocompleteField for title, each with a custom boost value. After the index is created, the index settings in Elasticsearch show that title still uses a boost value of 2.

Steps to Reproduce

  1. Create a Page model with custom search_fields that set a different boost value for title.
  2. Update the index.
  3. Look at the Elasticsearch index and see that a boost value of 2 is still used.

Technical details

  • Python version: 3.9.19.
  • Django version: 4.2.11
  • Wagtail version: 5.2.4

Working on this

This happens because index settings are applied once for every model, in an order based on installed apps: https://github.com/wagtail/wagtail/blob/main/wagtail/search/index.py#L130. The last model applied wins and overrides the settings of the earlier ones. Installed apps are usually listed with the project's own apps first and everything else after them, so the Wagtail core Page model will always override the index settings.

To emulate how Django templates work, I think we could have get_indexed_models return the models in reverse order, so that the apps at the top of installed apps are the last ones to apply their settings.
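
The ordering effect described here can be illustrated with a small simulation. This is only a sketch: it uses plain dictionaries rather than Wagtail or Elasticsearch, and the model labels, the apply_mappings helper, the mapping shape and the _all_text_boost_4_0 name are assumptions made for illustration, not Wagtail's actual code.

```python
# Simplified stand-in for per-model index settings: each model contributes
# a mapping for the shared "title" field. Labels and values are illustrative.
MODEL_MAPPINGS = [
    # (model label, mapping contributed to the shared index)
    ("myapp.BlogPage", {"title": {"copy_to": "_all_text_boost_4_0"}}),
    ("wagtailcore.Page", {"title": {"copy_to": "_all_text_boost_2_0"}}),
]


def apply_mappings(models):
    """Apply each model's mapping in turn; later models overwrite earlier ones."""
    index_mapping = {}
    for _label, mapping in models:
        index_mapping.update(mapping)
    return index_mapping


# Own apps first, Wagtail core last: the core Page definition wins.
current = apply_mappings(MODEL_MAPPINGS)

# Reversed order, as proposed: the project's own model is applied last.
proposed = apply_mappings(reversed(MODEL_MAPPINGS))

print(current["title"]["copy_to"])   # _all_text_boost_2_0
print(proposed["title"]["copy_to"])  # _all_text_boost_4_0
```

Reversing the order only changes which definition wins; there is still a single title mapping for the whole index.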

krukas added the status:Unconfirmed and type:Bug labels May 6, 2024
krukas added a commit to krukas/wagtail that referenced this issue May 6, 2024
@gasman
Collaborator

gasman commented May 6, 2024

Thanks for the report @krukas!

Look at Elasticsearch index and see that boost value 2 is still used

Can you give more details about the exact thing you're looking at here, please? If I remember correctly, the Elasticsearch backend creates separate indexes for the base wagtailcore.Page model and the specific page model, and boosts are applied at querying time, so it may well be that BlogPage.objects.search("some term") respects the custom boost value on title but Page.objects.search("some term") does not.

Also, please can you test this against the latest Wagtail version and confirm whether the issue still exists? There were some significant changes to the boosting logic in Wagtail 6.0, and these may have already fixed this.

@krukas
Contributor Author

krukas commented May 6, 2024

@gasman I'm looking at the index settings in Elasticsearch, specifically the copy_to for title, which remains _all_text_boost_2_0. Also, I have only one index, wagtailcore_page, which has all the fields from all page models in the format <app>_<model>__<field>.

All pages extend an abstract BasePage, which extends wagtail.models.Page.

@krukas
Contributor Author

krukas commented May 7, 2024

I have tested with 6.0.3 and have the same results.

gasman removed the status:Needs Info and status:Unconfirmed labels May 7, 2024
@gasman
Collaborator

gasman commented May 7, 2024

Thanks for clarifying! I see what's going on here now - the indexing code is generating a mapping configuration for every subclass of Page in turn, and pushing that to the wagtail__wagtailcore_page index, and apparently Elasticsearch handles that by merging it into the previous configuration.

That works fine if the definitions for the core fields such as title are identical across all subclasses - the mappings for those fields will be unchanged by the merge, and it will just add the fields from the specific model, appropriately namespaced (e.g. blog_blogpage__body). However, if the core fields are modified (e.g. by adding a custom boost value on title), the model which is pushed last will "win". That's fine if the modified definition is part of a common BasePage shared by all classes, but it has unwanted consequences in other cases - for example, if the last model in the list is BlogPage with SearchField("title", boost=10), then all titles will be boosted by 10, not just BlogPage instances.

I can't see a good way to solve this - we would need to configure the index in such a way that the title field is copied to _all_text_boost_10_0 when pushing a BlogPage, and copied to _all_text_boost_2_0 for all other page types. We could do that by avoiding Elasticsearch's copy_to feature entirely, and doing the copying on the Python side instead (so that Elasticsearch is passed a JSON document with the appropriate _all_text_boost_... field filled in) - however, that wouldn't handle searches with explicit fields, e.g. Page.objects.search("foo", fields=["title"]), which don't use the _all_text fields.
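
A rough sketch of what copying on the Python side could look like. It is not Wagtail code: boost_field_name and build_document are hypothetical helpers, the document shape is simplified, and the field-name format is inferred from the _all_text_boost_2_0 and _all_text_boost_10_0 names mentioned in this thread.

```python
def boost_field_name(boost):
    """Build a name like "_all_text_boost_2_0" from a boost value (assumed format)."""
    return "_all_text_boost_" + str(float(boost)).replace(".", "_")


def build_document(title, title_boost):
    """Return a JSON-like document for one page.

    Instead of relying on copy_to in the index mapping, the title is copied
    into the boost field chosen for this page's model.
    """
    return {
        "title": title,
        boost_field_name(title_boost): [title],
    }


blog_doc = build_document("Hello blog", title_boost=10)  # e.g. a BlogPage
other_doc = build_document("About us", title_boost=2)    # any other page type

print(sorted(blog_doc))   # ['_all_text_boost_10_0', 'title']
print(sorted(other_doc))  # ['_all_text_boost_2_0', 'title']
```

Both documents can live in the same index because each carries its own boost field. A search restricted to explicit fields would still read the plain title field and so would not see the per-model boost.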

@krukas
Contributor Author

krukas commented May 8, 2024

My solution would indeed not work if there are multiple boost values for the same field name. I think moving from copy_to to the Python side is probably the best solution. I don't know all the workings of Wagtail search, but if we fill all the _all_text fields in Python instead of using copy_to, nothing changes in the index fields available in Elasticsearch, so everything should still work the same?

For us, all page titles should have the same boost value, so for now I use this hack/fix to change it:

# In Django apps ready function
from wagtail.models import Page
Page.search_fields[0].boost = 4  # SearchField
Page.search_fields[1].boost = 4  # AutocompleteField
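
The positional indexes in this workaround depend on the order of Page.search_fields. A slightly more defensive variant could match on the field name instead. The sketch below uses minimal stand-in classes so that it runs without Wagtail; set_title_boost is a hypothetical helper, and in a real project the list passed to it would presumably be Page.search_fields, from the same ready function.

```python
class SearchField:
    """Stand-in for wagtail.search.index.SearchField (illustration only)."""

    def __init__(self, field_name, boost=None):
        self.field_name = field_name
        self.boost = boost


class AutocompleteField:
    """Stand-in for wagtail.search.index.AutocompleteField (illustration only)."""

    def __init__(self, field_name):
        self.field_name = field_name


class FilterField:
    """Stand-in for wagtail.search.index.FilterField (illustration only)."""

    def __init__(self, field_name):
        self.field_name = field_name


def set_title_boost(search_fields, boost):
    """Set boost on every SearchField/AutocompleteField named "title".

    Returns the number of fields changed, so the caller can detect a layout
    it did not expect.
    """
    changed = 0
    for field in search_fields:
        is_text_field = isinstance(field, (SearchField, AutocompleteField))
        if is_text_field and field.field_name == "title":
            field.boost = boost
            changed += 1
    return changed


# Stand-in for Page.search_fields; the real list contains more entries.
search_fields = [
    SearchField("title", boost=2),
    AutocompleteField("title"),
    FilterField("title"),
]

changed = set_title_boost(search_fields, 4)
print(changed)                 # 2
print(search_fields[0].boost)  # 4
```

Like the original workaround, this mutates the shared field objects, so it changes the boost for every page type rather than for a single model.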
