Search Relevance

Overview

In a search engine, relevance measures how accurately a search result matches the search query. The higher the relevance, the higher the quality of the search results and the more useful the content users get. This project aims to add plugins to OpenSearch that help users make their query results more accurate, contextual, and relevant.

Relevancy and OpenSearch

Today, OpenSearch returns results ordered by scores generated by algorithms that match the indexed document contents against the input query. Documents that are more relevant to the query rank higher than less relevant ones. These rankings may make sense for one set of users or applications and be largely irrelevant for others. For example, for an e-commerce company, relevancy can mean returning more similar products in the same category as the search query, while for document search, relevancy may mean searching for the query across the different topics and categories present in the document store. This is why we need more ways to customize the results and their rankings according to the needs of the user or business.
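
To make the contrast concrete, here is a minimal sketch in plain Python of how the same scored hits could be ordered differently for an e-commerce use case. It is illustrative only and does not use any OpenSearch API: the `Hit` class, the `rerank_by_category` helper, the `boost` factor, and the sample scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    score: float   # base text-match score (made-up values for illustration)
    category: str


def rerank_by_category(hits, preferred_category, boost=1.5):
    """Sort hits by score, multiplying the score of hits in the preferred category by `boost`."""
    def adjusted(hit):
        return hit.score * boost if hit.category == preferred_category else hit.score
    return sorted(hits, key=adjusted, reverse=True)


hits = [
    Hit("manual-42", 3.0, "documents"),
    Hit("shoe-7", 2.5, "footwear"),
    Hit("shoe-9", 2.2, "footwear"),
]

# Default ordering: base score only.
default_order = [h.doc_id for h in sorted(hits, key=lambda h: h.score, reverse=True)]

# Hypothetical e-commerce ordering: favor products in the category the query belongs to.
ecommerce_order = [h.doc_id for h in rerank_by_category(hits, "footwear")]

print(default_order)    # ['manual-42', 'shoe-7', 'shoe-9']
print(ecommerce_order)  # ['shoe-7', 'shoe-9', 'manual-42']
```

The base scores are untouched; only the ordering policy changes, which is the kind of customization this project is meant to make easier.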

Relevancy Engineering

Relevancy as a problem cannot be solved at the search layer alone. Improving relevancy should be approached holistically, from understanding the ingested data and usage signals to extracting features, adding rewriters, and improving algorithms. Below is the architecture of OpenSearch Relevancy Engineering.

[Initially presented at Haystack 2022 by @anirudha, @JohannesDaniel and @ps48].

Overall, Relevancy Engineering can be divided into two tiers:
  1. Ingestion Tier: This tier handles getting the data from different sources to OpenSearch. This data may include:
    1. Search Data:
      1. Core search data that needs to be queried by OpenSearch.
      2. Ingestion connectors that fetch data from different data sources and sink it into OpenSearch indices.
    2. Search Management Data:
      1. Adding rules and judgments to the rewriter indices.
    3. Observability Data:
      1. Adding customer usage signals to OpenSearch. These signals may include granular details such as anonymized customer queries, clicks, orders, and session details.
  2. Search & Relevancy Platform Tier: This tier is responsible for analytics, rewriters, model improvements, and adding search configurations.
    1. Search Analytics & Discovery:
      1. Dashboards for analytics, metrics for search tests, search UIs and query profiling.
    2. Querqy-based query rewriting:
      1. Rewriters to customize queries with synonyms, word breaks, spelling corrections, and query relaxation.
    3. Search Back Office:
      1. Manage business rules, ontologies and manual judgments.
    4. Relevancy workbench:
      1. Improve algorithms with automated testing, relevance model training, personalization, and custom re-rankers.
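
As a rough illustration of the query rewriting step in the platform tier, the sketch below expands query terms with synonyms before the query is sent to the search engine. It is a plain-Python toy and does not reflect Querqy's actual rule format or API: the function names, the `rules` dictionary, and the `AND`/`OR` output syntax are assumptions made for this example.

```python
def rewrite_with_synonyms(query, synonyms):
    """Return one list of alternatives per query term, original term first."""
    rewritten = []
    for term in query.lower().split():
        # Terms without a rule keep a single alternative: themselves.
        rewritten.append([term] + synonyms.get(term, []))
    return rewritten


def to_query_string(rewritten):
    """Render the alternatives as a boolean query string (hypothetical syntax)."""
    groups = []
    for alternatives in rewritten:
        if len(alternatives) == 1:
            groups.append(alternatives[0])
        else:
            groups.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(groups)


# Hypothetical synonym rules, as a search manager might maintain them.
rules = {"laptop": ["notebook"], "cheap": ["budget", "affordable"]}

expanded = to_query_string(rewrite_with_synonyms("Cheap laptop bag", rules))
print(expanded)  # (cheap OR budget OR affordable) AND (laptop OR notebook) AND bag
```

Keeping the rules in data rather than in code mirrors the idea of storing them in rewriter indices, so they can be changed without redeploying the search application.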

Contributing

See the developer guide and how to contribute to this project.

Getting Help

If you find a bug or have a feature request, please don't hesitate to open an issue in this repository.

For more information, see the project website and documentation. If you need help and are unsure where to open an issue, try the forums.

Security

If you discover a potential security issue in this project, we ask that you notify AWS/Amazon Security via our vulnerability reporting page. Please do not create a public GitHub issue.

License

This project is licensed under the Apache v2.0 License.

Copyright

Copyright OpenSearch Contributors. See NOTICE for details.