Search queries require all terms to match, or nothing is returned #10331

Open
MatMoore opened this issue Apr 18, 2024 · 1 comment
Labels
bug Bug report stale


MatMoore commented Apr 18, 2024

Describe the bug
If a search query contains multiple terms, it must be extremely precise to return any results. DataHub returns no matches unless every search term is present.

To Reproduce
Steps to reproduce the behavior:

  1. Pick any entity in the catalogue
  2. Copy and paste some words from its description into search - it should show up in the search results
  3. Add a term that doesn't match, or change one existing term to something that doesn't match, then repeat the search - now nothing is returned

A contrived example on the demo instance: searching for "This table has basic information about a customer, as well as some derived facts based on a customer's orders" returns the entity, but "This table has simple information about a customer, as well as some derived facts based on a customer's orders" returns nothing.

The behaviour is the same in both the React frontend and in the GraphQL API.
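The comparison can be scripted against the GraphQL API. This is a minimal sketch, not a verified reproduction: the endpoint URL, the optional bearer token, and the helper names `build_payload` and `search_total` are assumptions for illustration, and the `searchAcrossEntities` query shape should be checked against the schema of the DataHub version under test.

```python
import json
import urllib.request

# Assumed GraphQL endpoint; adjust host, port and auth for your deployment.
GRAPHQL_URL = "http://localhost:8080/api/graphql"

SEARCH_QUERY = """
query search($input: SearchAcrossEntitiesInput!) {
  searchAcrossEntities(input: $input) {
    total
    searchResults { entity { urn type } }
  }
}
"""


def build_payload(text, count=10):
    """Build the GraphQL request body for a free-text search (hypothetical helper)."""
    return {
        "query": SEARCH_QUERY,
        "variables": {
            "input": {"types": [], "query": text, "start": 0, "count": count}
        },
    }


def search_total(text, url=GRAPHQL_URL, token=None):
    """POST the search and return the reported number of hits."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["data"]["searchAcrossEntities"]["total"]
```

Calling `search_total` once with the original description text and once with a single word changed gives the two totals to compare.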

Expected behavior
Provided that the query is not wrapped in quotes, I would expect that only one term needs to match for an entity to be returned in the search results. Entities that match some but not all terms should be ranked lower but not excluded from the result set.

This is likely to be particularly problematic for users who are less sure of what they are looking for or who tend towards natural language queries.

In our use case we are hoping to roll out the catalogue to a very diverse set of users, including some groups who work less closely with the data. These users would be badly affected if search has low recall.
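To make the expected behaviour concrete, here is a toy sketch of "any term may match, more matches rank higher". It is not how DataHub or Elasticsearch scores results; the function name `rank_by_matched_terms` and the sample documents are invented for illustration.

```python
def rank_by_matched_terms(query, documents):
    """Return (doc_id, matched_term_count) pairs, best match first.

    A document is included if it contains at least one query term;
    documents matching fewer terms are ranked lower, not excluded.
    """
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        matched = len(terms & words)
        if matched > 0:
            scored.append((doc_id, matched))
    # Sort by descending match count, then by id for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Hypothetical catalogue entries.
docs = {
    "customers": "This table has basic information about a customer",
    "orders": "Orders placed by a customer",
    "events": "Raw clickstream events",
}
ranked = rank_by_matched_terms("simple information about a customer", docs)
# ranked == [("customers", 4), ("orders", 2)]
```

The query term "simple" appears in no document, yet "customers" is still returned first because its other terms match.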

Desktop (please complete the following information):

  • OS: macOS
  • Browser: Chrome
  • Version: Tested in versions 0.13.1, 0.12.0

Additional context
DataHub has an exactMatch config setting, but its exclusive option defaults to false, so it doesn't explain the exclusive behaviour we are seeing.

      ## Configuration around exact matching for search
      exactMatch:
        ## if false will only apply weights, if true will exclude non-exact
        exclusive: false

Nor is this the default behaviour of Elasticsearch's simple query string, whose documentation says of the default operator:

For example, a query string of capital of Hungary is interpreted as capital OR of OR Hungary.
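For reference, the two interpretations differ only in the `default_operator` parameter of a `simple_query_string` query. The sketch below builds both request bodies as plain dictionaries. The helper name and the `description` field are assumptions, and the issue does not show how DataHub actually constructs its query, so this only illustrates the Elasticsearch option.

```python
def simple_query_string_body(text, fields, default_operator="OR"):
    """Build an Elasticsearch search body using simple_query_string.

    default_operator="OR" matches documents containing any term;
    "AND" requires every term to be present.
    """
    if default_operator not in ("OR", "AND"):
        raise ValueError("default_operator must be 'OR' or 'AND'")
    return {
        "query": {
            "simple_query_string": {
                "query": text,
                "fields": list(fields),
                "default_operator": default_operator,
            }
        }
    }


# "description" is a hypothetical field name used only for illustration.
any_term = simple_query_string_body("capital of Hungary", ["description"])
all_terms = simple_query_string_body(
    "capital of Hungary", ["description"], default_operator="AND"
)
```

The behaviour reported here resembles the `AND` variant, even though `OR` is the documented Elasticsearch default.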


This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

@github-actions github-actions bot added the stale label May 19, 2024