Sorting giving incorrect results #30691

Open
nehajatav opened this issue Mar 20, 2024 · 7 comments
@nehajatav

Describe the bug
Query1: The query below returns 4000 lexids sorted by the id field, as expected
{ "hits" : 0, "model.searchPath" : "/0", "yql" : "select '[docid]' from sources * where !(range( myDate_t, -Infinity, Infinity )) AND (range(date,960892397000,1085589357000)) order by '[docid]' asc limit 4000 offset 0", "timeout" : "120s" }
Query2: The query below returns 3000 lexids sorted by the id field
{ "hits" : 0, "model.searchPath" : "/0", "yql" : "select '[docid]' from sources * where !(range( myDate_t, -Infinity, Infinity )) AND (range(date,960892397000,1085589357000)) order by '[docid]' asc limit 3000 offset 0", "timeout" : "120s" }
Below are the observations that we don't expect with the default top-k-probability:

  1. ids ranked 1-2855 in the output of Query1 have the same rank in Query2
  2. ids ranked 2896-3027 in the output of Query1 are ranked 2856-2887 in Query2
  3. the id ranked 3082 in the output of Query1 is ranked 2888 in Query2
  4. ids ranked 3096-3107 in the output of Query1 are ranked 2889-3000 in Query2
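
Observations like these can be reproduced by diffing the two ranked id lists. The following is a minimal sketch, not part of the original report: `rank_shifts` is a hypothetical helper, and the short lists are toy stand-ins for the limit-4000 and limit-3000 outputs.

```python
def rank_shifts(ranking_a, ranking_b):
    """Return (doc_id, rank_in_a, rank_in_b) for ids whose 1-based rank differs.

    Ids present in ranking_b but missing from ranking_a get rank_in_a = None.
    """
    rank_a = {doc_id: i for i, doc_id in enumerate(ranking_a, start=1)}
    shifts = []
    for rank_b, doc_id in enumerate(ranking_b, start=1):
        ra = rank_a.get(doc_id)
        if ra != rank_b:
            shifts.append((doc_id, ra, rank_b))
    return shifts


# Toy data (assumption): the larger result contains "c", which the smaller one skipped.
query1_ids = ["a", "b", "c", "d", "e"]  # stand-in for the limit-4000 result
query2_ids = ["a", "b", "d", "e"]       # stand-in for the limit-3000 result

shifts = rank_shifts(query1_ids, query2_ids)
print(shifts)
```

Each tuple gives an id, its rank in the larger result, and its rank in the smaller one, which is the same shape as the numbered observations above.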

Expected behavior
Ranking should be nearly the same for both queries

Environment (please complete the following information):

  • Rhel8
  • Podman

Vespa version
8.221.29

@frodelu frodelu added this to the soon milestone Mar 20, 2024
@bratseth
Member

bratseth commented May 7, 2024

Is this repeatable? Is coverage 100% in both cases?
Could you try with top-k-probability set to 1.0?

@nehajatav
Author

nehajatav commented May 7, 2024

How do I set it to 1.0 without setting any value for max-hits-per-partition?
<tuning><dispatch><top-k-probability>1.0</top-k-probability></dispatch><searchnode>....
Invalid XML according to XML schema, error in services.xml: element "top-k-probability" not allowed here; expected the element end-tag or element "max-hits-per-partition" [98:40]

<tuning><dispatch><max-hits-per-partition /><top-k-probability>1.0</top-k-probability></dispatch><searchnode>....
character content of element "max-hits-per-partition" invalid; must be an integer

@nehajatav
Author

@bratseth Coverage is 100%. I have shared the response with trace level with you over a secure channel.
Also, I am unable to set max-hits-per-partition; see the comment above.

@bratseth
Member

bratseth commented May 8, 2024

This works just fine:

        <tuning>
            <dispatch>
                <top-k-probability>1.0</top-k-probability>
            </dispatch>
        </tuning>

@bjorncs
Member

bjorncs commented May 8, 2024

@nehajatav
Could you provide the output of the following command? The utility must be executed on a container node.

vespa-get-config -n vespa.config.search.dispatch -i feed/component/dispatcher.<insert content cluster name here> | grep topKProbability

There is also a slightly different count for the total number of documents in the two dumps:
Total documents "3k": 31056000
Total documents "4k": 31056022

The dumps provided indicate that the top-k setting has not been correctly propagated. There is a slight skew in the distribution of hits, with node 2 reporting more hits than nodes 0 and 1. The slight change in ordering was caused by additional hits from node 2 that have a lexical ordering lower than the highest in the 3k dump.
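
To illustrate how a skewed hit distribution can reorder results when each node returns fewer hits than the requested limit, here is a minimal simulation. It is a sketch under stated assumptions: the per-node cap values and the node contents are made up, `merged_top_k` is a hypothetical helper, and nothing here reproduces how Vespa actually derives its per-node hit count from top-k-probability.

```python
import heapq


def merged_top_k(partitions, k, per_node_cap):
    """Merge sorted per-node hit lists after truncating each to per_node_cap."""
    truncated = [sorted(p)[:per_node_cap] for p in partitions]
    return list(heapq.merge(*truncated))[:k]


# Hypothetical data: node 2 holds most of the lowest-sorting ids (skew).
node0 = [10, 20, 30]
node1 = [11, 21, 31]
node2 = [1, 2, 3, 4, 5, 6]
partitions = [node0, node1, node2]

# No truncation: every node may return k hits, so the merge is exact.
exact_top6 = merged_top_k(partitions, k=6, per_node_cap=6)

# Assumed caps below k: node 2 cannot return all of its low-sorting ids.
capped_top6 = merged_top_k(partitions, k=6, per_node_cap=3)
capped_top4 = merged_top_k(partitions, k=4, per_node_cap=2)

print(exact_top6)
print(capped_top6)
print(capped_top4)
```

In this toy setup the larger capped query sees an extra id from node 2 that sorts ahead of ids already present in the smaller capped result, so those ids move down one rank. That is the same kind of shift described for the 3k and 4k dumps.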

@nehajatav
Author

nehajatav commented May 8, 2024

@bjorncs the difference in total count may be due to the number of docs in the cluster increasing
@bratseth I was able to push top-k 1.0, but the result is still the same
This is the result even after convergence across all nodes with top-k set to 1.0:
[vespa@vespa-container-03 /]$ vespa-get-config -n vespa.config.search.dispatch -i feed/component/dispatcher. |grep topKProbability
topKProbability 0.9999
[vespa@vespa-container-03 /]$ vespa-get-config -n vespa.config.search.dispatch -i feed/component/dispatcher. |grep topKProbability
topKProbability 0.9999
[vespa@vespa-container-03 /]$

@bjorncs
Member

bjorncs commented May 14, 2024

@nehajatav
The command you listed does not include the content cluster name as a suffix to the config id.

$ vespa-get-config -n vespa.config.search.dispatch -i feed/component/dispatcher. |grep topKProbability

You can use vespa-configproxy-cmd to determine the available config instances at a node:

$ vespa-configproxy-cmd | grep "feed/component/dispatcher"

Use the output to determine the exact arguments to vespa-get-config.
If the config still contains 0.9999, the change to services.xml has not been applied.
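
The final check can also be scripted. The sketch below assumes the output line has the form shown earlier in this thread (`topKProbability 0.9999`); `parse_top_k_probability` is a hypothetical helper, and the sample string stands in for the real command output.

```python
def parse_top_k_probability(config_output):
    """Return the topKProbability value from vespa-get-config text, or None."""
    for line in config_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "topKProbability":
            return float(parts[1])
    return None


# Sample text copied from the output format shown in this thread.
sample_output = "topKProbability 0.9999\n"

value = parse_top_k_probability(sample_output)
applied = value == 1.0  # True only if the services.xml change took effect
print(value, applied)
```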
