Add better validation of fieldset and disallow tensors #31173

Open
jobergum opened this issue May 11, 2024 · 0 comments
We should add validation of the fields referenced in fieldsets. Currently we allow mixing tensor fields and regular fields in the same fieldset. This generates a runtime exception (UnsupportedOperationException) when searching the fieldset, because tensor fields only accept the nearestNeighbor operator.

schema msmarco {
    document msmarco {
        
        field title type string {
            indexing: index | summary
            match: text
            index: enable-bm25
        }
        field body type string {
            indexing: index
            match: text
            index: enable-bm25
        }
        field vector type tensor<float>(x[512]) {
            indexing: summary | attribute
        }
    }
    fieldset default {
        fields: title, body, url, vector
    }
}
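
As a rough illustration of the check being requested, here is a minimal Python sketch that scans schema text and reports fieldset members that are tensors or are not declared at all. It is not Vespa's actual validation (the config model is Java and has a real schema parser). The function name `find_fieldset_problems` and the regular expressions are hypothetical and only handle the simple `field NAME type TYPE {` and `fieldset NAME { fields: ... }` forms used above. Note that the snippet as posted does not declare `url`, so the sketch flags that as well.

```python
import re

# Assumption: simplified patterns for illustration only, not Vespa's schema grammar.
FIELD_RE = re.compile(r"\bfield\s+(\w+)\s+type\s+([^{]+?)\s*\{")
FIELDSET_RE = re.compile(r"\bfieldset\s+(\w+)\s*\{\s*fields\s*:\s*([^}]*)\}")

SCHEMA = """
schema msmarco {
    document msmarco {
        field title type string {
            indexing: index | summary
        }
        field body type string {
            indexing: index
        }
        field vector type tensor<float>(x[512]) {
            indexing: summary | attribute
        }
    }
    fieldset default {
        fields: title, body, url, vector
    }
}
"""


def find_fieldset_problems(schema_text):
    """Return (fieldset, field, reason) tuples for suspicious fieldset members."""
    # Map each declared field name to its declared type.
    field_types = {name: ftype.strip() for name, ftype in FIELD_RE.findall(schema_text)}
    problems = []
    for fieldset, field_list in FIELDSET_RE.findall(schema_text):
        for name in (part.strip() for part in field_list.split(",")):
            if not name:
                continue
            ftype = field_types.get(name)
            if ftype is None:
                problems.append((fieldset, name, "unknown field"))
            elif ftype.startswith("tensor"):
                problems.append((fieldset, name, "tensor field"))
    return problems


if __name__ == "__main__":
    for fieldset, name, reason in find_fieldset_problems(SCHEMA):
        print(f"fieldset '{fieldset}': field '{name}' rejected ({reason})")
```

A deploy-time check along these lines could fail the deployment outright instead of emitting a warning about inconsistent matching settings.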

The above generates an unrelated deploy-time warning, which seems to be about attribute versus index match defaults. Two near-identical messages are printed, one for the matching settings and one for the normalization settings.

vespa deploy .       
Uploading application package... done

Success: Deployed '.' with session ID 15
WARNING For schema 'msmarco', field 'vector': The matching settings for the fields in fieldset 'default' are inconsistent (explicitly or because of field type). This may lead to recall and ranking issues.
WARNING For schema 'msmarco', field 'vector': The normalization settings for the fields in fieldset 'default' are inconsistent (explicitly or because of field type). This may lead to recall and ranking issues.

The query then fails at runtime with UnsupportedOperationException. A secondary observation is that the error is buried in the payload of a 200 OK response.

curl -v "http://127.0.0.1:8080/search/?language=en&timeout=10s&user-query=what+is+dad+bod&yql=select+%2A+from+msmarco+where+userInput%28%40user-query%29+and+url+contains+%28%7Bfilter%3Atrue%2Cranked%3Afalse%7D%22huffingtonpost.co.uk%22%29"
*   Trying 127.0.0.1:8080...
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> GET /search/?language=en&timeout=10s&user-query=what+is+dad+bod&yql=select+%2A+from+msmarco+where+userInput%28%40user-query%29+and+url+contains+%28%7Bfilter%3Atrue%2Cranked%3Afalse%7D%22huffingtonpost.co.uk%22%29 HTTP/1.1
> Host: 127.0.0.1:8080
> User-Agent: curl/7.84.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Sat, 11 May 2024 06:44:44 GMT
< Content-Type: application/json;charset=utf-8
< Vary: Accept-Encoding
< Content-Length: 4991
< 
{"root":{"id":"toplevel","relevance":1.0,"fields":{"totalCount":1},"coverage":{"coverage":100,"documents":996,"full":true,"nodes":1,"results":1,"resultsFull":1},"errors":[{"code":8,"summary":"Error in search reply.","source":"msmarco","message":"UnsupportedOperationException: 
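
Since the HTTP status is 200, a client has to inspect the body to notice the failure. The sketch below shows one way to do that in Python, assuming the response shape seen above, with an `errors` array under `root`. The helper name `extract_errors` is hypothetical, and the sample body is abbreviated: the exception message is truncated in the output above, so it is left as `...` here rather than guessed.

```python
import json

# Abbreviated copy of the response above. The message text after
# "UnsupportedOperationException: " is truncated in the issue and is not reconstructed.
SAMPLE_BODY = json.dumps({
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": {"totalCount": 1},
        "errors": [{
            "code": 8,
            "summary": "Error in search reply.",
            "source": "msmarco",
            "message": "UnsupportedOperationException: ...",
        }],
    }
})


def extract_errors(body):
    """Return the list of error objects under root.errors (empty if there are none)."""
    root = json.loads(body).get("root", {})
    return root.get("errors", [])


if __name__ == "__main__":
    status = 200  # the status line alone does not reveal the failure
    for err in extract_errors(SAMPLE_BODY):
        print(f"HTTP {status}, error code {err['code']} from {err['source']}: {err['summary']}")
```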

jobergum changed the title from "Add validation of fieldset" to "Add better validation of fieldset and disallow tensors" on May 11, 2024
geirst added this to the soon milestone on May 15, 2024