New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Semantic Query #108483
Semantic Query #108483
Conversation
Add the semantic query to the Query DSL, which is used to query semantic_text fields --------- Co-authored-by: carlosdelest <carlos.delgado@elastic.co> Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
// No inference results have been indexed yet | ||
childQueryBuilder = new MatchNoneQueryBuilder(); | ||
} else if (inferenceResults instanceof TextExpansionResults textExpansionResults) { | ||
// TODO: Use WeightedTokensQueryBuilder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use WeightedTokensQueryBuilder
here once #108254 is merged
Pinging @elastic/es-search (Team:Search) |
.../inference/src/main/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilder.java
Outdated
Show resolved
Hide resolved
.../inference/src/main/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilder.java
Outdated
Show resolved
Hide resolved
.../inference/src/main/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilder.java
Outdated
Show resolved
Hide resolved
.../inference/src/main/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilder.java
Outdated
Show resolved
Hide resolved
.../inference/src/main/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilder.java
Outdated
Show resolved
Hide resolved
.../inference/src/main/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilder.java
Outdated
Show resolved
Hide resolved
...inference/src/yamlRestTest/resources/rest-api-spec/test/inference/40_semantic_text_query.yml
Outdated
Show resolved
Hide resolved
.../inference/src/main/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilder.java
Show resolved
Hide resolved
@@ -449,6 +449,7 @@ private static class ServiceHolder implements Closeable { | |||
List<NamedWriteableRegistry.Entry> entries = new ArrayList<>(); | |||
entries.addAll(IndicesModule.getNamedWriteables()); | |||
entries.addAll(searchModule.getNamedWriteables()); | |||
pluginsService.forEach(plugin -> entries.addAll(plugin.getNamedWriteables())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there named writeables that aren't queries or anything? I don't understand the need for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally added this because early on in test development, the inference call used named writeables that weren't registered. Since we are properly simulating the inference call in the test now, the unregistered named writeables aren't used and this can be removed. Good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... And I needed to add this back when I extended the serialization tests in 27dff4b. rewriteQuery
now checks that the query can be serialized in between when it is rewritten on the coordinator & data nodes. For the semantic query, this means serializing InferenceResults
, which in turn requires registering the named writeables for that class.
.../inference/src/main/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilder.java
Outdated
Show resolved
Hide resolved
...inference/src/yamlRestTest/resources/rest-api-spec/test/inference/40_semantic_text_query.yml
Show resolved
Hide resolved
...inference/src/yamlRestTest/resources/rest-api-spec/test/inference/40_semantic_text_query.yml
Show resolved
Hide resolved
...inference/src/yamlRestTest/resources/rest-api-spec/test/inference/40_semantic_text_query.yml
Show resolved
Hide resolved
...nference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java
Outdated
Show resolved
Hide resolved
@elasticsearchmachine run elasticsearch-ci/part-1 |
.../inference/src/main/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilder.java
Outdated
Show resolved
Hide resolved
InferenceAction.INSTANCE, | ||
inferenceRequest, | ||
listener.delegateFailureAndWrap((l, inferenceResponse) -> { | ||
inferenceResultsSupplier.set(inferenceResponse.getResults()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check for warning
inference results size & supported kinds early and fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point and also made me realize we should probably convert from InferenceServiceResults
to InferenceResults
in doRewriteGetInferenceResults
, which is executed on the coordinator node.
As currently written, each shard will do this conversion individually (when doRewriteBuildSemanticQuery
is executed). By checking & converting earlier, we prevent duplicate work and do (some of) the error checking in the coordinator node, avoiding an unnecessary serialization/deserialization when the inference results are clearly invalid.
@carlosdelest WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me - if we can avoid doing work that can be preempted in the coordinating node, let's do it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I addressed this, but it required more changes than I anticipated. This is due to the fact that prior to addressing this, the query enforced that inferenceResults
is always non-null after re-writing on the coordinator node. With the changes in bceeb84, this is no longer the case, which created a couple more edge cases.
I also want to add a test to SemanticQuerBuilderTests
to validate that the query can be serialized after re-writing on the coordinator node, particularly when no inference ID can be looked up for the field queried. This is not covered by the YAML tests because they all run on single-node clusters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the test in 27dff4b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the tests, Mike!
.../inference/src/main/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilder.java
Show resolved
Hide resolved
.../inference/src/main/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilder.java
Outdated
Show resolved
Hide resolved
.../inference/src/main/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilder.java
Show resolved
Hide resolved
...rTest/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilderMultiClusterIT.java
Outdated
Show resolved
Hide resolved
@elasticsearchmachine run elasticsearch-ci/part-2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Mikep86 Great job and nice test coverage in yml tests. This PR overall LGTM, I have several small minor points and they are also subject to a discussion and can be addressed later:
- I am not sure if this query should respect:
index.query.parse.allow_unmapped_fields
, as to my mind the primary use for this setting is for CCS and the query is not supported in CCS. - Not starting a separate
SemanticQueryBuilderMultiClusterIT.java
just to check that this query in not supported in CCS, but integrating into other CCS integration tests. - I assume documentation PR will be added later
if (inferenceResults instanceof ErrorInferenceResults errorInferenceResults) { | ||
throw new IllegalStateException( | ||
"Field [" + fieldName + "] query inference error: " + errorInferenceResults.getException().getMessage(), | ||
errorInferenceResults.getException() | ||
); | ||
} else if (inferenceResults instanceof WarningInferenceResults warningInferenceResults) { | ||
throw new IllegalStateException("Field [" + fieldName + "] query inference warning: " + warningInferenceResults.getWarning()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidkyle When would the inference service return ErrorInferenceResults
or WarningInferenceResults
? Should these result types be handled as user errors or server-side errors?
// The second rewriteAndFetch call simulates rewriting on the shard | ||
QueryBuilder rewritten = rewriteAndFetch(queryBuilder, coordinatorRewriteContext); | ||
// extra safety to fail fast - serialize the rewritten version to ensure it's serializable. | ||
assertSerialization(rewritten); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This checks that the query can be serialized after being rewritten on the coordinator node, which is an important check that would otherwise require setting up a multi-node integration test environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM - Awesome work
Add the
semantic
query to the Query DSL, which is used to querysemantic_text
fields: