
multi index leaf requests #4962

Merged
merged 12 commits into main from multi_index_leaf_requests on May 16, 2024
Conversation

@PSeitz PSeitz (Contributor) commented May 9, 2024

Execute one leaf search request per node, instead of one leaf search request per index.
A leaf search request can now span multiple indices.

Especially for search requests that target many indices, this can save a lot of
request overhead to other nodes. It also allows better control of the aggregation
memory limit of a request on a node, since the limit is no longer split up over
multiple requests, as well as finer control over concurrency.

The new LeafSearchRequest deduplicates shared data: if a request spans 1000 indices with the same doc_mapper, the doc_mapper is sent only once.
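As an illustration, a minimal sketch of per-node batching with doc_mapper deduplication could look like the following. The struct and field names here are assumptions for the example, not the actual quickwit request definitions.

```rust
use std::collections::HashMap;

/// One entry per index inside the batched leaf request.
struct LeafRequestRef {
    /// Ordinal into `doc_mappers` instead of a full copy per index.
    doc_mapper_ord: usize,
    index_uid: String,
    split_ids: Vec<String>,
}

/// One request per node, covering all of its indices and splits.
struct MultiIndexLeafRequest {
    /// Serialized doc mappers, deduplicated across indices.
    doc_mappers: Vec<String>,
    leaf_requests: Vec<LeafRequestRef>,
}

fn build_leaf_request(
    per_index: Vec<(String, String, Vec<String>)>, // (index_uid, doc_mapper, split_ids)
) -> MultiIndexLeafRequest {
    let mut doc_mappers: Vec<String> = Vec::new();
    let mut ord_by_mapper: HashMap<String, usize> = HashMap::new();
    let mut leaf_requests = Vec::new();
    for (index_uid, doc_mapper, split_ids) in per_index {
        // Send each distinct doc_mapper only once; later indices reference it by ordinal.
        let doc_mapper_ord = *ord_by_mapper
            .entry(doc_mapper.clone())
            .or_insert_with(|| {
                doc_mappers.push(doc_mapper);
                doc_mappers.len() - 1
            });
        leaf_requests.push(LeafRequestRef {
            doc_mapper_ord,
            index_uid,
            split_ids,
        });
    }
    MultiIndexLeafRequest {
        doc_mappers,
        leaf_requests,
    }
}
```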

TODO:
This version has no special optimizations for multi-index early exit;
e.g. CanSplitDoBetter should be extended to handle multiple indices.

@PSeitz PSeitz force-pushed the multi_index_leaf_requests branch 3 times, most recently from 1f69f3b to 156620e on May 10, 2024 04:50
@PSeitz PSeitz requested a review from fulmicoton May 10, 2024 05:15
@PSeitz PSeitz force-pushed the multi_index_leaf_requests branch 2 times, most recently from 965bd78 to afb99b2 on May 10, 2024 07:59
PSeitz added a commit to quickwit-oss/tantivy that referenced this pull request May 10, 2024
PR quickwit-oss/quickwit#4962 fixes an issue where the AggregationLimits are not
passed correctly. Since the AggregationLimits are now shared properly, we run into
contention issues.

This PR includes some straightforward improvements to reduce contention, by only
updating the shared limit when the memory consumption changed and by avoiding the
second read.

We probably need some sharding with multiple counters, or local caching that only
updates the global counter after some threshold.
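As a simplified sketch of the local-caching idea (the names and the threshold are assumptions, not the actual tantivy/quickwit code), memory deltas could be buffered locally and only flushed to the shared counter once they exceed a threshold:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Flush to the shared counter only after ~1 MiB of locally accumulated change.
const FLUSH_THRESHOLD_BYTES: u64 = 1 << 20;

struct LocalMemoryTracker {
    global: Arc<AtomicU64>,
    local_unflushed: u64,
}

impl LocalMemoryTracker {
    fn new(global: Arc<AtomicU64>) -> Self {
        Self { global, local_unflushed: 0 }
    }

    /// Record additional memory usage; only touches the shared counter
    /// when the locally buffered delta is large enough.
    fn add_memory(&mut self, num_bytes: u64) {
        self.local_unflushed += num_bytes;
        if self.local_unflushed >= FLUSH_THRESHOLD_BYTES {
            self.global.fetch_add(self.local_unflushed, Ordering::Relaxed);
            self.local_unflushed = 0;
        }
    }
}

impl Drop for LocalMemoryTracker {
    fn drop(&mut self) {
        // Flush the remainder so the global counter stays accurate.
        self.global.fetch_add(self.local_unflushed, Ordering::Relaxed);
    }
}
```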
let leaf_responses = try_join_all(leaf_request_tasks).await?;
let merge_collector = make_merge_collector(&search_request, &aggregation_limits)?;
let mut incremental_merge_collector = IncrementalCollector::new(merge_collector);
for result in leaf_responses {
Contributor:
Shouldn't we merge results as they come in here, with a FuturesUnordered, instead of using try_join_all?

PSeitz (Author):

I planned to do this later, when moving the inner loop over splits per index to the top level.
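For reference, a minimal sketch of the FuturesUnordered approach (generic names, not the actual quickwit code): merge each leaf response into the collector as soon as its future completes, instead of waiting for all of them with try_join_all.

```rust
use futures::stream::{FuturesUnordered, StreamExt};
use std::future::Future;

/// Poll the leaf-request futures in completion order and merge each result
/// immediately; `add_result` stands in for IncrementalCollector::add_result.
async fn merge_as_completed<F, R, E>(
    leaf_request_tasks: Vec<F>,
    mut add_result: impl FnMut(R) -> Result<(), E>,
) -> Result<(), E>
where
    F: Future<Output = Result<R, E>>,
{
    let mut pending: FuturesUnordered<F> = leaf_request_tasks.into_iter().collect();
    while let Some(leaf_response) = pending.next().await {
        add_result(leaf_response?)?;
    }
    Ok(())
}
```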

for result in leaf_responses {
    match result {
        Ok(result) => {
            incremental_merge_collector.add_result(result)?;
Contributor:
Ah, add_result is light anyway... maybe it does not matter that much, as long as we don't have any lazy optimization.

@@ -976,6 +1080,8 @@ async fn leaf_search_single_split_wrapper(
}),
}
if let Some(last_hit) = locked_incremental_merge_collector.peek_worst_hit() {
// TODO: we could use an RwLock instead and read the value instead of updating it
// unconditionally.
Contributor:
Does it matter? We only update it once per split, don't we?

PSeitz (Author):

I saw some contention on incremental_merge_collector when pushing results, which is also once per split.

It's more of an issue when we start to share the threshold between index searches (which we don't do currently).
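A sketch of the RwLock idea from the TODO above (names and the comparison direction are illustrative assumptions): read the current threshold under a shared read lock and only take the write lock when a split actually changes it.

```rust
use std::sync::RwLock;

/// Tracks the score of the current "worst kept" hit, shared across splits.
struct WorstHitThreshold {
    inner: RwLock<Option<u64>>,
}

impl WorstHitThreshold {
    /// Cheap shared read first: most splits will not move the threshold,
    /// so they never contend on the write lock.
    fn maybe_update(&self, candidate: u64) {
        if let Some(current) = *self.inner.read().unwrap() {
            if candidate <= current {
                return;
            }
        }
        // Re-check under the write lock, since another split may have
        // raised the threshold in the meantime.
        let mut guard = self.inner.write().unwrap();
        match *guard {
            Some(current) if candidate <= current => {}
            _ => *guard = Some(candidate),
        }
    }
}
```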

@fulmicoton fulmicoton (Contributor) left a comment

see inline comments

PSeitz added a commit to quickwit-oss/tantivy that referenced this pull request May 15, 2024
@PSeitz PSeitz force-pushed the multi_index_leaf_requests branch from afb99b2 to 0c1b96b on May 16, 2024 03:32
PSeitz added 11 commits May 16, 2024 11:32
Execute one request per node, instead of one request per index.
A leaf can now receive a request spanning multiple indices.

Especially for search requests on many indices, this saves a lot of requests and
therefore memory. It also allows better control of the memory consumption of a
request on a node, since it is not split up over multiple requests.
LeafSearchResponse includes serialized aggregations so that they can be sent between
nodes. This is also used on a leaf, which causes the results to be serialized and
then deserialized again for merging _per_ split.

This PR introduces IntermediateLeafResult to be used instead.
It also adds a field `aggregation_type` to LeafSearchResponse, so that conversion
between IntermediateLeafResult and LeafSearchResponse is self-contained.
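Roughly, the distinction looks like the following sketch (types and field shapes are assumptions for illustration, not the actual quickwit/tantivy definitions): the intermediate form is merged in memory split by split, and serialization happens once at the node boundary.

```rust
use serde::{Deserialize, Serialize};

/// Placeholder for tantivy's intermediate aggregation state.
#[derive(Serialize, Deserialize, Default)]
struct IntermediateAggregation {
    buckets: Vec<(String, u64)>,
}

/// Used while merging on a leaf: no serialization per split.
struct IntermediateLeafResult {
    num_hits: u64,
    aggregation: Option<IntermediateAggregation>,
}

/// Sent over the wire to the root: aggregation serialized exactly once.
struct LeafSearchResponse {
    num_hits: u64,
    serialized_aggregation: Option<Vec<u8>>,
}

impl IntermediateLeafResult {
    fn into_response(self) -> serde_json::Result<LeafSearchResponse> {
        let serialized_aggregation = self
            .aggregation
            .map(|agg| serde_json::to_vec(&agg))
            .transpose()?;
        Ok(LeafSearchResponse {
            num_hits: self.num_hits,
            serialized_aggregation,
        })
    }
}
```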
@PSeitz PSeitz force-pushed the multi_index_leaf_requests branch from 0c1b96b to 28bf15a on May 16, 2024 03:32
@PSeitz PSeitz merged commit e7091f8 into main May 16, 2024
5 checks passed
@PSeitz PSeitz deleted the multi_index_leaf_requests branch May 16, 2024 04:13