
multi index leaf requests #4962

Merged
merged 12 commits into main from multi_index_leaf_requests on May 16, 2024
Conversation

@PSeitz PSeitz (Contributor) commented May 9, 2024

Execute one leaf search request per node, instead of one leaf search request per index.
A leaf search request can now span multiple indices.

Especially for search requests that target many indices, this can save a lot of
request overhead to other nodes. It also allows better control of the aggregation
memory limit of a request on a node, since the limit is no longer split up over
multiple requests, as well as finer control over concurrency.

The new LeafSearchRequest deduplicates shared data: if a request spans 1000 indices with the same doc_mapper, the doc_mapper is sent only once.
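As an illustration, a minimal sketch of per-node batching with doc_mapper deduplication could look like the following. The struct and field names here are assumptions for the example, not the actual quickwit request definitions.

```rust
use std::collections::HashMap;

/// One entry per index inside the batched leaf request.
struct LeafRequestRef {
    /// Ordinal into `doc_mappers` instead of a full copy per index.
    doc_mapper_ord: usize,
    index_uid: String,
    split_ids: Vec<String>,
}

/// One request per node, covering all of its indices and splits.
struct MultiIndexLeafRequest {
    /// Serialized doc mappers, deduplicated across indices.
    doc_mappers: Vec<String>,
    leaf_requests: Vec<LeafRequestRef>,
}

fn build_leaf_request(
    per_index: Vec<(String, String, Vec<String>)>, // (index_uid, doc_mapper, split_ids)
) -> MultiIndexLeafRequest {
    let mut doc_mappers: Vec<String> = Vec::new();
    let mut ord_by_mapper: HashMap<String, usize> = HashMap::new();
    let mut leaf_requests = Vec::new();
    for (index_uid, doc_mapper, split_ids) in per_index {
        // Send each distinct doc_mapper only once; later indices reference it by ordinal.
        let doc_mapper_ord = *ord_by_mapper
            .entry(doc_mapper.clone())
            .or_insert_with(|| {
                doc_mappers.push(doc_mapper);
                doc_mappers.len() - 1
            });
        leaf_requests.push(LeafRequestRef {
            doc_mapper_ord,
            index_uid,
            split_ids,
        });
    }
    MultiIndexLeafRequest {
        doc_mappers,
        leaf_requests,
    }
}
```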

TODO:
This version has no special optimizations for multi-index early exit;
e.g. CanSplitDoBetter should be extended to handle multiple indices.

@PSeitz PSeitz force-pushed the multi_index_leaf_requests branch 3 times, most recently from 1f69f3b to 156620e on May 10, 2024 04:50
@PSeitz PSeitz requested a review from fulmicoton May 10, 2024 05:15
@PSeitz PSeitz force-pushed the multi_index_leaf_requests branch 2 times, most recently from 965bd78 to afb99b2 on May 10, 2024 07:59
PSeitz added a commit to quickwit-oss/tantivy that referenced this pull request May 10, 2024
PR quickwit-oss/quickwit#4962 fixes an issue where the AggregationLimits are not
passed correctly. Since the AggregationLimits are now shared properly, we run into
contention issues.

This PR includes some straightforward improvements to reduce contention, by only
updating the shared limit when the memory consumption changed and by avoiding the
second read.

We probably need some sharding with multiple counters, or local caching that only
updates the global counter after some threshold.
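As a simplified sketch of the local-caching idea (the names and the threshold are assumptions, not the actual tantivy/quickwit code), memory deltas could be buffered locally and only flushed to the shared counter once they exceed a threshold:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Flush to the shared counter only after ~1 MiB of locally accumulated change.
const FLUSH_THRESHOLD_BYTES: u64 = 1 << 20;

struct LocalMemoryTracker {
    global: Arc<AtomicU64>,
    local_unflushed: u64,
}

impl LocalMemoryTracker {
    fn new(global: Arc<AtomicU64>) -> Self {
        Self { global, local_unflushed: 0 }
    }

    /// Record additional memory usage; only touches the shared counter
    /// when the locally buffered delta is large enough.
    fn add_memory(&mut self, num_bytes: u64) {
        self.local_unflushed += num_bytes;
        if self.local_unflushed >= FLUSH_THRESHOLD_BYTES {
            self.global.fetch_add(self.local_unflushed, Ordering::Relaxed);
            self.local_unflushed = 0;
        }
    }
}

impl Drop for LocalMemoryTracker {
    fn drop(&mut self) {
        // Flush the remainder so the global counter stays accurate.
        self.global.fetch_add(self.local_unflushed, Ordering::Relaxed);
    }
}
```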
let leaf_responses = try_join_all(leaf_request_tasks).await?;
let merge_collector = make_merge_collector(&search_request, &aggregation_limits)?;
let mut incremental_merge_collector = IncrementalCollector::new(merge_collector);
for result in leaf_responses {
Contributor:
Shouldn't we merge results as they come in here, with a FuturesUnordered, instead of using try_join_all?

PSeitz (Author):

I planned to do this later, when moving the inner loop over splits per index to the top level.
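For reference, a minimal sketch of the FuturesUnordered approach (generic names, not the actual quickwit code): merge each leaf response into the collector as soon as its future completes, instead of waiting for all of them with try_join_all.

```rust
use futures::stream::{FuturesUnordered, StreamExt};
use std::future::Future;

/// Poll the leaf-request futures in completion order and merge each result
/// immediately; `add_result` stands in for IncrementalCollector::add_result.
async fn merge_as_completed<F, R, E>(
    leaf_request_tasks: Vec<F>,
    mut add_result: impl FnMut(R) -> Result<(), E>,
) -> Result<(), E>
where
    F: Future<Output = Result<R, E>>,
{
    let mut pending: FuturesUnordered<F> = leaf_request_tasks.into_iter().collect();
    while let Some(leaf_response) = pending.next().await {
        add_result(leaf_response?)?;
    }
    Ok(())
}
```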

for result in leaf_responses {
    match result {
        Ok(result) => {
            incremental_merge_collector.add_result(result)?;
Contributor:
Ah, add_result is light anyway... maybe it does not matter that much, as long as we don't have any lazy optimization.

@@ -976,6 +1080,8 @@ async fn leaf_search_single_split_wrapper(
}),
}
if let Some(last_hit) = locked_incremental_merge_collector.peek_worst_hit() {
// TODO: we could use an RwLock instead and read the value instead of updating it
// unconditionally.
Contributor:
Does it matter? We only update it once per split, don't we?

PSeitz (Author):

I saw some contention on incremental_merge_collector when pushing results, which is also once per split.

It's more of an issue when we start to share the threshold between index searches (which we don't do currently).
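A sketch of the RwLock idea from the TODO above (names and the comparison direction are illustrative assumptions): read the current threshold under a shared read lock and only take the write lock when a split actually changes it.

```rust
use std::sync::RwLock;

/// Tracks the score of the current "worst kept" hit, shared across splits.
struct WorstHitThreshold {
    inner: RwLock<Option<u64>>,
}

impl WorstHitThreshold {
    /// Cheap shared read first: most splits will not move the threshold,
    /// so they never contend on the write lock.
    fn maybe_update(&self, candidate: u64) {
        if let Some(current) = *self.inner.read().unwrap() {
            if candidate <= current {
                return;
            }
        }
        // Re-check under the write lock, since another split may have
        // raised the threshold in the meantime.
        let mut guard = self.inner.write().unwrap();
        match *guard {
            Some(current) if candidate <= current => {}
            _ => *guard = Some(candidate),
        }
    }
}
```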

@fulmicoton fulmicoton (Contributor) left a comment

see inline comments

PSeitz added a commit to quickwit-oss/tantivy that referenced this pull request May 15, 2024
@PSeitz PSeitz force-pushed the multi_index_leaf_requests branch from afb99b2 to 0c1b96b on May 16, 2024 03:32
PSeitz added 11 commits May 16, 2024 11:32
Execute one request per node, instead of one request per index.
A leaf can now receive a request spanning multiple indices.

Especially for search requests on many indices, this saves a lot of requests and
therefore memory. It also allows better control of the memory consumption of a
request on a node, since it is not split up over multiple requests.
LeafSearchResponse includes serialized aggregations so that they can be sent between
nodes. This is also used on a leaf, which causes the results to be serialized and
then deserialized again for merging _per_ split.

This PR introduces IntermediateLeafResult to be used instead.
It also adds a field `aggregation_type` to LeafSearchResponse, so that conversion
between IntermediateLeafResult and LeafSearchResponse is self-contained.
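Roughly, the distinction looks like the following sketch (types and field shapes are assumptions for illustration, not the actual quickwit/tantivy definitions): the intermediate form is merged in memory split by split, and serialization happens once at the node boundary.

```rust
use serde::{Deserialize, Serialize};

/// Placeholder for tantivy's intermediate aggregation state.
#[derive(Serialize, Deserialize, Default)]
struct IntermediateAggregation {
    buckets: Vec<(String, u64)>,
}

/// Used while merging on a leaf: no serialization per split.
struct IntermediateLeafResult {
    num_hits: u64,
    aggregation: Option<IntermediateAggregation>,
}

/// Sent over the wire to the root: aggregation serialized exactly once.
struct LeafSearchResponse {
    num_hits: u64,
    serialized_aggregation: Option<Vec<u8>>,
}

impl IntermediateLeafResult {
    fn into_response(self) -> serde_json::Result<LeafSearchResponse> {
        let serialized_aggregation = self
            .aggregation
            .map(|agg| serde_json::to_vec(&agg))
            .transpose()?;
        Ok(LeafSearchResponse {
            num_hits: self.num_hits,
            serialized_aggregation,
        })
    }
}
```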
@PSeitz PSeitz force-pushed the multi_index_leaf_requests branch from 0c1b96b to 28bf15a on May 16, 2024 03:32
@PSeitz PSeitz merged commit e7091f8 into main May 16, 2024
5 checks passed
@PSeitz PSeitz deleted the multi_index_leaf_requests branch May 16, 2024 04:13