Optimize and batch merge pipelines list splits queries #4968

guilload · 2024-05-09T22:29:11Z

Description

Add node_id column to splits table
Populate node_id column from splits.split_metadata_json column
Add hash index on splits.node_id column
Batch list splits queries when spawning new indexing pipelines
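
To illustrate the batching idea in the last item, here is a minimal in-memory sketch. It is not the code from this PR: `SplitRow`, `PipelineKey`, and `list_splits_batched` are hypothetical names, and a slice stands in for the `splits` table. The point is that one pass filtered on `node_id` can feed every merge pipeline on the node, instead of one list splits query per pipeline.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a row of the `splits` table.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SplitRow {
    pub split_id: String,
    pub index_uid: String,
    pub source_id: String,
    pub node_id: String,
}

/// Hypothetical key identifying one merge pipeline on a node.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PipelineKey {
    pub index_uid: String,
    pub source_id: String,
}

/// Simulates a single batched "list splits" query: one pass over the table
/// that keeps the rows owned by `node_id` and groups them by pipeline.
/// Every requested pipeline gets an entry, even if it has no splits.
pub fn list_splits_batched(
    table: &[SplitRow],
    node_id: &str,
    pipelines: &[PipelineKey],
) -> HashMap<PipelineKey, Vec<SplitRow>> {
    let mut splits_per_pipeline: HashMap<PipelineKey, Vec<SplitRow>> = pipelines
        .iter()
        .cloned()
        .map(|key| (key, Vec::new()))
        .collect();

    for row in table.iter().filter(|row| row.node_id == node_id) {
        let key = PipelineKey {
            index_uid: row.index_uid.clone(),
            source_id: row.source_id.clone(),
        };
        // Rows belonging to pipelines that were not requested are ignored.
        if let Some(splits) = splits_per_pipeline.get_mut(&key) {
            splits.push(row.clone());
        }
    }
    splits_per_pipeline
}
```

In the real metastore the filter on `node_id` would run in the database, which is presumably what the new column and its index are for; the grouping step is the part a caller spawning several pipelines would do on its side.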

How was this PR tested?

Updated unit tests
Ran this SQL query on Cicada; it did not return any null or empty values

guilload · 2024-05-10T20:18:32Z

@fulmicoton, I still have to add at least three unit tests in indexing_service.rs and merge_pipeline.rs, but this PR is ready for review.

quickwit/quickwit-indexing/src/actors/merge_pipeline.rs

quickwit/quickwit-indexing/src/actors/indexing_service.rs

fulmicoton · 2024-05-15T04:22:36Z

quickwit/quickwit-indexing/src/actors/merge_pipeline.rs

-    pub fn new(params: MergePipelineParams, spawn_ctx: &SpawnContext) -> Self {
+    pub fn new(
+        params: MergePipelineParams,
+        initial_immature_splits_opt: Option<Vec<SplitMetadata>>,


Is there a case where we want to call this with None? What is it?

I covered the spawn_pipelines branch, which is the one apply_indexing_plan uses, but I did not handle spawn_pipeline, which I think is used by the local CLI ingest. That is the reason for using an Option.

I will refactor the code a little bit to eliminate the option. With some tweaking, spawn_pipeline can use spawn_pipelines under the hood.
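
A rough sketch of what that refactor could look like, under stated assumptions: the types below are simplified stand-ins, the free functions `spawn_pipelines` and `spawn_pipeline` only borrow the names used in this thread, and the `fetch_immature_splits` closure is a hypothetical placeholder for the batched list splits query. None of this is the actual Quickwit code.

```rust
/// Simplified stand-in for the real `SplitMetadata`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SplitMetadata {
    pub split_id: String,
}

/// Simplified stand-in for the real `MergePipelineParams`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MergePipelineParams {
    pub pipeline_id: String,
}

#[derive(Debug)]
pub struct MergePipeline {
    pub params: MergePipelineParams,
    pub initial_immature_splits: Vec<SplitMetadata>,
}

impl MergePipeline {
    /// No `Option` here: every caller provides the initial splits.
    pub fn new(
        params: MergePipelineParams,
        initial_immature_splits: Vec<SplitMetadata>,
    ) -> Self {
        Self { params, initial_immature_splits }
    }
}

/// Spawns a batch of pipelines. `fetch_immature_splits` is called once for
/// the whole batch and is assumed to return one `Vec` per entry of
/// `params_batch`, in the same order.
pub fn spawn_pipelines<F>(
    params_batch: Vec<MergePipelineParams>,
    fetch_immature_splits: F,
) -> Vec<MergePipeline>
where
    F: FnOnce(&[MergePipelineParams]) -> Vec<Vec<SplitMetadata>>,
{
    let splits_batch = fetch_immature_splits(&params_batch);
    params_batch
        .into_iter()
        .zip(splits_batch)
        .map(|(params, splits)| MergePipeline::new(params, splits))
        .collect()
}

/// Single-pipeline entry point implemented as a batch of one, so both
/// code paths go through the same constructor.
pub fn spawn_pipeline<F>(params: MergePipelineParams, fetch_immature_splits: F) -> MergePipeline
where
    F: FnOnce(&[MergePipelineParams]) -> Vec<Vec<SplitMetadata>>,
{
    spawn_pipelines(vec![params], fetch_immature_splits)
        .pop()
        .expect("a batch of one should yield exactly one pipeline")
}
```

With this shape the single-pipeline path is just a special case of the batched one, so the constructor never has to decide what a missing list of splits means.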


fulmicoton · 2024-05-15T04:37:04Z

quickwit/quickwit-metastore/migrations/postgresql/19_add-split-node-id-field.up.sql

+ALTER TABLE splits
+    ADD COLUMN node_id VARCHAR(253);
+
+UPDATE


Suggested change

UPDATE

-- Split metadata has been stable for quite a while, so we
-- allow ourselves to do this, but please, reader of the future,
-- do not blindly reapply this pattern.
UPDATE

guilload force-pushed the guilload/optimize-merge-pipeline-list-splits-queries branch 6 times, most recently from 46b4b27 to 6ef3f69 Compare May 10, 2024 19:58

guilload marked this pull request as ready for review May 10, 2024 20:02

guilload force-pushed the guilload/optimize-merge-pipeline-list-splits-queries branch from 6ef3f69 to f368b10 Compare May 10, 2024 20:11

guilload requested a review from fulmicoton May 10, 2024 20:17

guilload force-pushed the guilload/optimize-merge-pipeline-list-splits-queries branch 4 times, most recently from 6d4e9b4 to d5e6233 Compare May 13, 2024 22:13