Backpressure when updating points to avoid OOM #4169

Open
tekumara opened this issue May 3, 2024 · 4 comments

Comments

tekumara commented May 3, 2024

Is your feature request related to a problem? Please describe.

A single-threaded client doing sequential batch updates can cause qdrant to be OOM-killed.

We've seen this occur regularly in a 3-node cluster with 1 GB of memory per node, for a collection of ~4000 vectors, with a replication factor of 3 and write consistency of 3.

Describe the solution you'd like

qdrant should implement backpressure / flow control and respond with HTTP 429 (Too Many Requests) to mutating requests when it is close to its memory limit.
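For illustration, here is a rough sketch of how a client could honour such a response. The URL, collection name, and batch shape are made up, and the 429 behaviour itself is the requested feature, not something qdrant returns today:

```python
# Hypothetical client-side handling of the proposed 429 backpressure signal.
# The endpoint is qdrant's REST upsert endpoint; the retry loop is the new part.
import time

import requests

QDRANT_URL = "http://localhost:6333"  # assumption: local node
COLLECTION = "my_collection"          # hypothetical collection name


def upsert_with_backoff(points: list[dict], max_retries: int = 5) -> None:
    """Upsert a batch, backing off exponentially while the server reports overload."""
    delay = 0.5
    for _ in range(max_retries):
        resp = requests.put(
            f"{QDRANT_URL}/collections/{COLLECTION}/points",
            params={"wait": "true"},
            json={"points": points},
            timeout=30,
        )
        if resp.status_code != 429:
            resp.raise_for_status()
            return
        # Server signals it is near its memory limit: wait and retry.
        time.sleep(delay)
        delay *= 2
    raise RuntimeError("server still overloaded after retries")
```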

Describe alternatives you've considered

Adding a sleep to the client to slow throughput.
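As a very rough sketch of that workaround (the interval is an arbitrary guess; in practice it can only be tuned by trial and error, which is why a server-side signal would be preferable):

```python
# Crude client-side throttle: enforce a minimum interval between batch updates.
import time


class Throttle:
    """Blocks so that successive wait() calls are at least `interval` seconds apart."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._last = float("-inf")

    def wait(self) -> None:
        now = time.monotonic()
        sleep_for = self.interval - (now - self._last)
        if sleep_for > 0:
            time.sleep(sleep_for)
        self._last = time.monotonic()


throttle = Throttle(interval=0.5)  # 0.5s between batches, chosen arbitrarily
# Call throttle.wait() before issuing each batch update.
```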

Additional context

See similar implementations of this feature:

Happy to provide a script to replicate the OOMs if it helps.

timvisee (Member) commented May 3, 2024

I like this idea though I wonder how well it would work in practice.

With this, you'd have to define some threshold, which may be quite arbitrary. Also, rejecting updates with a 429 does not guarantee it won't OOM: ongoing optimizations may still claim a lot of memory and cause a crash.

Out of curiosity: what operations are you sending, and at what rate?

tekumara (Author) commented May 3, 2024

Could the threshold be some percentage of memory that would leave enough headroom for background optimizations?

Each operation is a batch update with 20 upserts + 20 deletes with a filter selector, issued sequentially by the client, i.e. there is never more than one batch update in flight at a time (each call uses wait=true). See these logs, which end when the qdrant-0 node is killed. The batch updates take ~60ms each, one after another.
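For context, each operation looks roughly like the following (a sketch with the Python client; the collection name, vector size, and payload fields are invented, and the delete side is shown as a single filter-based delete):

```python
# Approximate reconstruction of one batch update: 20 upserts plus a
# filter-selector delete, sent with wait=True so only one batch is in flight.
import random

from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

upserts = [
    models.PointStruct(
        id=i,
        vector=[random.random() for _ in range(384)],  # assumed vector size
        payload={"batch": "current"},
    )
    for i in range(20)
]

client.batch_update_points(
    collection_name="my_collection",
    update_operations=[
        models.UpsertOperation(upsert=models.PointsList(points=upserts)),
        models.DeleteOperation(
            delete=models.FilterSelector(
                filter=models.Filter(
                    must=[
                        models.FieldCondition(
                            key="batch",
                            match=models.MatchValue(value="stale"),
                        )
                    ]
                )
            )
        ),
    ],
    wait=True,  # block until applied before the next batch is sent
)
```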

@Stanleylail

1. Back pressure for HTTP requests doesn't prevent OutOfMemoryError (#1016).
2. WebfluxUploadController.java looks like the option.

@Stanleylail

Fluent Bit's 'storage.max_chunks_up' is a similar backpressure mechanism: it caps the number of buffer chunks held in memory.
