BulkIndexer: Workers flushing at the same time #646

Open
rockdaboot opened this issue Mar 30, 2023 · 1 comment

@rockdaboot (Contributor)

The Problem

The workers tend to flush at (roughly) the same time.

The main reason for this is that worker buffers are filled evenly, because all workers fetch their items from the same Go channel in parallel.

Buffer expiration made it worse: up to go-elasticsearch 8.7, a single ticker flushed all workers at the same time. #624 fixed this for 8.8+.
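
For reference, this is roughly the setup in question; a minimal sketch using the esutil.BulkIndexer API, where the concrete values for the number of workers, FlushBytes and FlushInterval are only illustrative:

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/elastic/go-elasticsearch/v8"
	"github.com/elastic/go-elasticsearch/v8/esutil"
)

func main() {
	es, err := elasticsearch.NewDefaultClient()
	if err != nil {
		log.Fatal(err)
	}

	// All workers consume from one shared queue, so their buffers fill at the
	// same rate and tend to hit FlushBytes (or the flush ticker) together.
	bi, err := esutil.NewBulkIndexer(esutil.BulkIndexerConfig{
		Client:        es,
		NumWorkers:    4,                // N parallel workers
		FlushBytes:    5 * 1024 * 1024,  // B: per-worker buffer threshold
		FlushInterval: 30 * time.Second, // I: expiration-based flush
	})
	if err != nil {
		log.Fatal(err)
	}
	defer bi.Close(context.Background())
}
```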

Flushing at the same time has several bad effects:

  • peak in memory usage: the bulk indexer items are kept in memory until the flush, and the buffer memory (the HTTP body) is allocated and filled at the same time
  • peak in CPU consumption: the HTTP bodies are generated and compressed at the same time
  • peak in network usage: all requests to Elasticsearch go out in parallel
  • the same peak effects on the Elasticsearch side, leading to 429 responses more often than necessary (and expected), consequently leading to more retries, which amplifies the peak behavior above

Possible solution: Fill buffers sequentially, flush in background

(Changing the API is out of scope.)

Let's say the number of workers is set to N, with FlushBytes B and FlushInterval I.
Let the Add() function collect items into an array A0 until B or I is reached, then flush it in the background.
Further calls to Add() go into a new array A1 until B or I is reached, then flush it in the background.
...
Allow at most N background flushes. If that limit is reached, throttle ingestion in the Add() function (as we do now).
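
A rough sketch of this scheme in Go (names like sequentialIndexer and flushFn are hypothetical, not a proposed API; error handling, retries and request building are left out):

```go
package bulk

import (
	"context"
	"sync"
	"time"
)

// item is a simplified stand-in for a bulk indexer item (metadata + document).
type item struct {
	meta []byte
	body []byte
}

// sequentialIndexer fills a single buffer at a time and hands full buffers to
// background flushes, at most N of them concurrently.
type sequentialIndexer struct {
	mu        sync.Mutex
	buf       []item        // the buffer currently being filled (A0, A1, ...)
	bufBytes  int           // approximate size of buf in bytes
	flushSize int           // B: hand the buffer off when bufBytes reaches this
	interval  time.Duration // I: hand the buffer off at least this often
	slots     chan struct{} // semaphore: at most N background flushes in flight
	flushFn   func(context.Context, []item) error
	timer     *time.Timer
}

func newSequentialIndexer(n, flushSize int, interval time.Duration,
	flushFn func(context.Context, []item) error) *sequentialIndexer {
	si := &sequentialIndexer{
		flushSize: flushSize,
		interval:  interval,
		slots:     make(chan struct{}, n),
		flushFn:   flushFn,
	}
	si.timer = time.AfterFunc(interval, func() { _ = si.flushAsync(context.Background()) })
	return si
}

// Add appends one item to the current buffer and hands the buffer to a
// background flush once it reaches flushSize.
func (si *sequentialIndexer) Add(ctx context.Context, it item) error {
	si.mu.Lock()
	si.buf = append(si.buf, it)
	si.bufBytes += len(it.meta) + len(it.body)
	full := si.bufBytes >= si.flushSize
	si.mu.Unlock()

	if full {
		return si.flushAsync(ctx)
	}
	return nil
}

// flushAsync swaps the current buffer for a fresh one and flushes the old one
// in a goroutine. Acquiring a semaphore slot blocks when N flushes are already
// in flight, which throttles ingestion in Add(), as the worker pool does today.
func (si *sequentialIndexer) flushAsync(ctx context.Context) error {
	si.mu.Lock()
	batch := si.buf
	si.buf = nil
	si.bufBytes = 0
	si.timer.Reset(si.interval)
	si.mu.Unlock()

	if len(batch) == 0 {
		return nil
	}

	select {
	case si.slots <- struct{}{}: // acquire one of the N flush slots
	case <-ctx.Done():
		return ctx.Err()
	}
	go func() {
		defer func() { <-si.slots }() // release the slot when the flush is done
		_ = si.flushFn(ctx, batch)    // error handling and retries omitted
	}()
	return nil
}
```

The key property is that only one buffer is being filled at any time, so flushes start whenever that buffer happens to fill up (or expire), rather than N buffers reaching B in lockstep.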

Pros:

  • spread the workload over time to reduce peak behavior and pressure on ES

Cons:

  • ?
@Anaethelion (Contributor)

I don't see any real cons. Depending on the actual implementation, it could be hard to figure out the right number of workers; I'm thinking about the actual allocated memory in the worst-case scenario, when everything is stuck.

I would envision that as a worker pool with a basic scheduler to handle the handover of items. I wonder whether making that pluggable would have any value, to allow different strategies.
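
For example (purely hypothetical, none of these names exist in esutil), a pluggable strategy could be as small as:

```go
package bulk

import (
	"context"
	"time"
)

// FlushScheduler is a hypothetical extension point: it decides when a filled
// buffer is handed over to a background flush and bounds how many flushes run
// concurrently.
type FlushScheduler interface {
	// ShouldFlush reports whether the buffer should be flushed now, given its
	// size in bytes and the time elapsed since the last flush.
	ShouldFlush(bufferedBytes int, sinceLastFlush time.Duration) bool

	// Acquire blocks until a background-flush slot is free or ctx is cancelled.
	Acquire(ctx context.Context) error

	// Release returns the slot once a background flush has finished.
	Release()
}
```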
