
RFC: memory accounting and sort of circuit breaking to avoid OOM #16075

Open
andsel opened this issue Apr 9, 2024 · 0 comments

As a user of Logstash
I want the process not to go Out Of Memory when it's under pressure
so that it smoothly slows down the upstream systems' event flow, moving the process into a more stable zone.

Problem statement

Looking at the latest Out Of Memory errors (OOM for short), the majority originate from input plugin buffer allocations.
Given that the Beats input is the most used plugin to send data into Logstash, the first phase of work should start from it and from the plugins that allocate memory through Netty.
Another point of memory retention is the in-memory queues, which keep live references to events waiting to be processed. This part could be the subject of a second phase.

First phase

To provide an OOM countermeasure for the IO buffers used by Netty plugins, a buffer allocator with memory accounting should be implemented. Such an allocator, instead of delegating OOM detection to PoolArena and, at the last level, to java.nio.ByteBuffer.allocateDirect, should keep count of the memory in use and throw a specific error when a certain high watermark is crossed.
Ideally the allocator is used by the channel handlers that pull data off the network interface, so that is the natural point to account for the memory used by IO buffers.
If an allocation would cross the watermark, an error is raised and the caller should push back the incoming requests.
How to push back a request greatly depends on the protocol that's implemented. The ideal case would be to send a negative ACK to the upstream so that it can pause and, when the pressure on memory goes down, resume sending data.
Sadly none of the Netty input plugins implements such a protocol, so the ways they have to slow down the upstream are:
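A minimal sketch of such an accounting allocator, assuming a hypothetical `AccountingAllocator` class and watermark value (neither exists in Logstash today): it reserves the requested size against a counter before allocating, and raises a dedicated error instead of letting `allocateDirect` fail with an OOM.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a direct-buffer allocator that does its own
// memory accounting instead of delegating OOM detection to the JVM.
final class AccountingAllocator {
    static final class MemoryPressureException extends RuntimeException {
        MemoryPressureException(String msg) { super(msg); }
    }

    private final long highWatermarkBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    AccountingAllocator(long highWatermarkBytes) {
        this.highWatermarkBytes = highWatermarkBytes;
    }

    ByteBuffer allocate(int size) {
        long newUsed = usedBytes.addAndGet(size);
        if (newUsed > highWatermarkBytes) {
            // Roll back the reservation and signal the caller to push
            // back, rather than letting allocateDirect hit a real OOM.
            usedBytes.addAndGet(-size);
            throw new MemoryPressureException(
                "high watermark exceeded: " + newUsed + " > " + highWatermarkBytes);
        }
        return ByteBuffer.allocateDirect(size);
    }

    void release(int size) {
        usedBytes.addAndGet(-size);
    }

    long used() { return usedBytes.get(); }
}
```

The caller catching `MemoryPressureException` is then responsible for the protocol-specific pushback described below.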

  • for HTTP, return an HTTP response with proper error status to notify the upstream.
  • for Beats input, do not send any ACK and stop pulling data.
  • for TCP input, stop pulling data.

Stopping the pull of data off the network adapter makes TCP exert backpressure on the upstream system, because the local OS fills its IO buffers and TCP pushes back.
However, stopping the pull of data in Netty means setting auto read to false, which at some subsequent instant has to be re-enabled, presumably when memory pressure drops below a certain low watermark.
In the tentative Netty PR netty/netty#6662 it was proposed to reset autoRead to true when the channel writability changes, but that isn't a good choice because it could lead to a deadlock between the upstream and the downstream.
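The high/low watermark hysteresis can be sketched as a small state machine, decoupled from Netty for illustration (the class name `ReadThrottle` and the thresholds are assumptions): reads are paused once usage crosses the high watermark and resumed only after it falls below the low watermark, rather than tying resumption to channel writability.

```java
// Hypothetical sketch of high/low watermark hysteresis. The returned
// flag would drive channel.config().setAutoRead(...) in a Netty handler.
final class ReadThrottle {
    private final long lowWatermark;
    private final long highWatermark;
    private boolean readEnabled = true;

    ReadThrottle(long lowWatermark, long highWatermark) {
        this.lowWatermark = lowWatermark;
        this.highWatermark = highWatermark;
    }

    // Called with the currently accounted memory; returns whether
    // reads should be enabled.
    synchronized boolean onMemoryUsage(long usedBytes) {
        if (readEnabled && usedBytes >= highWatermark) {
            readEnabled = false;  // pause pulling data off the socket
        } else if (!readEnabled && usedBytes < lowWatermark) {
            readEnabled = true;   // memory pressure has subsided
        }
        return readEnabled;
    }
}
```

The gap between the two watermarks avoids flapping autoRead on and off around a single threshold.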

Second phase

The in-memory queues between the input and filter sections keep lists of references to Logstash events. If each event is relatively big, or there are many pipelines, a lot of memory can be retained by the queues and drive the Java heap to OOM.
Queues are naturally bounded, usually to batch size times workers, and that bounds the number of events in a queue. Suppose Logstash is running on a 32 or 48 core machine, with a thousand pipelines.
Potentially it could retain up to 6 million events: 125 (batch size) times 48 (number of workers) times 1000 (number of pipelines).
If the average size of an event is 1KB, that would mean 6GB of memory. If any of the multipliers varies, this memory occupation can oscillate a lot, driving close to OOM situations.
So the per-queue size bound is not enough; something is needed that keeps track of the total space retained by the queues and eventually rejects the insertion of an event, pushing back.
Pushing back after a buffer has already been consumed and turned into an event could be complicated, because none of the implemented input protocols can negatively ACK an already ACKed message or ask the upstream to resend an already sent message.
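The byte-level bound on top of the count-level bound could look like the following sketch (the `ByteBoundedQueue` wrapper and its `Sizer` callback are hypothetical, not existing Logstash classes): inserts are rejected once the total retained bytes would exceed a budget, and the counter shrinks as events are drained.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a queue wrapper that tracks total retained bytes
// and rejects events once a byte budget is reached, complementing the
// existing count-based bound (batch size x workers).
final class ByteBoundedQueue<E> {
    interface Sizer<E> { long sizeOf(E event); }

    private final Queue<E> delegate = new ArrayDeque<>();
    private final Sizer<E> sizer;
    private final long maxBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    ByteBoundedQueue(long maxBytes, Sizer<E> sizer) {
        this.maxBytes = maxBytes;
        this.sizer = sizer;
    }

    // Returns false (push back) when accepting the event would
    // exceed the byte budget.
    synchronized boolean offer(E event) {
        long size = sizer.sizeOf(event);
        if (usedBytes.get() + size > maxBytes) {
            return false;
        }
        usedBytes.addAndGet(size);
        return delegate.add(event);
    }

    synchronized E poll() {
        E event = delegate.poll();
        if (event != null) {
            usedBytes.addAndGet(-sizer.sizeOf(event));
        }
        return event;
    }

    long usedBytes() { return usedBytes.get(); }
}
```

A rejected `offer` is exactly the hard case noted above: by then the input has usually already ACKed the data, so the rejection can only translate into blocking or dropping, not into a protocol-level NACK.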

@andsel andsel self-assigned this Apr 9, 2024
@andsel andsel changed the title RFC: memory account and sort of circuit breaking to avoid OOM RFC: memory accounting and sort of circuit breaking to avoid OOM May 2, 2024