
More heap usage safety back pressure (user controllable? back pressure based on unprocessed messages?) #217

Open
bitterfox opened this issue Oct 7, 2023 · 1 comment

@bitterfox

We develop a Kafka consumer using Decaton that works as follows:

  • Consume messages from Kafka
  • Write messages into a file, producing a large file containing 10K or more messages
    • We would like to create 100MB+ files; if the single message size is 300B with a 50% compression ratio, each file will contain about 700K messages
    • Thus we specify a huge decaton.max.pending.records, like 100K or 1M
  • We'd like to commit offsets only after all messages are persisted in the file
    • We use external storage, and we assume data is persisted when the file descriptor is closed successfully
    • So we don't want to commit offsets until we close it
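The pattern above can be sketched with plain Java (this is an illustrative simulation, not actual Decaton API; the class and field names are invented, and the size/ratio constants come from the numbers in this report):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the batching pattern: buffer consumed messages, close the file
// once it reaches the target compressed size, and only then treat the
// buffered offsets as committable.
class FileBatcher {
    static final long TARGET_FILE_BYTES = 100L * 1024 * 1024; // 100MB target file
    static final double COMPRESSION_RATIO = 0.5;              // assumed 50% compression

    private final List<String> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    long committedOffset = -1; // last offset safe to commit
    long consumedOffset = -1;  // last offset handed to us

    // Returns true when the batch was "closed" (persisted) and offsets become committable.
    boolean add(long offset, String message) {
        buffer.add(message);
        bufferedBytes += message.getBytes().length;
        consumedOffset = offset;
        if (bufferedBytes * COMPRESSION_RATIO >= TARGET_FILE_BYTES) {
            // close() the file here; on success, everything buffered is durable
            committedOffset = consumedOffset;
            buffer.clear();
            bufferedBytes = 0;
            return true;
        }
        return false; // offsets stay uncommitted until the file is closed
    }
}
```

The gap between `consumedOffset` and `committedOffset` is exactly the pending window that `decaton.max.pending.records` has to cover, which is why it must be set so large here.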

Since we configure decaton.max.pending.records to a huge number, Decaton can consume a huge number of messages and push them down to the processor.
This huge decaton.max.pending.records works fine for our application as long as the consumer starts from an empty backlog and messages arrive more slowly than the processor can process them.

Now assume we have pending records on the order of 100K–1M for a partition and the consumer node fails.
The partition is then rebalanced, and Decaton consumes messages up to decaton.max.pending.records (100K, 1M) faster than the processor can process them.
So 100K+ messages may occupy heap until the processor has processed them all.
In such a scenario we actually hit an OOME in our application.
A similar scenario can occur when restarting the application, or when resuming it after it has been stopped for several hours.
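A back-of-envelope estimate shows why this OOMs (the 300B payload size is from this report; the per-record overhead is an assumed round number for task wrappers and offset bookkeeping, not a measured Decaton figure):

```java
// Rough heap estimate for the rebalance scenario with assumed numbers.
long maxPendingRecords = 1_000_000; // decaton.max.pending.records
long payloadBytes = 300;            // single message size from the report
long perRecordOverhead = 200;       // assumed JVM object/bookkeeping overhead per record

long heapBytes = maxPendingRecords * (payloadBytes + perRecordOverhead);
// ~500MB held on heap until the processor drains the backlog
System.out.println(heapBytes / (1024 * 1024) + " MB"); // → 476 MB
```

All of this is resident at once because consumption is bounded only by the pending-record count, not by how fast the processor drains it.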

So I'm wondering whether Decaton can be improved for this kind of use case, with a better back-pressure strategy for heap-usage safety.
I propose several possible options:

  • Let the processor control back pressure
  • Split decaton.max.pending.records into two configurations, e.g. the max number of uncommitted messages and the max number of not-yet-processed messages, and have Decaton apply back pressure based on both numbers
  • Add a consuming-speed (RPS) configuration and have Decaton respect it
  • Make the executor in ProcessUnit configurable and let us control blocking of polling
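To make the second option concrete, here is a minimal sketch of what two-threshold back pressure could look like. All names here are invented for illustration and are not actual Decaton configs or classes:

```java
// Hypothetical two-limit back pressure: the commit window can stay huge
// (to support large files), while the number of tasks queued on heap is
// bounded separately, so a restart cannot pull the full uncommitted
// window into memory at once.
class BackPressure {
    final long maxUncommitted; // e.g. 1,000,000 — bounds the commit window
    final long maxUnprocessed; // e.g. 10,000 — bounds heap occupied by queued tasks
    long uncommitted = 0;
    long unprocessed = 0;

    BackPressure(long maxUncommitted, long maxUnprocessed) {
        this.maxUncommitted = maxUncommitted;
        this.maxUnprocessed = maxUnprocessed;
    }

    // Poll only while both limits allow; otherwise the consumer pauses.
    boolean mayPoll() {
        return uncommitted < maxUncommitted && unprocessed < maxUnprocessed;
    }

    void onConsumed()        { uncommitted++; unprocessed++; }
    void onProcessed()       { unprocessed--; }   // processor finished a task
    void onCommitted(long n) { uncommitted -= n; } // offsets committed in bulk
}
```

With today's single decaton.max.pending.records, both roles are played by one number, forcing it to be as large as the commit window and therefore as large as the worst-case heap footprint.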
@ocadaruma
Member

Hi, thanks for reporting the issue.

Though your use case sounds not so common, it's definitely hard to achieve with current Decaton features.

> Separate decaton.max.pending.records to 2 configurations

This approach sounds like the simplest way.

I'll try to see if we can support this.
