Better algorithm for stream flow control #733

Open
agrover opened this issue Jun 16, 2020 · 4 comments · May be fixed by #1868
Labels
enhancement (New feature or request) · p3 (Backlog)

Comments

@agrover (Contributor) commented Jun 16, 2020

What we have now is what was fastest to implement, rather than a fully thought-out design. We should improve the algorithm when possible.

from #726 (comment):

We have an estimate of BDP we might use to tune this. We should aim for BDP*2 at a minimum if we have an algorithm like this, otherwise we will need more frequent sending of updates.
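As a rough illustration of that suggestion, a window target could be derived from a bandwidth estimate along these lines; the function and its inputs are hypothetical, not an existing neqo API, and only the 2x BDP factor comes from the comment above:

// Hypothetical sketch: size the receive window from a BDP estimate.
use std::time::Duration;

fn target_receive_window(estimated_bandwidth_bytes_per_sec: u64, rtt: Duration) -> u64 {
    const MIN_WINDOW: u64 = 0x10_0000; // 1 MiB floor (illustrative)
    // BDP = bandwidth * RTT; aim for at least 2 * BDP so the sender
    // is not starved while a window update is in flight.
    let bdp = (estimated_bandwidth_bytes_per_sec as f64 * rtt.as_secs_f64()) as u64;
    (2 * bdp).max(MIN_WINDOW)
}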

@mxinden (Collaborator) commented Apr 19, 2024

Status quo in neqo

Static connection flow-control limit:

const LOCAL_MAX_DATA: u64 = 0x3FFF_FFFF_FFFF_FFFF; // 2^62-1

Static stream receive window limit:

const RX_STREAM_DATA_WINDOW: u64 = 0x10_0000; // 1MiB

Static stream send window limit:

pub const SEND_BUFFER_SIZE: usize = 0x10_0000; // 1 MiB

Other implementations

Potential solution

Implement auto-tuning based on Google's QUIC design document:

Algorithm

  • As above, the flow-control window update is triggered when:
    available window < max window size / 2,
    where available window = max receive window offset - bytes consumed.
  • To realize auto-tuning, add the following logic just before issuing a window update:
    • Keep track of the time interval between subsequent window updates; call this since_last_update.
    • If ( since_last_update < RTT * trigger factor ) then max window size = MIN(max window size * increase factor, upper bound).
      • trigger factor is 2
      • increase factor is 2
  • As above, the window update sets:
    max receive window offset += (max window size - available window)

https://docs.google.com/document/d/1F2YfdDXKpy20WVKJueEf4abn_LVZHhMUMS5gX6Pgjl4/edit#heading=h.j3nq4bn2eaud

Used in quic-go.

I also implemented this in the past for a muxer on top of TCP; see libp2p/rust-yamux#176.
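For illustration, here is a condensed Rust sketch of the trigger and auto-tuning step described above; the struct and its field names are invented for this sketch rather than taken from neqo:

use std::time::{Duration, Instant};

// Illustrative sketch of the auto-tuned receive window described above.
struct ReceiveWindow {
    max_window_size: u64,         // auto-tuned window size
    max_received_offset: u64,     // highest offset advertised to the peer
    bytes_consumed: u64,          // bytes read by the application
    last_update: Option<Instant>, // time of the previous window update
    upper_bound: u64,             // cap on max_window_size
}

impl ReceiveWindow {
    // Returns the new max offset to advertise, if an update is due.
    fn maybe_update(&mut self, now: Instant, rtt: Duration) -> Option<u64> {
        let available = self.max_received_offset - self.bytes_consumed;
        // Trigger: less than half the window is still available.
        if available >= self.max_window_size / 2 {
            return None;
        }
        // Auto-tuning: updates arriving faster than trigger factor (2) * RTT
        // mean the window is too small, so double it (increase factor 2), capped.
        if let Some(last) = self.last_update {
            if now - last < rtt * 2 {
                self.max_window_size = (self.max_window_size * 2).min(self.upper_bound);
            }
        }
        self.last_update = Some(now);
        self.max_received_offset += self.max_window_size - available;
        Some(self.max_received_offset)
    }
}

With trigger and increase factors of 2, the window doubles whenever it is drained in less than two RTTs, up to the configured upper bound.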

Alternatives

More research needed. Pointers very much appreciated.

mxinden added a commit to mxinden/neqo that referenced this issue Apr 25, 2024
This commit adds a basic smoke test using the `test-fixture` simulator,
asserting that on a connection with unlimited bandwidth and a 50 ms round-trip time
Neqo can eventually achieve > 1 Gbit/s throughput.

This showcases the potential of a future stream flow-control auto-tuning algorithm.

See mozilla#733.
@martinthomson (Member) commented:

I believe that a better approach would be not to double the window, but to increase the window by an amount equal to the amount consumed since the last increase, if the time taken to consume the window is less than one RTT (increased by a small fudge factor to allow slack for reporting delays). This is similar to the increase in Reno CC. The effect would then be that the window increases less rapidly and maybe more often, but it would end up with a commitment that is closer to a 2x factor of BDP, rather than being 4x or more.

mxinden linked a pull request on May 2, 2024 that will close this issue.
@mxinden (Collaborator) commented May 14, 2024

I believe that a better approach would be not to double the window, but to increase the window by an amount equal to the amount consumed since the last increase, if the time taken to consume the window is less than one RTT

Given that the premise ("if the time taken to consume the window is less than one RTT") requires the window to be fully consumed, the "amount consumed" will be equal to the window, so the increase will be equal to the window, and thus the new window will be twice as large as the current one. How is that different from doubling the window as proposed above?

I might be confusing (a) the increase of the available window and (b) the increase of the maximum window size above.

@martinthomson would you mind rephrasing your proposal above, using the terminology of fc.rs? For what it is worth, here is the implementation of the Google QUIC algorithm:

if self.retired + self.max_active / 2 > self.max_allowed
    && self
        .max_allowed_sent_at
        .is_some_and(|at| now - at < rtt * 2)
    && self.max_active < STREAM_MAX_ACTIVE_LIMIT
{
    let prev_max_active = self.max_active;
    self.max_active = min(self.max_active * 2, STREAM_MAX_ACTIVE_LIMIT);
    qdebug!(
        "Increasing max stream receive window: previous max_active: {} MiB new max_active: {} MiB now: {now:?} rtt: {rtt:?} stream_id: {}",
        prev_max_active / 1024 / 1024, self.max_active / 1024 / 1024, self.subject,
    );
}

@martinthomson (Member) commented:

Sure. The goal is to increase the value of self.max_active based on perceived changes in the BDP, as observed by the throughput on the affected stream.

Therefore, I suggest that if the rate at which self.retired increases (that is, the change in that value, divided by the time elapsed) exceeds some function of self.max_active / path.rtt, then we can increase self.max_active by the amount that self.retired has increased.

This will approximately lead to a doubling, as self.max_active determines how much data can be outstanding, but that doubling might happen on a timescale shorter than an RTT, meaning that rate increases could exceed one doubling per RTT. The same is true of what you proposed as well, just that this version will cap out at a lower value for self.max_active.

Note that we could increase only based on the amount by which the increase exceeds previous expectations, which would lead to a closer adaptation, but would be more reliant on ACK rate than the simple scheme.
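A rough sketch of that rate-based rule, reusing field names from the fc.rs snippet above where they exist; the extra bookkeeping fields, the illustrative cap value, and the exact rate comparison are assumptions rather than an agreed design:

use std::cmp::min;
use std::time::{Duration, Instant};

const STREAM_MAX_ACTIVE_LIMIT: u64 = 16 * 1024 * 1024; // illustrative cap

struct Tuner {
    max_active: u64,            // current window size
    retired: u64,               // total bytes retired on this stream
    retired_at_last_check: u64, // assumed bookkeeping, not in fc.rs
    last_check: Instant,        // assumed bookkeeping, not in fc.rs
}

impl Tuner {
    fn maybe_autotune(&mut self, now: Instant, rtt: Duration) {
        let retired_delta = self.retired - self.retired_at_last_check;
        let elapsed = now - self.last_check;
        if elapsed.is_zero() || retired_delta == 0 {
            return;
        }
        // Throughput observed on this stream since the last check.
        let observed_rate = retired_delta as f64 / elapsed.as_secs_f64();
        // Rate implied by the current window: one full window per RTT.
        let window_rate = self.max_active as f64 / rtt.as_secs_f64();
        if observed_rate > window_rate {
            // Grow by the amount retired rather than doubling outright.
            self.max_active = min(self.max_active + retired_delta, STREAM_MAX_ACTIVE_LIMIT);
        }
        self.retired_at_last_check = self.retired;
        self.last_check = now;
    }
}

Compared to doubling, the increase is bounded by what was actually retired, so the committed window tracks the observed throughput more closely.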
