
Improve bandwidth estimation and adaptive switching #4825

Merged
merged 5 commits into master from feature/estimate-ttfb-fix-part-stats-improve-abr on Jan 18, 2023

Conversation

robwalch
Collaborator

@robwalch robwalch commented Aug 4, 2022

This PR will...

Improve bandwidth estimation and adaptive switching with smaller segments and higher TTFB

Why is this Pull Request needed?

Estimating time-to-first-byte separately from the time used to estimate bandwidth allows us to more accurately predict the time it takes to load segments, especially those with durations close to the average round-trip time of a request.
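
To make the mechanics concrete, here is a minimal sketch of the idea (a simplified, hypothetical helper; not the actual estimator code):

function bandwidthSample(
  bytesLoaded: number,
  timeLoadingMs: number,
  ttfbMs: number
): number {
  // Only the time spent actually streaming bytes counts toward bandwidth;
  // clamp to 1ms so a response that was all round trip does not divide by zero.
  const streamingMs = Math.max(timeLoadingMs - ttfbMs, 1);
  return (8 * bytesLoaded * 1000) / streamingMs; // bits per second
}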

There were also several issues with loading stats, combined versus main video buffer observation, and in-flight BW being calculated before any bytes were loaded, all of which contributed to bad emergency down-switches and BWE corruption in _abandonRulesCheck. Thanks to @Pri12890 for pointing many of these out.

Are there any points in the code the reviewer needs to double check?

There are two new public methods on the player instance that I have left undocumented:

  1. get mainForwardBufferInfo(): BufferInfo | null. Allows the ABR controller to access the stream controller's media buffer. Since the ABR controller only deals with main variant fragment traffic, this lets it compare that activity to the buffer it appends to, rather than the combined buffer, which could be stalled if alt-audio does not keep ahead.
  2. get ttfbEstimate(): number. Similar to hls.bandwidthEstimate, hls.ttfbEstimate provides the latest time-to-first-byte estimate. (A usage sketch of both follows this list.)
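
A hypothetical usage sketch of both getters (only the two property names come from this PR; everything else is illustrative):

import Hls from 'hls.js';

const hls = new Hls();
// ... attach media and load a source ...

// Forward buffer info for the main variant only, not the combined buffer:
const bufferInfo = hls.mainForwardBufferInfo;
if (bufferInfo) {
  console.log(`main forward buffer length: ${bufferInfo.len}s`);
}

// Latest TTFB estimate, alongside the existing bandwidth estimate:
console.log(`BWE: ${hls.bandwidthEstimate} bps, TTFB: ${hls.ttfbEstimate} ms`);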

The TTFB sampling uses a single EWMA instance with the same slow half-life as that used for bandwidth. The default estimate is 100 ms and is not configurable. Samples are weighted on a curve that favors shorter values, so that the occasional request RTT hiccup does not have as much impact on the estimate.
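
Evaluating that curve (the same formula that appears in the diff below) for a few sample TTFB values:

// weight = sqrt(2) * e^(-seconds^2 / 2): near-maximum weight under ~1s,
// rapidly approaching zero for multi-second outliers.
const ttfbWeight = (ttfbMs: number): number =>
  Math.sqrt(2) * Math.exp(-((ttfbMs / 1000) ** 2) / 2);

ttfbWeight(100); // ~1.41 (typical request, near full weight)
ttfbWeight(1000); // ~0.86
ttfbWeight(3000); // ~0.016 (outlier hiccup, barely moves the estimate)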

This PR will remain open for at least 1-2 minor releases, and the changes will not ship in v1.3. This is to give contributors time to review and test these changes and provide feedback.

Resolves issues:

Fixes #3578 (special thanks to @Oleksandr0xB for submitting #4283)
Fixes #3563 and Closes #3595 (special thanks to @kanongil for early testing and feedback on the Low-Latency HLS implementation)
Fixes #5094
Closes #4291 (_abandonRulesCheck governs whether fragment loading is completed or aborted based on timeouts, not active throughput)

Checklist

  • changes have been done against master branch, and PR does not conflict
  • new unit / functional tests have been added (whenever applicable)
  • API or design changes are documented in API.md

@robwalch robwalch added this to the 1.4.0 milestone Aug 4, 2022
@kanongil
Contributor

kanongil commented Aug 4, 2022

It is great that you are trying to tackle the bandwidth estimation issues!

While it makes sense to use request latency as part of the effective bandwidth calculation, this averaging will fail with #3988. With preload hint fetching, the TTFB is not correlated with the latency, but with the request start time relative to the latest index update and part duration, which can both change from part to part. As such, I would prefer an alternative approach that doesn't require this averaging.

FYI, it's possible to get more accurate transfer timing for a single request using the Resource Timing API, though it is a bit cumbersome to extract and requires server opt-in using the Timing-Allow-Origin header.
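
For reference, a sketch of reading single-request timing via the Resource Timing API (fragmentUrl is a hypothetical placeholder; cross-origin entries report zeros unless the server sends Timing-Allow-Origin):

declare const fragmentUrl: string; // hypothetical: URL of a loaded fragment

const entries = performance.getEntriesByName(
  fragmentUrl
) as PerformanceResourceTiming[];
const entry = entries[entries.length - 1];
// responseStart is 0 for cross-origin resources without Timing-Allow-Origin
if (entry && entry.responseStart > 0) {
  const ttfb = entry.responseStart - entry.requestStart;
  const transfer = entry.responseEnd - entry.responseStart;
  console.log(`TTFB ${ttfb.toFixed(1)}ms, transfer ${transfer.toFixed(1)}ms`);
}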

@robwalch robwalch force-pushed the feature/estimate-ttfb-fix-part-stats-improve-abr branch from f4090fa to bb5b3e7 Compare August 4, 2022 16:58
@robwalch
Collaborator Author

robwalch commented Aug 4, 2022

Thanks @kanongil,

I added a note to #3988: "do not sample TTFB for blocked part-hint responses". The _abandonRulesCheck will need to change as well for part hints.

fragLevelNextLoadedDelay = loadRate
  ? (duration * levelNextBitrate) / (8 * 0.8 * loadRate)
  : duration / bwEstimate + ttfbEstimate;

For the else branch, are we missing (duration * levelNextBitrate)? It looks like only duration is used by mistake.

Collaborator Author

@robwalch robwalch Aug 4, 2022


Thanks for the find.

Collaborator Author

@robwalch robwalch Aug 4, 2022


Fixed in af0843a

const bwEstimate: number = this.bwEstimator.getEstimate();
// if estimated load time of new segment is completely unreasonable, ignore and do not emergency switch down
if (fragLevelNextLoadedDelay > duration * 10) {

Not able to imagine how we would run into this in general; moreover, it is smaller than fragLoadedDelay.

Collaborator Author


We can end up with very, very low loadRate calculations. When this happens it's usually because of a very problematic request or a huge drop. If emergency down-switching is not going to prevent a stall, then it's not worth aborting here. Wait for more data and/or let the normal ABR cycle recover.
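
A worked example with hypothetical numbers showing how the guard trips:

// A nearly-stalled request produces an absurd delay estimate:
const duration = 4; // fragment duration, seconds
const levelNextBitrate = 800_000; // bps of the candidate down-switch level
const loadRate = 2_000; // bytes/s measured on the stalled request
const fragLevelNextLoadedDelay =
  (duration * levelNextBitrate) / (8 * 0.8 * loadRate); // = 250s
// 250 > duration * 10 (40s): aborting and emergency down-switching cannot
// prevent a stall anyway, so bail out and let the normal ABR cycle recover.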


If we decide to keep fragLevelNextLoadedDelay in ms, we should convert the RHS to ms as well. I think duration is in seconds, right?

Collaborator Author


duration is in seconds

const processingMs =
  stats.parsing.end -
  stats.loading.start -
  Math.min(
    stats.loading.first - stats.loading.start,
    this.bwEstimator.getEstimateTTFB()
  );

Yeah, this change is a good improvement. Does it work well for Parts as well?

Collaborator Author


It appears to work well with the sample LL-HLS test stream which uses two 1s parts per segment.

@robwalch robwalch force-pushed the feature/estimate-ttfb-fix-part-stats-improve-abr branch from 93cb148 to af0843a Compare August 4, 2022 23:00
);
} else {
// If there has been no loading progress, sample TTFB
this.bwEstimator.sampleTTFB(timeLoading);
Contributor


Why sample TTFB when no bytes have been delivered? Won't this undershoot?

Collaborator Author

@robwalch robwalch Aug 5, 2022


At this point timeLoading is expected to be greater than the TTFB estimate (we should have exited above if it is not). The idea is to increase the estimate so that the penalty is not just a down-switch but also impacts future estimates.

// reset forced auto level value so that next level will be selected
this._nextAutoLevel = -1;

this.bwEstimator.sampleTTFB(stats.loading.first - stats.loading.start);
Contributor


Maybe move this up, before the this.ignoreFragment() gating? TTFB should still be relevant for e.g. an init segment.

Collaborator Author


Done in fe62282. Keeping it restricted to the main variant only. We would not want alt-audio or subtitles to impact the behavior here, which will remain main-variant-only for this change-set.

hls.nextLoadLevel = nextLoadLevel;
if (loadedFirstByte) {
// If there has been loading progress, sample bandwidth using loading time offset by minimum TTFB time
this.bwEstimator.sample(


We can sample TTFB here too?

if (loadedFirstByte) {
// If there has been loading progress, sample bandwidth using loading time offset by minimum TTFB time
this.bwEstimator.sample(
timeLoading - Math.min(ttfbEstimate, ttfb),

@Pri12890 Pri12890 Aug 10, 2022


timeLoading - ttfb might be more accurate for BWE. Agreed, a high TTFB is not healthy, but it doesn't reflect the user's true network condition either. High TTFB values are largely outliers, but those outliers can really bring fastEstimate down, and catching back up to the true BW can take a while (because of the small duration, the weight will be smaller), resulting in the user playing at a low resolution unnecessarily. Moreover, how can a lower BWE help with a high TTFB, even if it only lasts for some time? That high TTFB will hold true for lower-resolution segments as well (in theory), so a track switch will not help?
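
To make the trade-off concrete with hypothetical numbers (the PR's expression is quoted above):

// An outlier round trip on an otherwise healthy connection:
const timeLoading = 900; // ms total so far
const ttfb = 600; // ms to first byte (an outlier)
const ttfbEstimate = 100; // ms, the typical estimate
timeLoading - ttfb; // 300ms: the suggestion, fully excuses the outlier
timeLoading - Math.min(ttfbEstimate, ttfb); // 800ms: the PR's choice
// The PR only credits the typical TTFB, so the outlier round trip still
// lowers the bandwidth sample instead of being excluded from it entirely.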

stats.loading.start -
Math.min(
stats.loading.first - stats.loading.start,
this.bwEstimator.getEstimateTTFB()


Same comment as for abandonRules: too conservative a BWE.

Comment on lines +298 to +299
const processingMs =
  stats.parsing.end -
  stats.loading.start -
  Math.min(
    stats.loading.first - stats.loading.start,
    this.bwEstimator.getEstimateTTFB()
  );
Contributor


Hello

I encountered the bandwidth estimation problem with LL-HLS streams myself, but was never able to find a reliable solution. So thank you, I appreciate your work on this PR, and I'm happy if I can help you.

After playing around a little bit, I felt that this solution is working pretty well already. When I wanted to compare the most recent release of hls.js with this PR, I noticed a problem with the bandwidth graph on the demo page.

It appears to me that the demo graph is calculating the bandwidth values separately, and therefore also needs to be updated.

demo/main.js lines 553 - 556
Math.round( (8 * data.stats.total) / (data.stats.buffering.end - data.stats.loading.start) ),

Any thoughts on that?

Thanks Thomas
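
One possible adjustment, sketched here for comparison only (#4904 is the actual demo fix and may differ):

// Exclude the round trip from the denominator, mirroring the estimator change.
// The `stats` shape here follows the hls.js fragment loading stats.
function demoBandwidthKbps(stats: {
  total: number;
  loading: { first: number };
  buffering: { end: number };
}): number {
  // loading.first instead of loading.start, so TTFB is not counted
  return Math.round(
    (8 * stats.total) / (stats.buffering.end - stats.loading.first)
  );
}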

Contributor


@robwalch would it help if I create a PR to fix the demo?

Contributor


FYI #4904

Collaborator Author



@silltho can you rebase your PR now that this is merged? Thank you!

Contributor

@shacharz shacharz left a comment


There were a few unit conversion bugs along the way; it might be worth creating a few sanity unit tests to make sure this works as expected.
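
A sanity-test sketch along those lines (hypothetical; the import path, constructor signature, and default values are assumptions to be checked against the codebase):

import { expect } from 'chai';
import EwmaBandWidthEstimator from '../../src/utils/ewma-bandwidth-estimator';

it('keeps TTFB samples and estimates in milliseconds', () => {
  const estimator = new EwmaBandWidthEstimator(15, 4, 500000);
  expect(estimator.getEstimateTTFB()).to.equal(100); // default estimate, ms
  estimator.sampleTTFB(200);
  // A single 200ms sample should raise the estimate without exceeding it:
  expect(estimator.getEstimateTTFB()).to.be.within(100, 200);
});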

@@ -157,13 +157,17 @@ class AbrController implements ComponentAPI {
const expectedLen =
stats.total ||
Math.max(stats.loaded, Math.round((duration * level.maxBitrate) / 8));
let timeStreaming = timeLoading - ttfb;
if (timeStreaming < 1 && loadedFirstByte) {
timeStreaming = Math.min(timeLoading, (stats.loaded * 8) / bwEstimate);
Contributor


If we're at the point where the network is so slow (only the first buffer was emitted by the network stack, and we're in the abandonRulesCheck), then probably something happened to the network. Why should we ignore this effect and rely on bwEstimate?

suggestion:
timeStreaming = timeLoading;

Collaborator Author

@robwalch robwalch Sep 9, 2022


suggestion: timeStreaming = timeLoading;

We're trying to remove roundtrip latency (timeLoading - ttfb) from the equation so that is not what we want.

In this case, almost all of the time was spent waiting on the round trip (timeStreaming < 1). It hasn't been long enough to know whether throughput is affected yet, so we can continue assuming that only round-trip latency was impacted. As long as it's been more than one millisecond since we received bytes, we can calculate a load rate based on actual timeStreaming.

// (longer times have less weight with expected input under 1 second)
const seconds = ttfb / 1000;
const weight = Math.sqrt(2) * Math.exp(-Math.pow(seconds, 2) / 2);
this.ttfb_.sample(weight, Math.max(ttfb, 5));
Contributor


Is it on purpose that the weight is a function of seconds, while the value's unit is milliseconds?

Collaborator Author

@robwalch robwalch Sep 9, 2022


weight is not measured in units of time. The weight curve's input is in seconds because we expect the majority of round-trip times to be < 1s, and those samples have greater impact on our weighted average. If we have an outlier where the server could not respond for a couple of seconds, its weight, and therefore its impact on this estimate, will be lower.

// (longer times have less weight with expected input under 1 second)
const seconds = ttfb / 1000;
const weight = Math.sqrt(2) * Math.exp(-Math.pow(seconds, 2) / 2);
this.ttfb_.sample(weight, Math.max(ttfb, 5));
Contributor


Out of curiosity, why a minimum sample value of 5 ms?

Collaborator Author


This should probably be addressed by not sampling small values in abandoned/incomplete request cases. I chose this magic number because, based on testing with real traffic, it seemed unlikely we would get lower values that were realistic or reliable.

@@ -183,7 +187,7 @@ class AbrController implements ComponentAPI {
const levelNextBitrate = levels[nextLoadLevel].maxBitrate;
fragLevelNextLoadedDelay = loadRate
? (duration * levelNextBitrate) / (8 * 0.8 * loadRate)
: duration / bwEstimate + ttfbEstimate;
: (duration * levelNextBitrate) / bwEstimate + ttfbEstimate;
Contributor


Why are we conservative with loadRate, while not conservative with bwEstimate?
loadRate is falsy when we haven't loaded the first bytes; I'd argue bwEstimate is over-promising in this case.

Collaborator Author


Why are we conservative with loadRate, while not conservative with bwEstimate?

Changed this to always add ttfbEstimate rather than augment the load rate or bwe. There will always be a round-trip cost when switching down to load a new segment, so this should provide a better estimate in either case. 846f618

loadRate is falsy when we haven't loaded the first bytes; I'd argue bwEstimate is over-promising in this case.

bwEstimate is the best estimate we have. It is more likely that the ttfbEstimate has not been met if a fragment request needs to be aborted before any data has been received.
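
A worked example of the changed fallback with hypothetical numbers:

// No bytes loaded yet (loadRate is 0), so estimate from BWE plus a round trip:
const duration = 4; // s
const levelNextBitrate = 1_600_000; // bps of the next level down
const bwEstimate = 4_000_000; // bps
const ttfbEstimate = 0.15; // s (ttfbEstimate was converted to seconds later in this PR)
const fragLevelNextLoadedDelay =
  (duration * levelNextBitrate) / bwEstimate + ttfbEstimate; // 1.6 + 0.15 = 1.75s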

robwalch added a commit that referenced this pull request Sep 20, 2022
robwalch added a commit that referenced this pull request Sep 20, 2022
robwalch added a commit that referenced this pull request Sep 21, 2022
@robwalch robwalch added this to Low-Latency Improvements in Release Planning and Backlog Sep 30, 2022
matteocontrini pushed a commit to matteocontrini/hls.js that referenced this pull request Oct 31, 2022
@robwalch robwalch force-pushed the feature/estimate-ttfb-fix-part-stats-improve-abr branch from fe62282 to b4b3bfd Compare November 28, 2022 22:50
@robwalch robwalch force-pushed the feature/estimate-ttfb-fix-part-stats-improve-abr branch from 846f618 to 6f19423 Compare January 9, 2023 03:37
@robwalch robwalch force-pushed the feature/estimate-ttfb-fix-part-stats-improve-abr branch from 6f19423 to 1672028 Compare January 12, 2023 18:41
@robwalch robwalch merged commit 0f14a22 into master Jan 18, 2023
@robwalch robwalch deleted the feature/estimate-ttfb-fix-part-stats-improve-abr branch January 18, 2023 18:54
@robwalch robwalch restored the feature/estimate-ttfb-fix-part-stats-improve-abr branch January 18, 2023 19:28
robwalch added a commit that referenced this pull request Jan 26, 2023
* Improve bandwidth estimation and adaptive switching with smaller segments and higher TTFB
Fixes #3578 (special thanks to @Oleksandr0xB for submitting #4283)
Fixes #3563 and Closes #3595 (special thanks to @kanongil)

* Load rate and loaded delay calculation fixes

* Convert ttfbEstimate to seconds

* Include main variant init segments in TTFB sampling

* Use ttfb estimate in abandon rules down-switch timing
@robwalch robwalch deleted the feature/estimate-ttfb-fix-part-stats-improve-abr branch March 22, 2023 21:29