Implement ContentSteering #1172

peaBerberian · 2022-10-13T15:14:48Z

Status: It should work with the current draft of the Content Steering specification for DASH contents. There are still some missing features (proxy handling, bandwidth reporting...) but the main chunk of the logic should already be there.

Preliminary notes

What is Content Steering?

Content Steering is a mechanism that lets the server prioritize some CDNs over others for a given content, so that requests made by many player instances can be deterministically redirected.
One use case is to adaptively redistribute load between multiple CDNs while playback is still going on on the users' devices, though several other use cases can rely on this mechanism.

This mechanism is standardized and is associated with the chosen streaming protocol: HLS now includes a chapter and attributes for it, and the DASH-IF is currently drafting another one for DASH, based on the HLS specification (though slightly different).
It is the latter that this PR is trying to implement.

DASH's Content Steering mechanism works by declaring the presence of a "DASH Content Steering Manifest", or "DCSM", requestable through a URL which returns a JSON document giving the current priorities.

This DCSM has its own "TTL" (time to live) which is the time in seconds after which it should be refreshed.
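
To make this more concrete, here is a minimal sketch of what parsing such a JSON document into a transport-agnostic structure might look like. This is not the code of this PR: the `ISteeringManifest` shape and the `parseSteeringManifest` name are only illustrative, the JSON keys (`TTL`, `RELOAD-URI`, `SERVICE-LOCATION-PRIORITY`) are assumptions modeled on the HLS steering manifest and should be checked against the current DASH draft, and the 300 seconds fallback is likewise an assumption.

```typescript
/** Transport-agnostic result of parsing a steering manifest (hypothetical shape). */
interface ISteeringManifest {
  /** Delay, in seconds, after which the manifest should be refreshed. */
  lifetime: number;
  /** URL to use for the next refresh, if the server provided one. */
  reloadUri: string | undefined;
  /** ServiceLocation ids, from the most to the least prioritized. */
  priorities: string[];
}

// Assumption: fallback lifetime when the document carries no usable TTL.
const DEFAULT_TTL_SECONDS = 300;

function parseSteeringManifest(text: string): ISteeringManifest {
  const json: unknown = JSON.parse(text);
  if (typeof json !== "object" || json === null) {
    throw new Error("The steering manifest should be a JSON object");
  }
  const obj = json as Record<string, unknown>;

  // Assumed key name, see the note above.
  const rawPriorities = obj["SERVICE-LOCATION-PRIORITY"];
  if (!Array.isArray(rawPriorities)) {
    throw new Error("The steering manifest has no priority list");
  }
  // Ignore entries that are not strings instead of failing on them.
  const priorities = rawPriorities.filter(
    (p): p is string => typeof p === "string",
  );

  const ttl = obj["TTL"];
  const reloadUri = obj["RELOAD-URI"];
  return {
    lifetime: typeof ttl === "number" && ttl > 0 ? ttl : DEFAULT_TTL_SECONDS,
    reloadUri: typeof reloadUri === "string" ? reloadUri : undefined,
    priorities,
  };
}
```

The returned `lifetime` is what the refreshing logic would then rely on to know when to request the DCSM again.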

Implementation

The implementation turned out to be more complex than expected. I will start by describing it at a high level before going into the details.

Macro-architecture

The idea was to add a CdnPrioritizer class in the fetchers' code, whose role would be to order the CDNs that should be requested for each segment.
That CdnPrioritizer would also handle the refreshing logic of DASH's Content Steering Manifest, through a new fetcher element: the SteeringManifestFetcher.

Here is how the different blocks depend on one another:

               /parsers/SteeringManifest
      +----------------------------------+
      | Content Steering Manifest parser | Parse DCSM[1] into a
      +----------------------------------+ transport-agnostic steering
              ^                            Manifest structure
              |
              | Uses when parsing
              |
              |
              | /transports
      +---------------------------+
      |        Transport          |
      |                           |
      | new functions:            |
      |   - loadSteeringManifest  | Construct DCSM[1]'s URL, performs
      |   - parseSteeringManifest | requests and parses it.
      +---------------------------+
              ^
              |
              | Relies on
              |
              |
              | /core/fetchers/steering_manifest
      +-------------------------+
      | SteeringManifestFetcher | Fetches and parses a Content Steering
      +-------------------------+ Manifest in a transport-agnostic way
              ^                   + handle retries and error formatting
              |
              | Uses an instance of to load, parse and refresh the
              | Steering Manifest periodically according to its TTL[2]
              |
              |
              | /core/fetchers/cdn_prioritizer.ts
      +----------------+ Signals the priority between multiple
      | CdnPrioritizer | potential CDNs for each resource.
      +----------------+ (This is done on demand, the `CdnPrioritizer`
             ^           knows of no resource in advance).
             |
             | Asks to sort a segment's available base urls by order of
             | priority (and to filter out those that should not be
             | used).
             | Also signals when it should prevent a base url from
             | being used temporarily (e.g. due to request issues).
             |
             |
             | /core/fetchers/segment
      +----------------+
      | SegmentFetcher | Fetches and parses a segment in a
      +----------------+ transport-agnostic way
             ^           + handle retries and error formatting
             |
             | Ask to load segment(s)
             |
             | /core/stream/representation
      +----------------+
      | Representation | Logic behind finding the right segment to
      |    Stream      | load, loading it and pushing it to the buffer.
      +----------------+ One RepresentationStream is created per
                         actively-loaded Period and one per
                         actively-loaded buffer type.


[1] DCSM: DASH Content Steering Manifest
[2] TTL: Time To Live: a delay after which a Content Steering Manifest should be refreshed

CDN identification

The different ways to access a content, called "ServiceLocations" in DASH's Content Steering spec (but which we loosely call the available "CDNs" in the current implementation), need to be clearly identified here to allow easy re-prioritization.

However, in the old RxPlayer code, those ServiceLocations were not clearly identified and grouped:
Instead, each segment was directly associated with one or several absolute URLs, with no relation created between segments. For example, detecting whether two segments shared a common ServiceLocation/base URL was difficult to do without resorting to substring comparison.
This caused implementation difficulties for prioritization handling and "downgrading" (our term for when a specific ServiceLocation is avoided for some time due to an observed issue with it).

The proposed implementation now only associates a relative URL with each segment, corresponding to the segment's unique filename. The part common to all segments of a given Representation (the "ServiceLocations") is moved to the Representation level instead, through a property called cdnMetadata.
As a special case, the segment's relative URL can be set to null or to the empty string when the Representation's URL(s) found in cdnMetadata are sufficient to load the data.

This only works if all ServiceLocations follow a logic of concatenation between a per-ServiceLocation base URL and a segment's common relative URL. Thankfully, this appears for now to always be the case in transport protocols where multiple ServiceLocations for a given resource are possible.

We could also have kept each segment's URLs absolute and added a ServiceLocation-identifying property to each of them, but that seemed less practical while I was writing it.

The cdnMetadata property present on Representations takes the form of an array of all detected ServiceLocations. Each element of this array contains information on a single available ServiceLocation:

- its base URL
- an optional id, used for identification purposes, for example when compared with the output of a Content Steering Manifest.
  This is based on the value of the serviceLocation attribute of the <BaseURL> element found in the MPD
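
As an illustration of that structure, the following sketch shows how a segment's absolute URL could be rebuilt for each ServiceLocation by simple concatenation. The property names inside `ICdnMetadata` and the `buildSegmentUrl` / `listSegmentUrls` helpers are hypothetical and only meant to show the idea, they are not necessarily the names used in this PR.

```typescript
/** Information on one available ServiceLocation (illustrative shape). */
interface ICdnMetadata {
  /** Part of the URL shared by all segments of the Representation. */
  baseUrl: string;
  /** Optional identifier, e.g. taken from `serviceLocation` in the MPD. */
  id?: string;
}

/**
 * Rebuild the URL of a segment for a given ServiceLocation.
 * A `null` or empty relative URL means that the base URL alone is enough.
 */
function buildSegmentUrl(cdn: ICdnMetadata, segmentUrl: string | null): string {
  if (segmentUrl === null || segmentUrl === "") {
    return cdn.baseUrl;
  }
  return cdn.baseUrl + segmentUrl;
}

/** List every URL through which a segment may be requested, in the given order. */
function listSegmentUrls(
  cdnMetadata: ICdnMetadata[],
  segmentUrl: string | null,
): string[] {
  return cdnMetadata.map((cdn) => buildSegmentUrl(cdn, segmentUrl));
}
```

With this layout, knowing whether two segments share a ServiceLocation no longer needs any substring comparison: both simply refer to the same `cdnMetadata` element.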

Handling of the `queryBeforeStart` attribute

The MPD may indicate that the Content Steering Manifest should either be requested before any segment, or that it may be loaded later so that playback can begin sooner.

This is done through an MPD attribute on the <ContentSteering> element, called queryBeforeStart.
Handling this attribute has been somewhat of a pain: under the current RxPlayer architecture, its before-or-not nature means that it could not always be cleanly and opaquely handled in the Manifest-parsing logic.
If the request needed to be performed after (or in parallel with) the first segment requests, we had to involve some other core logic in starting and handling this request.
I finally decided to only handle this initial fetch in one place (through the fetchers' CdnPrioritizer) and not repeat it in the Manifest-parsing code, for simplicity's sake.

I then ran into a new problem: we had to communicate in some way when the segments can actually be loaded:

- directly if no Content Steering Manifest exists or if queryBeforeStart is not set or set to false
- after the Content Steering Manifest has been fetched if the queryBeforeStart attribute is set to true

This could easily be done through a new event, but I disliked the opt-in nature of adding an event listener for this: forgetting it would be very easy and would be a big-enough bug.

What I preferred to do was to make asynchronous the CdnPrioritizer's callback used to prioritize ServiceLocations: if the Content Steering Manifest has already been fetched, or if queryBeforeStart is not set or set to false, it returns directly. But if queryBeforeStart is set to true and the Content Steering Manifest has not yet been fetched, it awaits the end of that request before giving an educated answer.

I prefer that solution because it opaquely forces the right "queryBeforeStart" behavior whenever a CdnPrioritizer is used to order ServiceLocations. This is even nicer when considering that the CdnPrioritizer is also the class fetching and refreshing the Content Steering Manifest, meaning that forgetting to use it would also mean not relying on a Content Steering Manifest anyway.

This also means that no outside block needs to understand this intricacy: only the CdnPrioritizer does, which is also one of the [very rare] blocks implementing most of the Content Steering mechanisms.
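
The idea of an asynchronous prioritization callback can be sketched roughly as follows. This is a simplified illustration and not the actual CdnPrioritizer: the `CdnPrioritizerSketch` class, its `getCdnPreferenceForResource` method and the `fetchPriorities` callback are hypothetical names, and falling back to the MPD order when the steering request fails is an assumption of this sketch.

```typescript
interface ICdnMetadata {
  baseUrl: string;
  id?: string;
}

class CdnPrioritizerSketch {
  /** Resolved once callers are allowed to receive an answer. */
  private _steeringReady: Promise<void>;
  /** Last known priorities, `null` if no steering manifest was loaded. */
  private _priorities: string[] | null = null;

  constructor(
    fetchPriorities: (() => Promise<string[]>) | null,
    queryBeforeStart: boolean,
  ) {
    let ready: Promise<void> = Promise.resolve();
    if (fetchPriorities !== null) {
      const request = fetchPriorities().then(
        (priorities) => {
          this._priorities = priorities;
        },
        () => {
          // Assumption: on failure, just keep the order found in the MPD.
        },
      );
      if (queryBeforeStart) {
        ready = request; // Answers have to wait for the steering manifest
      }
    }
    this._steeringReady = ready;
  }

  /** Sort (and filter) the given ServiceLocations by order of priority. */
  async getCdnPreferenceForResource(
    cdns: ICdnMetadata[],
  ): Promise<ICdnMetadata[]> {
    await this._steeringReady;
    const priorities = this._priorities;
    if (priorities === null) {
      return cdns.slice();
    }
    // ServiceLocations unlisted in the steering manifest are not requested.
    return cdns
      .filter((cdn) => cdn.id !== undefined && priorities.indexOf(cdn.id) >= 0)
      .sort(
        (a, b) =>
          priorities.indexOf(a.id as string) -
          priorities.indexOf(b.id as string),
      );
  }
}
```

Because callers have to `await` the answer anyway, there is no separate event to remember to listen to: the queryBeforeStart constraint is respected just by asking for the order.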

Handling of the refreshing logic

The refreshing logic of the Content Steering Manifest is also performed by the CdnPrioritizer.
The implementation is fairly simple: once the previous Steering Manifest's TTL (in seconds) has elapsed, we refresh it.

There is additional logic for when a <ContentSteering> element appears or disappears after an MPD update, but what to do in those cases appeared relatively straightforward.
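
A very reduced sketch of such TTL-based refreshing could look like the following. The `SteeringRefreshScheduler` class and the `ITimers` abstraction are hypothetical (the latter only exists so the logic can be exercised without actually waiting), and cancelling the pending refresh when <ContentSteering> disappears is an assumption about what "straightforward" means here.

```typescript
/** Minimal timer abstraction, so the sketch can be tested without waiting. */
interface ITimers {
  set(callback: () => void, delayMs: number): unknown;
  clear(timerId: unknown): void;
}

const realTimers: ITimers = {
  set: (callback, delayMs) => setTimeout(callback, delayMs),
  clear: (timerId) => clearTimeout(timerId as ReturnType<typeof setTimeout>),
};

class SteeringRefreshScheduler {
  private _refresh: () => void;
  private _timers: ITimers;
  private _timerId: unknown = undefined;

  constructor(refresh: () => void, timers: ITimers = realTimers) {
    this._refresh = refresh;
    this._timers = timers;
  }

  /** To call each time a Steering Manifest has been loaded and parsed. */
  onSteeringManifestLoaded(ttlSeconds: number): void {
    this.cancel(); // Only one refresh should be pending at a time
    this._timerId = this._timers.set(() => {
      this._timerId = undefined;
      this._refresh();
    }, ttlSeconds * 1000);
  }

  /** To call e.g. when <ContentSteering> disappeared from a refreshed MPD. */
  cancel(): void {
    if (this._timerId !== undefined) {
      this._timers.clear(this._timerId);
      this._timerId = undefined;
    }
  }
}
```

Here `refresh` would be whatever triggers a new request through the SteeringManifestFetcher, which in turn calls `onSteeringManifestLoaded` again with the new TTL.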

In large part because of this refreshing logic, I also had to implement a system of events on the CdnPrioritizer, for the following situations:

- an error arose while requesting or parsing a Content Steering Manifest, so that it can be translated into a player event through our API. This is communicated through a "warnings" event
- more importantly, a priorityChange event has been added, for when the order of priorities between ServiceLocations changes.
  This was added to work around a subtle but complex-enough situation where the priority between ServiceLocations changes while the player is waiting to retry a segment request through a ServiceLocation that is now no longer prioritized.
  More details in the next chapter.

Request scheduling modifications

Another specificity to take into account was how the Content Steering mechanism interacts with our request scheduling logic, especially with what we call the "exponential backoff".

This concept designates the notion that we might want to wait for some delay before re-attempting a request that previously failed on a server, progressively raising that delay after each consecutive unsuccessful attempt to avoid overwhelming the server.

When considering multiple servers for each resource and - even more complex - when considering that the priority between those can change while a delay is awaited, properly handling this exponential backoff mechanism became a little more complex.

What I ended up doing was to register in an object a per-CDN (monotonically increasing) timestamp at which the last request was made for a particular resource, alongside the number of attempts already made on that same CDN.
This way, exponential backoff can be applied per CDN and even be interrupted and restarted at any time if the priority between CDNs changed in the meantime. This change of priority is known when the CdnPrioritizer sends the priorityChange event.

Moreover, CDNs on which a request fails are temporarily "downgraded" - meaning moved to the end of the priority list - for a period of time equal to the Steering Manifest's TTL (as specified in DASH's Content Steering spec) - or for 60 seconds if no such TTL exists.
This also automatically allows the second most prioritized CDN to be tried when a request through the first one fails, and still allows looping over once all CDNs are downgraded.
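
To illustrate both ideas, here is a small sketch of per-CDN backoff bookkeeping and of downgrading. The function names and the `BASE_DELAY_MS` / `MAX_DELAY_MS` values are assumptions made for the example, not the RxPlayer's actual configuration; only the 60 seconds fallback comes from the description above. Keeping downgraded CDNs in their relative order is also a choice of this sketch.

```typescript
/** What is remembered, per CDN, for the resource currently being requested. */
interface ICdnAttemptState {
  /** Monotonically increasing timestamp of the last attempt, in ms. */
  lastAttemptTimestamp: number;
  /** Number of failed attempts already made on that CDN. */
  attempts: number;
}

// Assumptions: illustrative values only.
const BASE_DELAY_MS = 200;
const MAX_DELAY_MS = 3000;
// From the description: downgrade duration when no TTL is known.
const DEFAULT_DOWNGRADE_SECONDS = 60;

/** Delay to respect after `attempts` consecutive failures (doubles each time). */
function getBackoffDelay(attempts: number): number {
  if (attempts <= 0) {
    return 0;
  }
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempts - 1), MAX_DELAY_MS);
}

/** Time left, in ms, before that CDN may be requested again. */
function getRemainingBackoff(
  state: ICdnAttemptState | undefined,
  now: number,
): number {
  if (state === undefined) {
    return 0; // Never failed: can be requested right away
  }
  const elapsed = now - state.lastAttemptTimestamp;
  return Math.max(0, getBackoffDelay(state.attempts) - elapsed);
}

/** Timestamp until which a failing CDN stays downgraded. */
function getDowngradeEnd(now: number, ttlSeconds: number | undefined): number {
  return now + (ttlSeconds ?? DEFAULT_DOWNGRADE_SECONDS) * 1000;
}

/** Move currently-downgraded CDNs to the end of the priority list. */
function applyDowngrades(
  prioritizedIds: string[],
  downgradedUntil: Record<string, number | undefined>,
  now: number,
): string[] {
  const isDowngraded = (id: string): boolean => {
    const until = downgradedUntil[id];
    return until !== undefined && until > now;
  };
  const kept = prioritizedIds.filter((id) => !isDowngraded(id));
  return kept.concat(prioritizedIds.filter(isDowngraded));
}
```

Because the remaining delay is recomputed from the stored timestamp, a wait can be dropped and recomputed for another CDN as soon as a priorityChange event is received, without losing what is known about previous attempts.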

lfaureyt · 2022-10-14T07:10:30Z

Impressive work !

Let's assume content steering priority switches from CDN A to CDN B while a segment request is pending on CDN A, and then segment request on A fails ... In that case, is the request re-started immediately on CDN B (and, if so, is the count of attempts on CDN B cleared ?) or is the request delayed according to its exponential backoff state on CDN B ?

monotically raising

I think you meant "monotonically" ;-)

peaBerberian · 2022-10-14T11:16:56Z

@lfaureyt Thanks!

Let's assume content steering priority switches from CDN A to CDN B while a segment request is pending on CDN A, and then segment request on A fails ... In that case, is the request re-started immediately on CDN B (and, if so, is the count of attempts on CDN B cleared ?) or is the request delayed according to its exponential backoff state on CDN B ?

I have to re-check/test that this is what's really going on, but I would say that the count of attempts on CDN B for that segment (as exponential backoff is still per-segment) is not reset until the segment has been loaded.
Thus, if a request for the same segment failed on CDN B less time ago than the backoff calculated at that time (the backoff time is based on the last request made with the corresponding CDN, which may not be the last request overall), we will wait for at least (see below) the remaining backoff time before performing the request on that CDN.

Also, once CDNs referenced in a steering manifest have failed at least once for a given segment, the algorithm behind CDN choice becomes a little more complex: the remaining backoff time is taken into account first, then the steering manifest prioritization (CDNs unlisted in that manifest are still not requested).
There, depending on the number of retries for specific CDNs and the previous CDN prioritization, you can end up with a less-prioritized CDN still being requested first.
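
As a rough sketch of that choice (hypothetical names, simplified compared to the real logic, and still to be checked against the actual code): given candidates already sorted by steering priority, the one with the smallest remaining backoff wins, and priority only breaks ties.

```typescript
interface ICdnCandidate {
  id: string;
  /** Time left, in ms, before this CDN can be requested for the segment. */
  remainingBackoffMs: number;
}

/**
 * Pick the next CDN to request.
 * `prioritized` is expected to be sorted from the most to the least
 * prioritized CDN, with CDNs unlisted in the steering manifest already
 * filtered out.
 */
function pickNextCdn(prioritized: ICdnCandidate[]): ICdnCandidate | undefined {
  let best: ICdnCandidate | undefined;
  for (let i = 0; i < prioritized.length; i++) {
    const candidate = prioritized[i];
    // A strict comparison keeps the most prioritized CDN in case of equality.
    if (
      best === undefined ||
      candidate.remainingBackoffMs < best.remainingBackoffMs
    ) {
      best = candidate;
    }
  }
  return best;
}
```

So with CDN B now prioritized over CDN A, B would still be skipped in favor of A if B has 500ms of backoff left for that segment while A has none.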

peaBerberian added the proposal and Priority: 4 (Very low) labels Oct 13, 2022

peaBerberian force-pushed the next branch from 15fbfc7 to d509948 Compare October 28, 2022 08:16

peaBerberian force-pushed the feat/content_steering branch from 236e27e to 6f24680 Compare October 28, 2022 08:18

peaBerberian force-pushed the next branch from 5b27fe6 to 56c206f Compare October 28, 2022 16:12

peaBerberian force-pushed the feat/content_steering branch 2 times, most recently from 60a9f65 to 5923fbd Compare November 3, 2022 10:03

peaBerberian force-pushed the next branch from 516cccd to f305068 Compare December 8, 2022 15:01

peaBerberian force-pushed the next branch 2 times, most recently from a91f71f to 642425d Compare December 21, 2022 16:45

peaBerberian force-pushed the feat/content_steering branch from 5923fbd to a973a07 Compare December 23, 2022 11:16

peaBerberian force-pushed the next branch from f6f244e to ae60030 Compare January 31, 2023 10:46

peaBerberian force-pushed the next branch 2 times, most recently from 1adfb5e to 4101cde Compare February 9, 2023 14:58

peaBerberian force-pushed the feat/content_steering branch from a973a07 to c80a4d1 Compare March 15, 2023 12:58

peaBerberian force-pushed the next branch from a03e55f to 7cd98dd Compare May 17, 2023 16:23

peaBerberian force-pushed the feat/content_steering branch from c80a4d1 to ad492cb Compare May 17, 2023 16:26

peaBerberian force-pushed the next branch from 190ab7a to b87fabe Compare June 2, 2023 13:58

peaBerberian force-pushed the next branch from b87fabe to e4a4b04 Compare June 12, 2023 09:39

peaBerberian force-pushed the feat/content_steering branch 3 times, most recently from 489f7a3 to 0ab12e7 Compare June 13, 2023 15:30

peaBerberian force-pushed the next branch from af7c622 to ed9fc14 Compare July 4, 2023 09:09

peaBerberian force-pushed the next branch from ed9fc14 to 35c69ff Compare July 18, 2023 15:49

peaBerberian force-pushed the next branch from 35c69ff to efa38c1 Compare August 7, 2023 16:32

peaBerberian force-pushed the next branch 2 times, most recently from c08e41d to 787d37f Compare August 23, 2023 09:28

peaBerberian force-pushed the next branch from 787d37f to 4f77d99 Compare August 31, 2023 14:20

peaBerberian force-pushed the next branch 2 times, most recently from 05dbd76 to abb6dcf Compare December 22, 2023 19:58

peaBerberian force-pushed the next branch from abb6dcf to 03e02a6 Compare January 3, 2024 13:47

peaBerberian force-pushed the feat/content_steering branch from ad0a5bb to 4551523 Compare January 5, 2024 13:56

peaBerberian force-pushed the next branch 2 times, most recently from e0f9d84 to e188d68 Compare January 15, 2024 16:25

peaBerberian force-pushed the feat/content_steering branch from 4551523 to 8c390ec Compare January 15, 2024 16:53

peaBerberian force-pushed the next branch from d5e1e8c to 767f766 Compare January 23, 2024 17:27

peaBerberian force-pushed the feat/content_steering branch from 8c390ec to 6b2618c Compare January 23, 2024 17:49

peaBerberian force-pushed the feat/content_steering branch from 6b2618c to 707b26e Compare February 5, 2024 17:42

peaBerberian changed the title ~~Revert "Remove ContentSteering logic to just keep better Cdn prioriza…~~ Implement ContentSteering Feb 5, 2024

peaBerberian changed the base branch from next to dev February 5, 2024 17:42

peaBerberian force-pushed the feat/content_steering branch 3 times, most recently from 3dea214 to 4d08be7 Compare February 5, 2024 17:51

peaBerberian force-pushed the dev branch from 1de6ca5 to a18689b Compare February 5, 2024 17:59

peaBerberian force-pushed the feat/content_steering branch 3 times, most recently from fa598ec to 80330a8 Compare February 5, 2024 18:12

peaBerberian force-pushed the dev branch 2 times, most recently from 2e58dd6 to cc6a502 Compare February 20, 2024 18:45

peaBerberian force-pushed the feat/content_steering branch 6 times, most recently from 528c3d7 to e46c7d4 Compare February 27, 2024 10:39

Implement ContentSteering

ca7b77c

peaBerberian force-pushed the feat/content_steering branch from e46c7d4 to ca7b77c Compare February 27, 2024 14:14

peaBerberian force-pushed the dev branch from 1f20b46 to d71f841 Compare March 11, 2024 14:49
