Implement ContentSteering #1172

Collaborator

@peaBerberian peaBerberian commented Oct 13, 2022

Status: It should work with the current draft of the Content Steering specification for DASH contents. There are still some missing features (proxy handling, bandwidth reporting...) but the main chunk of the logic should already be there.

Preliminary notes

What is Content Steering?

Content Steering is a mechanism that lets the server side prioritize some CDNs over others for a given content, which makes it possible to deterministically redirect the requests made by many player instances.
One use case is to adaptively redistribute load between multiple CDNs while playback is still ongoing on users' devices, though several other use cases can rely on this mechanism.

This mechanism is standardized and tied to the streaming protocol in use: HLS now includes a chapter and attributes for it, and the DASH-IF is currently drafting its own version for DASH, based on the HLS specification (though slightly different).
It is the latter that this PR is trying to implement.

DASH's Content Steering mechanism works by declaring the presence of a "DASH Content Steering Manifest", or "DCSM", requestable through a URL which returns a JSON document giving the current priorities.

This DCSM has its own "TTL" (time to live) which is the time in seconds after which it should be refreshed.
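As a rough illustration of the two paragraphs above, here is a minimal sketch of parsing a DCSM into a transport-agnostic structure. The JSON key names (`TTL`, `RELOAD-URI`, `SERVICE-LOCATION-PRIORITY`) and the `SteeringManifest` shape are assumptions made for this example and should be checked against the specification draft; this is not the parser added by this PR.

```typescript
/** Transport-agnostic steering manifest (hypothetical shape). */
interface SteeringManifest {
  /** Delay, in seconds, after which the manifest should be refreshed. */
  lifetime: number | undefined;
  /** URL to use for the next refresh, if one was given. */
  reloadUri: string | undefined;
  /** ServiceLocation ids, from most to least prioritized. */
  priorities: string[];
}

/**
 * Parse the JSON text of a DCSM.
 * The key names read below are assumptions made for this sketch.
 */
function parseSteeringManifest(input: string): SteeringManifest {
  const json: unknown = JSON.parse(input);
  if (typeof json !== "object" || json === null) {
    throw new Error("Invalid steering manifest: not an object");
  }
  const obj = json as Record<string, unknown>;
  const rawPriorities = obj["SERVICE-LOCATION-PRIORITY"];
  if (!Array.isArray(rawPriorities)) {
    throw new Error("Invalid steering manifest: no priority list");
  }
  const ttl = obj["TTL"];
  const reloadUri = obj["RELOAD-URI"];
  return {
    lifetime: typeof ttl === "number" ? ttl : undefined,
    reloadUri: typeof reloadUri === "string" ? reloadUri : undefined,
    // Ignore entries which are not strings instead of failing.
    priorities: rawPriorities.filter(
      (p): p is string => typeof p === "string"
    ),
  };
}
```

The `lifetime` value is what the refreshing logic described further down would rely on.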

Implementation

The implementation turned out to be unexpectedly complex. I will start by describing it at a high level before going into the details.

Macro-architecture

The idea was to add a CdnPrioritizer class in the fetchers' code, whose role would be to order the CDNs that should be requested for each segment.
That CdnPrioritizer would also handle the refreshing logic of DASH's Content Steering Manifest, through a new fetcher element: the SteeringManifestFetcher.

Here is how the different blocks depend on one another:

               /parsers/SteeringManifest
      +----------------------------------+
      | Content Steering Manifest parser | Parse DCSM[1] into a
      +----------------------------------+ transport-agnostic steering
              ^                            Manifest structure
              |
              | Uses when parsing
              |
              |
              | /transports
      +---------------------------+
      |        Transport          |
      |                           |
      | new functions:            |
      |   - loadSteeringManifest  | Construct DCSM[1]'s URL, performs
      |   - parseSteeringManifest | requests and parses it.
      +---------------------------+
              ^
              |
              | Relies on
              |
              |
              | /core/fetchers/steering_manifest
      +-------------------------+
      | SteeringManifestFetcher | Fetches and parses a Content Steering
      +-------------------------+ Manifest in a transport-agnostic way
              ^                   + handle retries and error formatting
              |
              | Uses an instance of to load, parse and refresh the
              | Steering Manifest periodically according to its TTL[2]
              |
              |
              | /core/fetchers/cdn_prioritizer.ts
      +----------------+ Signals the priority between multiple
      | CdnPrioritizer | potential CDNs for each resource.
      +----------------+ (This is done on demand, the `CdnPrioritizer`
             ^           knows of no resource in advance).
             |
             | Asks to sort a segment's available base urls by order of
             | priority (and to filter out those that should not be
             | used).
             | Also signals when it should prevent a base url from
             | being used temporarily (e.g. due to request issues).
             |
             |
             | /core/fetchers/segment
      +----------------+
      | SegmentFetcher | Fetches and parses a segment in a
      +----------------+ transport-agnostic way
             ^           + handle retries and error formatting
             |
             | Ask to load segment(s)
             |
             | /core/stream/representation
      +----------------+
      | Representation | Logic behind finding the right segment to
      |    Stream      | load, loading it and pushing it to the buffer.
      +----------------+ One RepresentationStream is created per
                         actively-loaded Period and one per
                         actively-loaded buffer type.


[1] DCSM: DASH Content Steering Manifest
[2] TTL: Time To Live: a delay after which a Content Steering Manifest should be refreshed

CDN identification

The different ways to access a content, called "ServiceLocations" in DASH's Content Steering spec (and which we loosely call the available "CDNs" in the current implementation), need to be clearly identified here to allow easy re-prioritization.

However, in the old RxPlayer code, those ServiceLocations were not clearly identified and grouped:
instead, each segment was directly associated with one or several absolute URLs, with no relation created between segments. For example, detecting whether two segments shared a common ServiceLocation/base URL was difficult to do without resorting to substring comparison.
This made prioritization handling and "downgrading" (our term for when a specific ServiceLocation is avoided for some time due to an observed issue with it) difficult to implement.

The proposed implementation now only associates a relative URL with each segment, corresponding to the segment's unique filename. The part common to all segments of a given Representation (the "ServiceLocations") is moved to the Representation level instead, through a property called cdnMetadata.
As a special case, the segment's relative URL can be set to null or to the empty string when the Representation's URL(s) found in cdnMetadata are sufficient to load the data.

This only works if all ServiceLocations follow a logic of concatenation between a per-ServiceLocation base URL and a segment's common relative URL. Thankfully, this appears for now to always be the case in transport protocols where multiple ServiceLocations for a given resource are possible.

We could also have kept each segment's URL absolute and added a ServiceLocation-identifying property to it, but it seemed less practical while I was writing it.

The cdnMetadata property present on Representations takes the form of an array of all detected ServiceLocations. Each element of this array contains information on a single available ServiceLocation:

  • its base URL
  • an optional id, used for identification purposes, for example when compared with the output of a Content Steering Manifest.
    This is based on the value of the serviceLocation <BaseURL> attribute found in the MPD
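
To make the split between per-ServiceLocation base URLs and per-segment relative URLs more concrete, here is a minimal sketch. The `ICdnMetadata` field names and the `buildSegmentUrl` helper are hypothetical names chosen for this example; only the idea of plain concatenation and the null / empty-string special case come from the description above.

```typescript
/** Information on a single ServiceLocation (hypothetical field names). */
interface ICdnMetadata {
  /** Base URL shared by all segments of the Representation on that CDN. */
  baseUrl: string;
  /** Optional identifier, e.g. to match a Content Steering Manifest entry. */
  id?: string;
}

/**
 * Build the URL of a segment on a given ServiceLocation by concatenating
 * the ServiceLocation's base URL and the segment's relative URL.
 * A `null` or empty relative URL means the base URL is sufficient.
 */
function buildSegmentUrl(
  cdn: ICdnMetadata,
  segmentRelativeUrl: string | null
): string {
  if (segmentRelativeUrl === null || segmentRelativeUrl === "") {
    return cdn.baseUrl;
  }
  return cdn.baseUrl + segmentRelativeUrl;
}

// Example: the same relative URL resolved against two ServiceLocations.
const exampleCdnMetadata: ICdnMetadata[] = [
  { baseUrl: "https://cdn-a.example.com/video/", id: "alpha" },
  { baseUrl: "https://cdn-b.example.com/video/", id: "beta" },
];
const exampleUrls = exampleCdnMetadata.map((cdn) =>
  buildSegmentUrl(cdn, "seg-42.mp4")
);
```

With this layout, checking whether two segments share a ServiceLocation becomes a comparison of `cdnMetadata` entries rather than a substring comparison on absolute URLs.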

Handling of the queryBeforeStart attribute

The MPD may indicate that the Content Steering Manifest should either be requested before any segment or may be loaded later, so that playback can begin sooner.

This is done through an MPD attribute on the <ContentSteering> element, called queryBeforeStart.
Handling this attribute has been somewhat of a pain: under the current RxPlayer architecture, its before-or-not nature means that it could not always be handled cleanly and opaquely in the Manifest-parsing logic.
If the request needed to be performed after (or in parallel with) the loading of the first segments, we had to involve some other core logic in starting and handling this request.
I finally decided to only handle this initial fetch in one place (through the fetchers' CdnPrioritizer) and not repeat it in the Manifest-parsing code, for simplicity's sake.

However, this raised a new problem: we had to communicate in some way when the segments can actually be loaded:

  • directly if no Content Steering Manifest exists or if queryBeforeStart is not set or set to false
  • after the Content Steering Manifest has been fetched if the queryBeforeStart attribute is set to true

This could easily be done through a new event, but I disliked the opt-in nature of adding an event listener for this: forgetting it would be very easy to do and would be a serious enough bug.

What I preferred to do is to make the CdnPrioritizer's callback used to prioritize ServiceLocations between one another asynchronous: if the Content Steering Manifest was fetched or if queryBeforeStart was not set / set to false, it would return directly. But if both queryBeforeStart was set to true and the Content Steering Manifest was not yet fetched, it would await that request to finish, before giving an educated answer.

I prefer that solution because it opaquely forced the right "queryBeforeStart" implementation when a CdnPrioritizer is used to order ServiceLocations - this is even nicer when considering that the CdnPrioritizer also is the class fetching and refreshing the Content Steering Manifest, meaning that forgetting to use it would also mean not relying on a Content Steering Manifest anyway.

This also means that no outside block needs to understand this intricacy: only the CdnPrioritizer does, which is also one of the [very rare] blocks implementing most of the Content Steering mechanisms.
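
The asynchronous prioritization callback described above could be sketched as follows. This is a simplified illustration, not the actual `CdnPrioritizer`: the class name, constructor parameters and the `fetchPriorities` callback are hypothetical, and refreshing and error reporting are left out.

```typescript
interface ICdnMetadata {
  baseUrl: string;
  id?: string;
}

/** Simplified, hypothetical stand-in for the CdnPrioritizer. */
class CdnPrioritizerSketch {
  /** Last known priority list, `null` if none was fetched yet. */
  private _priorities: string[] | null = null;
  /** Request that callers must wait for, `null` if there is none. */
  private _blockingRequest: Promise<void> | null = null;

  /**
   * @param queryBeforeStart - Value of the MPD's `queryBeforeStart`.
   * @param fetchPriorities - Loads the steering manifest's priority list,
   * or `null` if the content has no Content Steering Manifest.
   */
  constructor(
    queryBeforeStart: boolean,
    fetchPriorities: (() => Promise<string[]>) | null
  ) {
    if (fetchPriorities !== null) {
      const request = fetchPriorities().then(
        (priorities) => { this._priorities = priorities; },
        () => { /* On error, fall back to the order of the MPD. */ }
      );
      if (queryBeforeStart) {
        this._blockingRequest = request;
      }
    }
  }

  /** Sort (and filter) the given CDNs by order of priority. */
  public async getCdnPreference(
    cdns: ICdnMetadata[]
  ): Promise<ICdnMetadata[]> {
    if (this._blockingRequest !== null) {
      // Only set when `queryBeforeStart` is true: wait for the manifest.
      await this._blockingRequest;
    }
    const priorities = this._priorities;
    if (priorities === null) {
      return cdns.slice(); // Nothing known yet: keep the original order.
    }
    const rank = (cdn: ICdnMetadata): number =>
      cdn.id === undefined ? -1 : priorities.indexOf(cdn.id);
    return cdns
      .filter((cdn) => rank(cdn) >= 0) // unlisted CDNs are not requested
      .sort((a, b) => rank(a) - rank(b));
  }
}
```

Because segment loading has to go through `getCdnPreference` to know which URL to request, a caller cannot accidentally skip the `queryBeforeStart` wait.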

Handling of the refreshing logic

The refreshing logic of the Content Steering Manifest is also performed by the CdnPrioritizer.
The implementation is somewhat simple: after the previous Steering Manifest's TTL (in seconds), we refresh it.

There is additional logic for when a <ContentSteering> element appears or disappears after an MPD update, but what to do in those cases appeared relatively straightforward.

In large part because of this refreshing logic, I also had to implement a system of events on the CdnPrioritizer for the following situations:

  • an error arose while requesting or parsing a Content Steering Manifest, so that it can be translated into a player event through our API. This is communicated through a "warnings" event.

  • More importantly, a priorityChange event has been added, for when the order of priorities between ServiceLocations changes.
    This was added to work around a subtle but complex-enough situation where the priority between ServiceLocations changes while the player is waiting to retry a segment request through another ServiceLocation that is now no longer prioritized.
    More details in the next section.
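
A minimal sketch of how such a `priorityChange` notification could be emitted only when the order actually changes. The `PriorityTracker` class and its method names are hypothetical and only illustrate the idea; the real CdnPrioritizer relies on RxPlayer's own event utilities.

```typescript
type PriorityChangeListener = () => void;

/** Hypothetical helper tracking the current ServiceLocation order. */
class PriorityTracker {
  private _current: string[] = [];
  private _listeners: PriorityChangeListener[] = [];

  /** Register a listener. Returns a function removing it. */
  public onPriorityChange(listener: PriorityChangeListener): () => void {
    this._listeners.push(listener);
    return () => {
      this._listeners = this._listeners.filter((l) => l !== listener);
    };
  }

  /** Store the new order and notify listeners only if it differs. */
  public update(next: string[]): void {
    const changed =
      next.length !== this._current.length ||
      next.some((id, index) => id !== this._current[index]);
    this._current = next.slice();
    if (changed) {
      // Iterate on a copy so listeners can unregister themselves.
      for (const listener of this._listeners.slice()) {
        listener();
      }
    }
  }
}
```

A segment request waiting on a retry delay could register such a listener to re-evaluate which ServiceLocation to try next, and remove it once the segment is loaded.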

Request scheduling modifications

Another specificity to take into account was how the Content Steering mechanism interacts with our request scheduling logic, especially with what we call the "exponential backoff".

This concept designates the notion that we might want to wait a delay before re-attempting a request that previously failed on a server, progressively raising that delay after each consecutive unsuccessful attempt to avoid overwhelming the server.

When considering multiple servers for each resource and - even more complex - when considering that the priority between those can change while a delay is awaited, properly handling this exponential backoff mechanism became a little more complex.

What I ended up doing was to register in an object a per-CDN (monotonically increasing) timestamp at which the last request was done for a particular resource, alongside the number of attempts already made on that same CDN.
This way, exponential backoff can be applied per CDN and can even be interrupted and restarted at any time if the priority between CDNs changed in the meantime. This change of priority is signaled by the CdnPrioritizer through the priorityChange event.
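
The per-CDN bookkeeping described above could look like the following sketch. The class name, the doubling formula and the delay constants are assumptions made for illustration; the actual values and formula used by the RxPlayer may differ.

```typescript
/** Per-CDN retry state for one resource (hypothetical shape). */
interface ICdnAttemptState {
  /** Number of failed attempts already made on that CDN. */
  attempts: number;
  /** Monotonic timestamp, in ms, of the last request on that CDN. */
  lastRequestTime: number;
}

// Assumed values, for illustration only.
const BASE_DELAY_MS = 200;
const MAX_DELAY_MS = 3000;

class PerCdnBackoff {
  private _states = new Map<string, ICdnAttemptState>();

  /** Record a failed request made at `requestTime` on `cdnId`. */
  public onRequestFailed(cdnId: string, requestTime: number): void {
    const previous = this._states.get(cdnId);
    this._states.set(cdnId, {
      attempts: (previous?.attempts ?? 0) + 1,
      lastRequestTime: requestTime,
    });
  }

  /** Time, in ms, still to wait before requesting `cdnId` again. */
  public getRemainingDelay(cdnId: string, now: number): number {
    const state = this._states.get(cdnId);
    if (state === undefined) {
      return 0; // Never failed on that CDN: no delay.
    }
    // The delay doubles after each consecutive failure, up to a maximum.
    const delay = Math.min(
      BASE_DELAY_MS * Math.pow(2, state.attempts - 1),
      MAX_DELAY_MS
    );
    return Math.max(0, state.lastRequestTime + delay - now);
  }
}
```

Since the remaining delay is recomputed from stored timestamps rather than held in a running timer, a pending wait can be cancelled on a `priorityChange` event and the delay re-queried for whichever CDN is now preferred.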

Moreover, CDNs on which a request fails are temporarily "downgraded" - meaning moved to the end of the priority list - for a period of time equal to the Steering Manifest's TTL (as specified in DASH's Content Steering spec) - or for 60 seconds if no such TTL exists.
This also automatically lets us try the CDN with the second-highest priority when a request through the first one fails, and still lets us loop over the list once all CDNs are downgraded.
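
A minimal sketch of the downgrading logic described above. The `CdnDowngrader` name is hypothetical, and ordering the downgraded CDNs by expiry time (so the least recently downgraded one is retried first) is an assumption of this example rather than something stated in this PR.

```typescript
/** Duration used when the Steering Manifest gives no TTL. */
const DEFAULT_DOWNGRADE_SECONDS = 60;

class CdnDowngrader {
  /** For each downgraded CDN id, timestamp in ms until which it stays so. */
  private _downgradedUntil = new Map<string, number>();

  /** Downgrade `cdnId` for the TTL (in seconds), or for 60 seconds. */
  public downgrade(cdnId: string, now: number, ttlSeconds?: number): void {
    const durationMs = (ttlSeconds ?? DEFAULT_DOWNGRADE_SECONDS) * 1000;
    this._downgradedUntil.set(cdnId, now + durationMs);
  }

  /** Move the currently downgraded CDNs to the end of the given list. */
  public sort(prioritized: string[], now: number): string[] {
    const healthy: string[] = [];
    const downgraded: Array<{ id: string; until: number }> = [];
    for (const id of prioritized) {
      const until = this._downgradedUntil.get(id);
      if (until !== undefined && until > now) {
        downgraded.push({ id, until });
      } else {
        healthy.push(id);
      }
    }
    // Assumption: the downgrade expiring first is retried first.
    downgraded.sort((a, b) => a.until - b.until);
    return healthy.concat(downgraded.map((d) => d.id));
  }
}
```

When every CDN is downgraded, the list is still complete, so requests keep looping over all of them instead of stalling.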

@peaBerberian peaBerberian added proposal This Pull Request or Issue is only a proposal for a change with the expectation of a debate on it Priority: 4 (Very low) This issue or PR has a very low priority. labels Oct 13, 2022
@lfaureyt
Contributor

Impressive work !

Let's assume content steering priority switches from CDN A to CDN B while a segment request is pending on CDN A, and then segment request on A fails ... In that case, is the request re-started immediately on CDN B (and, if so, is the count of attempts on CDN B cleared ?) or is the request delayed according to its exponential backoff state on CDN B ?

monotically raising

I think you meant "monotonically" ;-)

@peaBerberian
Collaborator Author

peaBerberian commented Oct 14, 2022

@lfaureyt Thanks!

Let's assume content steering priority switches from CDN A to CDN B while a segment request is pending on CDN A, and then segment request on A fails ... In that case, is the request re-started immediately on CDN B (and, if so, is the count of attempts on CDN B cleared ?) or is the request delayed according to its exponential backoff state on CDN B ?

I have to re-check/test that this is what's really going on, but I would say that the count of attempts on CDN B for that segment (as exponential backoff is still per-segment) is not reset until the segment has been loaded.
Thus, if a request for the same segment failed on CDN B more recently than the backoff delay calculated at that time (the backoff time is based on the last request made with the corresponding CDN, which may not be the last request overall), we will wait for at least (see below) the remaining backoff time before performing the request on that CDN.

Also, once CDNs referenced in a steering manifest have failed at least once for a given segment, the algorithm behind the CDN choice becomes a little more complex: the remaining backoff time is taken into account first, then the steering manifest prioritization (CDNs not listed in that manifest are still not requested).
There, depending on the number of retries for specific CDNs and on the previous CDN prioritization, you can end up with a less-prioritized CDN still being requested first.
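
To illustrate the selection order described in this comment (remaining backoff first, then steering priority), here is a small sketch. The `ICdnCandidate` shape and the `pickNextCdn` helper are hypothetical, and the list is assumed to be already filtered and sorted according to the steering manifest.

```typescript
/** A CDN still eligible for a given segment (hypothetical shape). */
interface ICdnCandidate {
  id: string;
  /** Time, in ms, to wait before that CDN can be requested again. */
  remainingBackoff: number;
}

/**
 * Pick the CDN to request next.
 * `candidates` must already follow the steering manifest's order (most
 * prioritized first) and must not contain CDNs unlisted in it.
 * The CDN with the smallest remaining backoff wins; on a tie, the
 * most prioritized one is kept.
 */
function pickNextCdn(
  candidates: ICdnCandidate[]
): ICdnCandidate | undefined {
  let best: ICdnCandidate | undefined;
  for (const candidate of candidates) {
    if (
      best === undefined ||
      candidate.remainingBackoff < best.remainingBackoff
    ) {
      best = candidate;
    }
  }
  return best;
}

// CDN "A" is prioritized but recently failed: "B" is requested first.
const exampleChoice = pickNextCdn([
  { id: "A", remainingBackoff: 500 },
  { id: "B", remainingBackoff: 0 },
]);
```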

@peaBerberian peaBerberian force-pushed the feat/content_steering branch 2 times, most recently from 60a9f65 to 5923fbd Compare November 3, 2022 10:03
@peaBerberian peaBerberian force-pushed the feat/content_steering branch 3 times, most recently from 489f7a3 to 0ab12e7 Compare June 13, 2023 15:30
@peaBerberian peaBerberian changed the title Revert "Remove ContentSteering logic to just keep better Cdn prioriza… Implement ContentSteering Feb 5, 2024
@peaBerberian peaBerberian changed the base branch from next to dev February 5, 2024 17:42
@peaBerberian peaBerberian force-pushed the feat/content_steering branch 3 times, most recently from 3dea214 to 4d08be7 Compare February 5, 2024 17:51
@peaBerberian peaBerberian force-pushed the feat/content_steering branch 3 times, most recently from fa598ec to 80330a8 Compare February 5, 2024 18:12
@peaBerberian peaBerberian force-pushed the feat/content_steering branch 6 times, most recently from 528c3d7 to e46c7d4 Compare February 27, 2024 10:39