
feat(offline): improve the speed of offline downloads #4168

Merged: 3 commits merged into shaka-project:main Apr 28, 2022

Conversation

@avelad (Collaborator) commented Apr 27, 2022

Add the ability to configure the number of parallel segment downloads per stream. A higher value can reduce the total download time.

Related to #4166
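As a rough back-of-the-envelope model of why more parallel downloads help (this is an illustration, not Shaka Player code; the function name here is made up): with n similarly sized segments taking about t ms each and k parallel downloads, the wall time is roughly ceil(n / k) * t. Real gains are far smaller than this model suggests (13–21% in the benchmarks below) because parallel requests share the same bandwidth and storage writes add overhead.

```javascript
// Illustrative model only, not Shaka Player code: ideal wall time for
// downloading n equal segments of t ms each with k parallel downloads,
// ignoring bandwidth sharing, TCP ramp-up, and storage overhead.
function estimatedWallTimeMs(numSegments, segmentTimeMs, parallelism) {
  return Math.ceil(numSegments / parallelism) * segmentTimeMs;
}

console.log(estimatedWallTimeMs(100, 200, 1)); // 20000
console.log(estimatedWallTimeMs(100, 200, 5)); // 4000
```

The gap between this ideal 5x speedup and the measured 13–21% is exactly what the later discussion about shared bandwidth and per-segment database writes explains.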

@avelad (Collaborator, author) commented Apr 27, 2022

Example timings from the Demo app:

Angel One (multicodec, multilingual)

  • numberOfParallelSegmentsByStream=1 --> 8480ms
  • numberOfParallelSegmentsByStream=5 --> 7339ms (-13%)

Sintel 4k (multicodec)

  • numberOfParallelSegmentsByStream=1 --> 90921ms
  • numberOfParallelSegmentsByStream=5 --> 71798ms (-21%)

Sintel w/ trick mode (MP4 only, 720p)

  • numberOfParallelSegmentsByStream=1 --> 90448ms
  • numberOfParallelSegmentsByStream=5 --> 79293ms (-12%)

Tears of Steel (multicodec, TTML)

  • numberOfParallelSegmentsByStream=1 --> 79831ms
  • numberOfParallelSegmentsByStream=5 --> 68895ms (-13%)

@avelad avelad requested a review from joeyparrish April 27, 2022 07:51
```js
const numberOfParallelSegmentsByStream =
    config.offline.numberOfParallelSegmentsByStream;
if (numberOfParallelSegmentsByStream > 1) {
  // Note: 1000 is a random number to not coincide with other groups
```

Inline review comment from a Member:
nit: It's not random, it's arbitrary. And you should note that 1000 is safe so long as there are fewer than 1000 actual streams.

Comment on lines 1343 to 1344:

```js
groupId += 1000 + Math.floor(Math.random() *
    numberOfParallelSegmentsByStream);
```

Inline review comment from a Member:

If the goal is to have, say, 5 parallel tracks of audio and 5 of video, this won't do that. For example:

video stream, id 1 => 1001 - 1005
audio stream, id 2 => 1002 - 1006
audio stream, id 3 => 1003 - 1007
text stream, id 4 => 1004 - 1008

So there will be mixing between groups.

However, why not mix between groups? Why not assign each segment to a random group in the range of the config variable?

```js
const groupId = Math.floor(
    Math.random() * numberOfParallelSegmentsByStream);
```

Now that I'm looking at the problem this way, it may be better to change the config from numberOfParallelSegmentsByStream to numberOfParallelDownloads. It's not "by stream" any more, nor did it ever really need to be.

The consistency of your results would also improve if:

  1. groupIds were assigned in a round-robin manner instead of purely random
  2. the default were set to something like 4 or 5 instead of 1

I would then call this a "feat" instead of a "fix", since the default would be different than the old behavior. But feat vs fix here is somewhat subjective. I wouldn't insist.
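A hypothetical sketch of the round-robin assignment suggested in point 1 above (the names here are illustrative, not the ones in the merged code): each segment takes the next group id in turn, cycling through the configured number of parallel downloads, so segments spread evenly across groups regardless of which stream they belong to.

```javascript
// Illustrative sketch of round-robin group assignment, not the actual
// merged Shaka code. Returns a function that yields group ids
// 0, 1, ..., n-1, 0, 1, ... in strict rotation.
function makeRoundRobinGroupId(numberOfParallelDownloads) {
  let next = 0;
  return () => {
    const groupId = next;
    next = (next + 1) % numberOfParallelDownloads;
    return groupId;
  };
}

const nextGroupId = makeRoundRobinGroupId(4);
console.log([1, 2, 3, 4, 5, 6].map(() => nextGroupId())); // [0, 1, 2, 3, 0, 1]
```

Unlike the purely random assignment, rotation guarantees every group receives within one segment of the same load, which is why it would make the timing results more consistent.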

@avelad (Collaborator, author) replied:

Done!

@joeyparrish (Member) commented:
This is a wonderfully simple solution, but I wonder if we can do better than ~20% by coalescing requests. The ramp-up time for a TCP stream can be significant, and requesting segments over and over can make it difficult to achieve the ideal throughput.

That said, I would still take this fix as an intermediate solution, as it's very cheap, and ~20% is a definite improvement.

@avelad avelad changed the title fix(offline): offline downloads are slow feat(offline): improve the speed of offline downloads Apr 28, 2022
@avelad (Collaborator, author) commented Apr 28, 2022

@joeyparrish I ran multiple tests, and what definitely makes the download slow is https://github.com/shaka-project/shaka-player/blob/main/lib/offline/storage.js#L471

@joeyparrish (Member) replied:

> @joeyparrish I ran multiple tests, and what definitely makes the download slow is https://github.com/shaka-project/shaka-player/blob/main/lib/offline/storage.js#L471

Ah, and that's called serially after each segment download, right? So we spend some of our "download time" actually waiting on database updates, so we will never utilize all available bandwidth.

Looks like this could use a redesign. I wish we had done some performance testing on it when this design was first proposed/implemented.

The theory was that to enable us to resume downloading if an app was closed, we would need to track each segment in the DB as it was downloaded. I don't recall exactly if that resume call was ever implemented, though. So maybe this is a waste.

Should we continue with this PR to parallelize, or try to address the slow DB update instead? (Or both?)

Is it slow because of calls like this loading a lot of data?

```js
const manifests = await activeHandle.cell.getManifests([manifestId]);
```

Or this big double-loop?

```js
for (const stream of manifestDB.streams) {
```
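The serialization concern above can be sketched as follows (illustrative only, not Shaka Player code; both function names are made up): if each segment download awaits a database write before the next request starts, the network sits idle during every write, whereas letting the write proceed while the next fetch begins keeps the connection busy.

```javascript
// Illustrative only, not Shaka Player code: contrast awaiting a DB
// write per segment with overlapping writes and downloads.

// Network idles during every write: fetch, write, fetch, write, ...
async function downloadSerially(segments, fetchSeg, writeDb) {
  for (const seg of segments) {
    const data = await fetchSeg(seg);
    await writeDb(seg, data); // next fetch waits on this write
  }
}

// Writes overlap with subsequent fetches; settle them all at the end.
async function downloadOverlapped(segments, fetchSeg, writeDb) {
  const pendingWrites = [];
  for (const seg of segments) {
    const data = await fetchSeg(seg);
    pendingWrites.push(writeDb(seg, data)); // don't block the next fetch
  }
  await Promise.all(pendingWrites);
}
```

In the overlapped version the DB latency hides behind network time instead of adding to it; real code would need more careful error handling and per-segment resume bookkeeping, which is the very reason the per-segment write existed.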

(Resolved review threads, now outdated, on externs/shaka/player.js and lib/offline/storage.js.)
@joeyparrish (Member) commented:

> Should we continue with this PR to parallelize, or try to address the slow DB update instead? (Or both?)

Never mind that. I forgot that we had already discussed it and decided that this was an easy win. So let's finish this PR and then worry about the DB afterward.

Thank you for your hard work and analysis on this!

@joeyparrish (Member) commented:

I made two small edits so that we could merge this PR now. I hope you don't mind.

I'll wait for one more test pass, since I may have made a typo, and then I'll merge if it looks good. Thanks!

@theodab (Collaborator) commented Apr 28, 2022

> > @joeyparrish I ran multiple tests, and what definitely makes the download slow is https://github.com/shaka-project/shaka-player/blob/main/lib/offline/storage.js#L471
>
> Ah, and that's called serially after each segment download, right? So we spend some of our "download time" actually waiting on database updates, so we will never utilize all available bandwidth.
>
> Looks like this could use a redesign. I wish we had done some performance testing on it when this design was first proposed/implemented.
>
> The theory was that to enable us to resume downloading if an app was closed, we would need to track each segment in the DB as it was downloaded. I don't recall exactly if that resume call was ever implemented, though. So maybe this is a waste.

That was specifically part of the changes that were going to make background fetch possible. But that's looking to have gone way down in priority, so maybe undoing the changes would be worthwhile.

@joeyparrish (Member) replied:

> That was specifically part of the changes that were going to make background fetch possible. But that's looking to have gone way down in priority, so maybe undoing the changes would be worthwhile.

Right! Thanks for the reminder!

@joeyparrish joeyparrish merged commit 73f6de3 into shaka-project:main Apr 28, 2022
@avelad avelad deleted the parallel-downloads branch April 29, 2022 05:30
@avelad avelad added this to the v4.0 milestone May 4, 2022
@github-actions github-actions bot added the status: archived Archived and locked; will not be updated label Jul 25, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 25, 2023