Segmented Downloads via DASH/HLS have a severe performance impact #56

Open
rlaphoenix opened this issue May 9, 2023 · 1 comment


rlaphoenix commented May 9, 2023

Describe the bug
If you download a DASH manifest or HLS playlist through Devine, and Devine cannot resolve it to a single direct download URL, the segmented download process causes a huge CPU performance hit.

Before the download:
image

During the download:
image

These screenshots are from another user, not me. I personally have been able to confirm the performance hit, but have not been able to reproduce it to this extreme. However, I do have a much more powerful CPU.

My CPU Before the download (with various processes running in the background):

image

My CPU During the download:

image

Both Windows and Linux users have reported CPU usage problems. I have not yet had any reports from macOS, but it's likely to be a problem there too.

To Reproduce
Steps to reproduce the behavior:

  1. Download a title using Devine that is segmented DASH or HLS.
  2. Confirm it is a segmented download: "DASH" or "HLS" is shown next to the download speed.
  3. Check your CPU usage in Task Manager on Windows, or top/htop on Linux.

Expected behavior
Some extra CPU usage is expected since calls to shaka-packager will be made. However, CPU usage this extreme is not expected.

Additional context
The environment in which Devine is used/called does not seem to affect the CPU usage, so the choice of terminal or shell does not seem to be directly related to this performance issue.

I've been trying to debug this for a few months now and have not found a direct cause. In the tests I've run, neither the shaka-packager subprocesses nor the aria2c subprocesses appear to be the cause, at least not directly. It may be Python itself using a lot of CPU when creating/opening these processes.

The only thing that can be confirmed is that performance only became an issue after moving to the dynamic per-segment download+decrypt system, which added support for more encryption scenarios, e.g., a unique AES clear-key on every segment of an HLS playlist. Therefore, going back to the old system (download and merge all segments, assume the first DRM info applies to every segment, assume every segment is encrypted, then decrypt) will not work going forward.

When adding a sleep() to the start of the download track thread function, I noticed an initial spike of CPU usage, then nothing more. This spike reaches levels close to those in the earlier screenshots. This leads me to believe that the subprocess creation or threading itself is causing the performance hit, at least at first, and not the subprocesses themselves (shaka/aria2c).

image
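One way to check the hypothesis that process creation itself is expensive is to time it in isolation. The following is a minimal sketch, not Devine's code: `measure_spawn_cost` is a hypothetical helper, and a do-nothing Python child stands in for an aria2c or shaka-packager call. It reports wall-clock time alongside the CPU time consumed by the parent Python process only.

```python
import subprocess
import sys
import time


def measure_spawn_cost(count: int) -> tuple[float, float]:
    """Spawn `count` short-lived children sequentially.

    Returns (wall_seconds, parent_cpu_seconds). The parent CPU figure
    excludes time spent inside the children themselves.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    for _ in range(count):
        # Assumption: a child that exits immediately stands in for a
        # per-segment aria2c or shaka-packager invocation.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    n = 10
    wall, cpu = measure_spawn_cost(n)
    print(f"{n} spawns: {wall:.2f}s wall, {cpu:.2f}s parent CPU")
```

If the parent CPU figure is small compared with what Task Manager shows during a real download, the cost is more likely outside the Python process (for example, OS process setup or antivirus scanning) than in Python's own subprocess handling.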

Another interesting thing I noticed during one download was on an HLS playlist with Video and Subtitle tracks. The Video track had the audio track muxed in as well. It downloaded the Video (and therefore Audio) with almost no performance hit, even though it had a unique AES clear-key per segment. Yet once that finished and it moved on to the Subtitles, it went to 90% CPU, almost fully using it up. It was doing the Video track segments slowly (relative to the subtitles). The Video track was downloading with about 4 aria2c processes at a time (for some reason), while the Subtitle track was downloading with all 16 aria2c processes at a time. This leads me to again believe that Python having to start up a ton of subprocesses in a short time causes a spike in CPU usage. Task Manager showed the aria2c processes at around 0% CPU usage as well, so it's a bit strange.

image
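The 4-versus-16 observation suggests an experiment: cap how many subprocesses may run at once and compare CPU usage at different caps. Below is a hedged sketch using the standard library's `ThreadPoolExecutor`. The names `run_segments` and `PeakCounter` are hypothetical, and a do-nothing Python child again stands in for aria2c.

```python
import subprocess
import sys
import threading
from concurrent.futures import ThreadPoolExecutor


class PeakCounter:
    """Context manager that records the peak number of concurrent users."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self) -> "PeakCounter":
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc_info) -> bool:
        with self._lock:
            self.active -= 1
        return False


def run_segments(segment_count: int, max_workers: int) -> int:
    """Run one stand-in subprocess per segment, at most `max_workers`
    at a time, and return the peak concurrency actually observed."""
    counter = PeakCounter()

    def fetch(index: int) -> int:
        with counter:
            # Assumption: placeholder for a per-segment downloader call.
            subprocess.run([sys.executable, "-c", "pass"], check=True)
        return index

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(fetch, range(segment_count)))
    return counter.peak


if __name__ == "__main__":
    for workers in (4, 16):
        print(workers, "workers -> peak", run_segments(32, workers))
```

Running this with caps of 4 and 16 while watching Task Manager or htop would show whether the spike scales with the number of processes started at once.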

Another possibility is the dreaded Windows Defender real-time protection cutting into performance. I did notice that when I disabled everything real-time in Defender, the CPU usage dropped from the typical highs we've seen so far to about half of that. That could have just been a fluke though, and half of those highs is still higher than I'd like.

@rlaphoenix rlaphoenix added bug Something isn't working help wanted Extra attention is needed labels May 9, 2023
rlaphoenix (Member, Author) commented

One idea here is to keep processing per-segment DRM information (such as retrieving the key) after downloading each segment, but store that DRM information mapped to the segment instead of decrypting immediately. Once all segments are downloaded, run a loop that decrypts each segment. This would reduce the number of subprocesses running during the download by up to half.

This would reduce the rate of subprocess creation during the download, and would also lower CPU usage in general during that time since shaka wouldn't run until post-download. Overall, though, the same number of subprocesses would be created per title, just fewer at the same time and more spread out.
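The two-phase idea can be sketched as follows. This is an illustration under stated assumptions, not Devine's implementation: `download_then_decrypt` and its `fetch` callback are hypothetical, and `xor_decrypt` is a toy stand-in for the real decryption step (which would call shaka-packager or an AES routine).

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def xor_decrypt(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for real decryption; XOR is its own inverse."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def download_then_decrypt(
    segments: Iterable[Tuple[str, Optional[bytes]]],
    fetch: Callable[[str], bytes],
    decrypt: Callable[[bytes, bytes], bytes] = xor_decrypt,
) -> List[bytes]:
    """Download every segment first, then decrypt in a second pass.

    `segments` yields (url, key) pairs; key is None for clear segments.
    """
    downloaded: List[bytes] = []
    keys: Dict[int, bytes] = {}

    # Phase 1: download only, remembering which key belongs to which segment.
    for index, (url, key) in enumerate(segments):
        downloaded.append(fetch(url))
        if key is not None:
            keys[index] = key

    # Phase 2: decrypt after all downloads finish, one segment at a time.
    return [
        decrypt(data, keys[index]) if index in keys else data
        for index, data in enumerate(downloaded)
    ]


if __name__ == "__main__":
    store = {"seg0": xor_decrypt(b"hello", b"k"), "seg1": b"world"}
    plan = [("seg0", b"k"), ("seg1", None)]
    print(download_then_decrypt(plan, store.__getitem__))
```

Because the key is stored per segment index, this still handles a unique key on every segment as well as a mix of encrypted and clear segments.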
