--simulate doesn't accurately simulate downloading under certain conditions #9843

Open
9 of 10 tasks
seproDev opened this issue May 2, 2024 · 5 comments · May be fixed by #9862
Labels
bug Bug that is not site-specific

Comments

@seproDev
Collaborator

seproDev commented May 2, 2024

Provide a description that is worded well enough to be understood

When running a yt-dlp command with --simulate (and without an -f arg), the default format selection differs from an unsimulated run under any of these conditions:

  • ffmpeg is not available
  • the outtmpl is -
  • the URL is for a livestream (and --live-from-start was not passed)

A dry-run/simulate option should actually simulate the behaviour that will occur when downloading.
This behaviour is currently undocumented. Either the behaviour should be changed or at the very least be documented.
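To make the mismatch concrete, here is a minimal sketch of the check as described above. It is a simplified model, not yt-dlp's actual code: the function name `default_format_spec`, its keyword parameters and the placeholder output template are made up for illustration. Only the two spec strings are taken from this thread.

```python
# Spec strings as they appear in this thread; everything else is illustrative.
MERGE_SPEC = 'bestvideo*+bestaudio/best'
SINGLE_FILE_SPEC = 'best/bestvideo+bestaudio'


def default_format_spec(*, simulate, ffmpeg_available, outtmpl='video.%(ext)s',
                        is_live=False, live_from_start=False):
    """Simplified model of the default format choice when -f is not given.

    Hypothetical helper: the name and parameters are assumptions.
    """
    if simulate:
        # A simulated run never reaches the checks below (the reported mismatch).
        return MERGE_SPEC
    if outtmpl == '-':
        return SINGLE_FILE_SPEC
    if is_live and not live_from_start:
        return SINGLE_FILE_SPEC
    if not ffmpeg_available:
        return SINGLE_FILE_SPEC
    return MERGE_SPEC


# The three conditions from the list, each compared with and without simulate.
cases = {
    'ffmpeg is not available': dict(ffmpeg_available=False),
    'outtmpl is -': dict(ffmpeg_available=True, outtmpl='-'),
    'livestream': dict(ffmpeg_available=True, is_live=True),
}
for name, kwargs in cases.items():
    real = default_format_spec(simulate=False, **kwargs)
    simulated = default_format_spec(simulate=True, **kwargs)
    print(f'{name}: download={real!r} simulate={simulated!r} match={real == simulated}')
```

In this model every one of the three cases reports a different spec for the simulated run, while a plain run with ffmpeg available agrees in both modes.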


Copying initial discussion: #9805 (comment)

It looks like we can trace this logic back to ytdl-org/youtube-dl@0017d9a

Back then, upstream's default format spec was just best when ffmpeg was not available. So a simulated run would result in a "requested formats not available" error if ffmpeg was not available and there was no combined video+audio format available. This simulate check seems to have been added so that, in this scenario, you could print JSON without having to manually pass -f bv+ba, -f bv, etc. See the linked upstream PR.

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • If using API, add 'verbose': True to YoutubeDL params instead
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

[debug] Command-line config: ['-vU', '--simulate', 'https://www.youtube.com/watch?v=2yJgwwDcgV8']
[debug] Encodings: locale cp1252, fs utf-8, pref cp1252, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version master@2024.04.28.221944 from yt-dlp/yt-dlp-master-builds [ac817bc83] (win_exe)
[debug] Python 3.8.10 (CPython AMD64 64bit) - Windows-10-10.0.22631-SP0 (OpenSSL 1.1.1k  25 Mar 2021)
[debug] exe versions: none
[debug] Optional libraries: Cryptodome-3.20.0, brotli-1.1.0, certifi-2024.02.02, curl_cffi-0.5.10, mutagen-1.47.0, requests-2.31.0, sqlite3-3.35.5, urllib3-2.2.1, websockets-12.0
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests, websockets, curl_cffi
[debug] Loaded 1810 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp-master-builds/releases/latest
Latest version: master@2024.04.28.221944 from yt-dlp/yt-dlp-master-builds
yt-dlp is up to date (master@2024.04.28.221944 from yt-dlp/yt-dlp-master-builds)
[youtube] Extracting URL: https://www.youtube.com/watch?v=2yJgwwDcgV8
[youtube] 2yJgwwDcgV8: Downloading webpage
[youtube] 2yJgwwDcgV8: Downloading ios player API JSON
[youtube] 2yJgwwDcgV8: Downloading android player API JSON
WARNING: [youtube] Skipping player responses from android clients (got player responses for video "aQvGIIdgFDM" instead of "2yJgwwDcgV8")
[debug] Loading youtube-nsig.7d1f7724 from cache
[debug] [youtube] Decrypted nsig ZyUwo2vdMccktm7tN0 => ZvGvrjLHlKzcbw
[debug] Loading youtube-nsig.7d1f7724 from cache
[debug] [youtube] Decrypted nsig vYdxycJ0vBBgWEBA_9 => Etq9qDUH370hPg
[youtube] 2yJgwwDcgV8: Downloading m3u8 information
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id
[debug] Default format spec: bestvideo*+bestaudio/best
[info] 2yJgwwDcgV8: Downloading 1 format(s): 244+251
@seproDev added the bug (Bug that is not site-specific) and triage (Untriaged issue) labels May 2, 2024
@pukkandan
Member

cc @dirkf

@bashonly changed the title from "--simulate does not accurately simulate downloading if FFmpeg is missing" to "--simulate doesn't accurately simulate downloading under certain conditions" May 2, 2024
@dirkf
Contributor

dirkf commented May 2, 2024

I'm a little hazy as to why one would want to use --simulate because all it basically tells you is that the extractor didn't (with luck) crash. If you want to know, say, what format(s) will be selected there is --get-format or eqv. Since no video download is being run, it can't tell you anything about any external downloader.

Looking at upstream confirms the diagnosis in this issue.

  1. The API param simulate is also forced to true when a "printing" option such as --get-format is selected. This would give the wrong answer if the default format selection was changed by simulate.

  2. The default format is changed to best/bestvideo+bestaudio as below:

        def prefer_best():
            if self.params.get('simulate', False):
                return False
            if not download:
                return False
            if self.params.get('outtmpl', DEFAULT_OUTTMPL) == '-':
                return True
            if info_dict.get('is_live'):
                return True
            if not can_merge():
                return True
            return False

So actually there are several cases where the default format should be changed, and isn't, when simulate is set, or when no download is requested (normally not through the CLI). Arguably the first two tests should be moved after the tests that return True.
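A small standalone sketch of that reordering, for comparison. It assumes the checks can be lifted into plain functions with keyword arguments (`can_merge` becomes a boolean here rather than a call, and `self.params` / `info_dict` are replaced by parameters), so it is an illustration of the idea rather than a patch:

```python
from itertools import product

FIELDS = ('simulate', 'download', 'outtmpl', 'is_live', 'can_merge')


def prefer_best_original(*, simulate, download, outtmpl, is_live, can_merge):
    """Checks in the same order as the snippet above."""
    if simulate:
        return False
    if not download:
        return False
    if outtmpl == '-':
        return True
    if is_live:
        return True
    if not can_merge:
        return True
    return False


def prefer_best_reordered(*, simulate, download, outtmpl, is_live, can_merge):
    """Same checks, with the two early `return False` tests moved to the end."""
    if outtmpl == '-':
        return True
    if is_live:
        return True
    if not can_merge:
        return True
    # Placed last, these tests can no longer change the result,
    # because the fall-through value is already False.
    if simulate:
        return False
    if not download:
        return False
    return False


# Enumerate every combination and collect the ones where the two orders disagree.
differences = []
for combo in product([False, True], [False, True], ['-', 'video.mp4'],
                     [False, True], [False, True]):
    kwargs = dict(zip(FIELDS, combo))
    if prefer_best_original(**kwargs) != prefer_best_reordered(**kwargs):
        differences.append(kwargs)

for kwargs in differences:
    print(kwargs)
```

Every combination printed has simulate set or download unset, which is exactly the set of cases under discussion. It also shows that, once moved, the two tests are effectively dead code and could simply be dropped.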

@bashonly
Member

bashonly commented May 2, 2024

I'm a little hazy as to why one would want to use --simulate because all it basically tells you is that the extractor didn't (with luck) crash. If you want to know, say, what format(s) will be selected there is --get-format or eqv.

Yeah, the issue is really about the simulate param rather than just the --simulate CLI flag

@bashonly removed the triage (Untriaged issue) label May 3, 2024
@dirkf
Contributor

dirkf commented May 3, 2024

Well, I think the third result (without the simulate/download tests) is correct and the second not:

$ python -m youtube_dl --get-format 'BaW_jenozKc'
248 - 1920x1080 (1080p)+140 - audio only (audio_quality_medium)
$ python -m youtube_dl --get-format -o - 'BaW_jenozKc'
248 - 1920x1080 (1080p)+140 - audio only (audio_quality_medium)
$ python -m youtube_dl --get-format -o - 'BaW_jenozKc'
22 - 1280x720 (720p)
$ 

@seproDev reopened this May 4, 2024
@seproDev linked a pull request May 5, 2024 that will close this issue
@pukkandan
Member

Re: 96da952#r141681868

In case users of the API rely on the historic behaviour when download is falsy, that can be left in place, as it's always truthy in the CLI (at least it is upstream).

Imo, since we are breaking compat anyway, it's more consistent to completely get rid of this behavior. Although it's not the recommended approach, I have often seen code in the wild that calls extract_info(URL, download=False) to get metadata and then download(URL).
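For reference, the two-step pattern looks roughly like the sketch below. The helper name `probe_then_download` is made up for illustration; `ydl` is assumed to be a `yt_dlp.YoutubeDL` instance (or anything exposing the same `extract_info` and `download` methods), and the presence of a `format_id` key in the returned metadata is an assumption, hence the `.get`.

```python
def probe_then_download(ydl, url):
    """Fetch metadata first, then download in a separate call.

    Hypothetical helper. Returns the format id reported by the metadata-only
    step, which need not match what the download step ends up selecting if
    the default format spec depends on the `download` argument.
    """
    # Step 1: metadata only; format selection runs with download=False.
    info = ydl.extract_info(url, download=False)
    probed_format = info.get('format_id')

    # Step 2: an independent call that performs the real download.
    ydl.download([url])
    return probed_format
```

With yt-dlp installed this would be called as `with yt_dlp.YoutubeDL() as ydl: probe_then_download(ydl, URL)`. Code written this way is what would observe the difference between the two steps.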

4 participants