Downloading audio files (m4a, mp3) from Google Drive #8281

Closed
9 tasks done
saeedesmaili opened this issue Oct 4, 2023 · 7 comments · Fixed by #9908
Labels
  • needs-testing: Patch needs testing
  • patch-available: There is patch available that should fix this issue. Someone needs to make a PR with it
  • site-bug: Issue with a specific website

Comments

@saeedesmaili

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

  • I understand that I will be blocked if I intentionally remove or skip any mandatory* field

Checklist

Please make sure the question is worded well enough to be understood

I'm using the following Python code to download audio files (in m4a and mp3 formats) from Google Drive:

import yt_dlp

FILE_URL = "https://drive.google.com/file/d/1Il5LrGlp_iWFNhtgrD449tDUeFuuBWMh/view?usp=sharing"
ydl_opts = {
    'format': 'bestaudio/best',
    # Convert the downloaded audio to m4a with ffmpeg
    'postprocessors': [{
        'key': 'FFmpegExtractAudio',
        'preferredcodec': 'm4a',
    }],
    'quiet': False,
    'verbose': True,
    'outtmpl': '/tmp/%(id)s.%(ext)s',
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    # download() expects a list of URLs
    error_code = ydl.download([FILE_URL])

Strangely, the first time I try a link, the file downloads without any issues. But when I try to download the same file from the same URL again (e.g. while I'm changing other parts of my code during development), the download fails with the following error (for the m4a file):

[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out UTF-8 (No ANSI), error UTF-8 (No ANSI), screen UTF-8 (No ANSI)
[debug] yt-dlp version stable@2023.09.24 [088add9] (pip) API
[debug] params: {'format': 'bestaudio/best', 'postprocessors': [{'key': 'FFmpegExtractAudio', 'preferredcodec': 'm4a'}], 'quiet': False, 'verbose': True, 'outtmpl': '/tmp/%(id)s.%(ext)s', 'compat_opts': set(), 'http_headers': {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'Accept-Language': 'en-us,en;q=0.5', 'Sec-Fetch-Mode': 'navigate'}}
[debug] Python 3.9.17 (CPython arm64 64bit) - macOS-13.5.1-arm64-arm-64bit (OpenSSL 3.1.2 1 Aug 2023)
[debug] exe versions: ffmpeg 6.0 (setts), ffprobe 6.0
[debug] Optional libraries: Cryptodome-3.19.0, brotli-1.1.0, certifi-2021.10.08, mutagen-1.47.0, sqlite3-3.42.0, websockets-11.0.3
[debug] Proxy map: {}
[debug] Loaded 1886 extractors

[GoogleDrive] Extracting URL: https://drive.google.com/file/d/1Il5LrGlp_iWFNhtgrD449tDUeFuuBWMh/view?usp=sharing
[GoogleDrive] 1Il5LrGlp_iWFNhtgrD449tDUeFuuBWMh: Downloading video webpage

ERROR: 1Il5LrGlp_iWFNhtgrD449tDUeFuuBWMh: An extractor error has occurred. (caused by KeyError('50')); please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U
File "/opt/homebrew/lib/python3.9/site-packages/yt_dlp/extractor/common.py", line 715, in extract
ie_result = self._real_extract(url)
File "/opt/homebrew/lib/python3.9/site-packages/yt_dlp/extractor/googledrive.py", line 197, in _real_extract
'ext': self._FORMATS_EXT[format_id],
KeyError: '50'

I get almost the same error, with KeyError: '140', for mp3 files. I looked into the Google Drive extractor code, and the answer to my question is probably "yt-dlp doesn't support downloading audio files from Google Drive". But since it is sometimes able to download them, I thought I might be missing something.

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • If using API, add 'verbose': True to YoutubeDL params instead
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

(pasted in the question above)
@saeedesmaili saeedesmaili added the question Question label Oct 4, 2023
@bashonly bashonly added site-bug Issue with a specific website and removed question Question labels Oct 4, 2023
@bashonly
Member

bashonly commented Oct 4, 2023

seems like we just need to add IDs for audio file extensions, something like this:

diff --git a/yt_dlp/extractor/googledrive.py b/yt_dlp/extractor/googledrive.py
index 2fdec20f6..b51a2fa1a 100644
--- a/yt_dlp/extractor/googledrive.py
+++ b/yt_dlp/extractor/googledrive.py
@@ -70,7 +70,9 @@ class GoogleDriveIE(InfoExtractor):
         '44': 'webm',
         '45': 'webm',
         '46': 'webm',
+        '50': 'm4a',
         '59': 'mp4',
+        '140': 'mp3',
     }
     _BASE_URL_CAPTIONS = 'https://drive.google.com/timedtext'
     _CAPTIONS_ENTRY_TAG = {

@bashonly bashonly added patch-available There is patch available that should fix this issue. Someone needs to make a PR with it needs-testing Patch needs testing labels Oct 4, 2023
@saeedesmaili
Author

Yeah, exactly. I was wondering whether not allowing audio downloads from Google Drive is an intentional choice (for some technical reason).

@gamer191
Collaborator

gamer191 commented Oct 5, 2023

IMO yt-dlp shouldn't fail completely if a format extension isn't known

I'm not that familiar with the gdrive extractor though
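
One way to read the suggestion above is a lookup that tolerates unknown format IDs instead of raising `KeyError`. The sketch below is only an illustration under stated assumptions: `FORMATS_EXT` is a stand-in for the extractor's table (entries copied from the diff context in this thread), and `ext_for_format` and `build_formats` are hypothetical helper names, not yt-dlp functions.

```python
# Stand-in for the extractor's ID-to-extension table (assumption: not the real class attribute)
FORMATS_EXT = {
    '44': 'webm',
    '45': 'webm',
    '46': 'webm',
    '59': 'mp4',
}


def ext_for_format(format_id, default=None):
    """Return the known extension, or `default` when the ID is not in the table."""
    return FORMATS_EXT.get(format_id, default)


def build_formats(format_ids):
    """Build one format dict per ID, leaving out 'ext' when it is unknown."""
    formats = []
    for format_id in format_ids:
        fmt = {'format_id': format_id}
        ext = ext_for_format(format_id)
        if ext is not None:
            fmt['ext'] = ext
        formats.append(fmt)
    return formats
```

With this shape, an unrecognized ID such as '50' still yields a format entry; it just carries no extension hint.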

@Nephiel

Nephiel commented Oct 10, 2023

FWIW I was unable to download a specific video file from Drive until I patched googledrive.py with these IDs:

diff --git a/yt_dlp/extractor/googledrive.py b/yt_dlp/extractor/googledrive.py
index 2fdec20f6..4bb96f22e 100644
--- a/yt_dlp/extractor/googledrive.py
+++ b/yt_dlp/extractor/googledrive.py
@@ -71,6 +71,9 @@ class GoogleDriveIE(InfoExtractor):
         '45': 'webm',
         '46': 'webm',
         '59': 'mp4',
+        '134': 'mp4',
+        '136': 'mp4',
+        '140': 'mp4',
     }
     _BASE_URL_CAPTIONS = 'https://drive.google.com/timedtext'
     _CAPTIONS_ENTRY_TAG = {

Not sure where the 134 came from but it was the first KeyError reported by yt-dlp stable@2023.10.07. According to the "stats for nerds" option in the Google Drive embedded video player, the video format was avc1 (136) with audio mp4a (140).
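
For API users who prefer not to edit the installed file, the same IDs could in principle be added at runtime by updating the class-level table before extraction. The sketch below models that idea with a hypothetical stand-in class, `FakeDriveExtractor`; with a real install the target would presumably be `GoogleDriveIE._FORMATS_EXT` from the diff above, which is an assumption and is not exercised here. The extension values follow the patch in this comment; note that the earlier patch in this thread maps '140' to 'mp3' instead.

```python
class FakeDriveExtractor:
    """Hypothetical stand-in for the extractor class; only the table lookup is modelled."""

    _FORMATS_EXT = {'45': 'webm', '46': 'webm', '59': 'mp4'}

    def ext_for(self, format_id):
        # Same kind of lookup that raises KeyError in the tracebacks above
        return self._FORMATS_EXT[format_id]


# IDs taken from the patch in this comment
EXTRA_IDS = {'134': 'mp4', '136': 'mp4', '140': 'mp4'}


def patch_formats(extractor_cls, extra):
    """Add missing IDs to the class-level table without overwriting existing entries."""
    for format_id, ext in extra.items():
        extractor_cls._FORMATS_EXT.setdefault(format_id, ext)


patch_formats(FakeDriveExtractor, EXTRA_IDS)
```

Because the table is a class attribute, every instance created before or after the patch sees the added IDs.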

@webafrancois

I get the same error with an m4a file:

# ./yt-dlp_linux --version
2024.04.09.170053

I anonymized the URL because the file is not mine.

./yt-dlp_linux -v "https://drive.google.com/file/d/1P3ZwdrIDp7z6g8cAd4NgkdXXX/view"
[debug] Command-line config: ['-v', 'https://drive.google.com/file/d/1P3ZwdrIDp7z6g8cAd4NgkdlvdXXX/view']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version master@2024.04.09.170053 from yt-dlp/yt-dlp-master-builds [ff0779267] (linux_exe)
[debug] Python 3.10.14 (CPython x86_64 64bit) - Linux-5.15.0-101-generic-x86_64-with-glibc2.35 (OpenSSL 3.2.1 30 Jan 2024, glibc 2.35)
[debug] exe versions: ffmpeg 4.4.2 (setts), ffprobe 4.4.2, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.19.0, brotli-1.1.0, certifi-2024.02.02, mutagen-1.47.0, requests-2.31.0, secretstorage-3.3.3, sqlite3-3.45.2, urllib3-2.2.1, websockets-12.0
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests, websockets
[debug] Loaded 1810 extractors
[GoogleDrive] Extracting URL: https://drive.google.com/file/d/1P3ZwdrIDp7z6g8cAd4NgkdlvdXXX/view
[GoogleDrive] 1P3ZwdrIDp7z6g8cAd4Ngkdlvd5ZTT8b5: Downloading video webpage
ERROR: 1P3ZwdrIDp7z6g8cAd4NgkdlvdXXX: An extractor error has occurred. (caused by KeyError('50')); please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "yt_dlp/extractor/common.py", line 734, in extract
  File "yt_dlp/extractor/googledrive.py", line 200, in _real_extract
KeyError: '50'
$ file XXX.m4a 
XXX.m4a: ISO Media, Apple iTunes ALAC/AAC-LC (.M4A) Audio

@pukkandan
Member

None of the posted URLs work anymore. Someone needs to provide working URLs for this to be fixed

@webafrancois

Very strange.
I uploaded a personal file to a drive, and it works fine:

/yt-dlp_linux -v https://drive.google.com/file/d/1ShMdrxXiRyzKHnkvQtTpC9POqWxdoZM9/view
[debug] Command-line config: ['-v', 'https://drive.google.com/file/d/1ShMdrxXiRyzKHnkvQtTpC9POqWxdoZM9/view']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version master@2024.04.09.170053 from yt-dlp/yt-dlp-master-builds [ff0779267] (linux_exe)
[debug] Python 3.10.14 (CPython x86_64 64bit) - Linux-5.15.0-101-generic-x86_64-with-glibc2.35 (OpenSSL 3.2.1 30 Jan 2024, glibc 2.35)
[debug] exe versions: ffmpeg 4.4.2 (setts), ffprobe 4.4.2, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.19.0, brotli-1.1.0, certifi-2024.02.02, mutagen-1.47.0, requests-2.31.0, secretstorage-3.3.3, sqlite3-3.45.2, urllib3-2.2.1, websockets-12.0
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests, websockets
[debug] Loaded 1810 extractors
[GoogleDrive] Extracting URL: https://drive.google.com/file/d/1ShMdrxXiRyzKHnkvQtTpC9POqWxdoZM9/view
[GoogleDrive] 1ShMdrxXiRyzKHnkvQtTpC9POqWxdoZM9: Downloading video webpage
[GoogleDrive] 1ShMdrxXiRyzKHnkvQtTpC9POqWxdoZM9: Requesting source file
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, size, br, asr, proto, vext, aext, hasaud, source, id
[debug] Default format spec: bestvideo*+bestaudio/best
[info] 1ShMdrxXiRyzKHnkvQtTpC9POqWxdoZM9: Downloading 1 format(s): source
[debug] Invoking http downloader on "https://drive.usercontent.google.com/download?id=1ShMdrxXiRyzKHnkvQtTpC9POqWxdoZM9&export=download&confirm=t"
[download] Destination: mix_audio.m4a [1ShMdrxXiRyzKHnkvQtTpC9POqWxdoZM9].m4a
[download] 100% of   14.65MiB in 00:00:01 at 9.81MiB/s

I cannot send the real URL because the file is not mine and contains personal data.
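
In the successful run above, the "source" format is fetched from a `drive.usercontent.google.com/download` URL built from the file ID. The sketch below reconstructs that URL shape from a view link using only the standard library. It is an illustration, not yt-dlp's code: `FILE_ID_RE` and `direct_download_url` are hypothetical names, the ID pattern is an assumption, and whether the resulting URL actually serves a file depends on that file's sharing settings.

```python
import re
from urllib.parse import urlencode

# Assumed pattern for the file ID in a ".../file/d/<id>/view" link
FILE_ID_RE = re.compile(r'/file/d/([\w-]+)')


def direct_download_url(view_url):
    """Build the download URL shape seen in the verbose log from a Drive view URL."""
    match = FILE_ID_RE.search(view_url)
    if match is None:
        raise ValueError(f'no file ID found in {view_url!r}')
    query = urlencode({'id': match.group(1), 'export': 'download', 'confirm': 't'})
    return f'https://drive.usercontent.google.com/download?{query}'
```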

WyohKnott added a commit to WyohKnott/yt-dlp that referenced this issue May 11, 2024
We import those from the Youtube extractor, and also add:

    '50':'mp3',

for mp3 files scrapping.

Fix: yt-dlp#8281
bashonly pushed a commit that referenced this issue May 12, 2024
6 participants