Trying to fix ytsearch:, here's where I've got to #210

Open
cloudrac3r opened this issue Oct 23, 2020 · 2 comments

Comments

@cloudrac3r

Test command: youtube-dlc ytsearch:lol --flat-playlist -J --verbose

For searches, youtube-dl/c tries to download a JSON-encoded representation of the search page that contains HTML strings, as seen around youtube.py:3289:

data = self._download_json( ...
html_content = data[1]['body']['content']

However, when this code is executed, the _download_json line fails because it tries to parse HTML as JSON. The query parameter that youtube-dl/c was using, spf=navigate, is now ignored by YouTube, so YouTube just returns an ordinary HTML page of results.

There may now be a different query parameter that gets the results in the same format, but if there is, I don't know what it is.

Otherwise we'll have to request the data from YouTube in a different format. Here's what I've got to on that:

post_data = {
    'context': {
        'client': {
            'clientName': 'WEB',
            'clientVersion': '2.20201022.01.01',
        }
    },
    'query': query # the search query goes here
}
result_url = 'https://www.youtube.com/youtubei/v1/search?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8' # this key is the same globally

Then add these parameters to the _download_json call:

data=json.dumps(post_data).encode('utf-8'),
headers={
    'content-type': 'application/json'
}
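
For anyone who wants to poke at the endpoint outside of youtube-dl/c, the pieces above could be assembled roughly like this. It is only a sketch: it uses the standard library instead of `_download_json`, `build_search_request` is a made-up helper name, and the key and client version are simply the values quoted above, which may stop working at any time.

```python
import json
import urllib.request

# Values copied from the comment above; they are not guaranteed to stay valid.
API_KEY = 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8'
SEARCH_URL = 'https://www.youtube.com/youtubei/v1/search?key=' + API_KEY


def build_search_request(query, client_version='2.20201022.01.01'):
    """Build (but do not send) the POST request for a search query.

    Hypothetical helper, not part of youtube-dl/c.
    """
    post_data = {
        'context': {
            'client': {
                'clientName': 'WEB',
                'clientVersion': client_version,
            }
        },
        'query': query,
    }
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(post_data).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )
```

Passing the result to `urllib.request.urlopen` and feeding the response body to `json.loads` should then give the parsed search results, assuming YouTube still accepts this payload.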

Now you have a pure JSON representation of the results, which you can step into with:

data['contents']['twoColumnSearchResultsRenderer']['primaryContents']['sectionListRenderer']['contents'][1]['itemSectionRenderer']['contents']

Depending on the search terms, the results are sometimes at index 0 instead of 1.
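
One way to avoid hard-coding the index is to look at every section and pick the right one. The sketch below guesses that the section holding the actual results is the `itemSectionRenderer` with the most entries. That is a heuristic I have not verified against real responses, and `find_result_items` is a made-up name.

```python
def find_result_items(data):
    """Return the contents of the largest itemSectionRenderer.

    Heuristic (an assumption, not confirmed behaviour): the real search
    results live in whichever itemSectionRenderer has the most entries,
    whether that is at index 0 or index 1.
    """
    sections = (
        data['contents']['twoColumnSearchResultsRenderer']
        ['primaryContents']['sectionListRenderer']['contents']
    )
    candidates = [
        section['itemSectionRenderer'].get('contents', [])
        for section in sections
        if 'itemSectionRenderer' in section
    ]
    # default=[] covers the case where no itemSectionRenderer is present.
    return max(candidates, key=len, default=[])
```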

I don't have the energy to continue arranging the data into a format that the rest of the code likes. Hopefully someone can pick up from my work.

Peace.

@blackjack4494
Owner

blackjack4494 commented Oct 23, 2020

YouTube rolls out updates gradually: not every user gets the new version at once, and not all components are updated at the same time, which is what is happening with search now. Search now uses the continuation method, as most feeds already do.
Before that, youtube-dl relied on buttons/links embedded in the page.
The new way is actually much easier and cleaner.

Just look out for this

{
  "continuationItemRenderer": {
    "trigger": "CONTINUATION_TRIGGER_ON_ITEM_SHOWN",
    "continuationEndpoint": {
      "clickTrackingParams": "CBwQui8iEwjmxNfj58rsAhXJgt4KHf0cDIc=",
      "commandMetadata": {
        "webCommandMetadata": {
          "url": "/service_ajax",
          "sendPost": true,
          "apiUrl": "/youtubei/v1/search"
        }
      },
      "continuationCommand": {
        "token": "Eps......",
        "request": "CONTINUATION_REQUEST_TYPE_SEARCH"
      }
    }
  }
}

Pay particular attention to the "continuationCommand":{"token": ...} part.
You will get this token on almost every page, and you have to pass it to the youtubei/v1 API.

For search you would use
https://www.youtube.com/youtubei/v1/search?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8
with a request payload like the following:

{"context":<DICT>,
"continuation":<KEY>}

Here <KEY> is the token from continuationCommand.

The response is JSON. Each response contains another (new) token, until there are no more results.
Keep in mind that you should use a rate limiter (e.g. a 2-3 second delay between requests), since too many requests will lead to YouTube returning errors.
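
The loop described above might look roughly like this. It is a sketch under assumptions: `fetch` is a placeholder for whatever function POSTs a payload to the youtubei/v1 endpoint and returns parsed JSON, the helper names are made up, and the token is located by searching the whole response for a `continuationCommand` entry, because the exact position of that entry in continuation responses is not something I have verified.

```python
import time


def find_continuation_token(node):
    """Depth-first search for the first continuationCommand token, or None."""
    if isinstance(node, dict):
        command = node.get('continuationCommand')
        if isinstance(command, dict) and 'token' in command:
            return command['token']
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        token = find_continuation_token(child)
        if token is not None:
            return token
    return None


def iter_pages(fetch, context, first_response, delay=2.5, max_pages=10,
               sleep=time.sleep):
    """Yield the first response, then follow continuation tokens.

    fetch: hypothetical callable taking the payload dict, returning parsed JSON.
    delay: pause between requests, per the 2-3 second suggestion above.
    max_pages: safety cap on the total number of pages yielded.
    """
    response = first_response
    pages = 0
    while True:
        yield response
        pages += 1
        token = find_continuation_token(response)
        if token is None or pages >= max_pages:
            return
        sleep(delay)  # crude rate limiting between requests
        response = fetch({'context': context, 'continuation': token})
```

Injecting `fetch` and `sleep` keeps the loop independent of any particular HTTP client, so it can be exercised with canned responses before pointing it at YouTube.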

It's quite an easy fix to implement. However, I don't currently have the time to do so.
I have a fix for feeds like watch history that uses the exact same implementation, but it is still in a local branch and not yet uploaded to GitHub. I may have more time in the next few days or week to finally polish those fixes and incorporate them.

There is a screenshot of history feed progress on Gitter (link)

@blackjack4494
Owner

Seems youtube-dl just implemented this fix. Nice. Less work for me then :D
