Metadata not in head but in the body #37

Open
ThePavolC opened this issue Jan 7, 2021 · 0 comments
Hi,

I am having an issue getting the Open Graph metadata using opengraph_py3, urllib and bs4.

In the parser method you only check the <head>, but it looks like the <meta> tags are sometimes in the <body>. Any ideas how I can fix this? Is it due to the User-Agent?

  • urllib3 1.23
  • opengraph-py3 0.71
  • beautifulsoup4 4.6.0
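import re
import urllib.request  # a bare `import urllib` does not reliably expose urllib.request

import opengraph_py3 as opengraph
from bs4 import BeautifulSoup

# Fetch the page (FancyURLopener is deprecated in Python 3)
raw = urllib.request.FancyURLopener().open("https://youtu.be/DQwU_kU4pUg")
html = raw.read()
soup = BeautifulSoup(html, 'html.parser')

# This is the same code as in `parser`: it only searches inside <head>
soup.html.head.findAll(property=re.compile(r'^og'))
# []

# The same search inside <body> finds the tags
soup.html.body.findAll(property=re.compile(r'^og'))
# [<meta content="YouTube" property="og:site_na....]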
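
One possible workaround is to look for og:* <meta> tags anywhere in the document instead of only under <head>. The sketch below shows that idea using only the standard library, so it runs without bs4. The names OpenGraphCollector, extract_open_graph and sample are hypothetical and are not part of opengraph_py3. It does not explain why the tags end up in the body, or whether the User-Agent plays a role. With BeautifulSoup, the same idea would be to call findAll on the soup root rather than on soup.html.head.

```python
from html.parser import HTMLParser


class OpenGraphCollector(HTMLParser):
    """Collect og:* <meta> tags wherever they appear (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Called for every opening tag, whether it sits in <head> or <body>.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        if prop.startswith("og:"):
            # Strip the "og:" prefix and keep the first value seen per property.
            self.properties.setdefault(prop[3:], attr_map.get("content"))


def extract_open_graph(html_text):
    """Return a dict of Open Graph properties found anywhere in html_text."""
    collector = OpenGraphCollector()
    collector.feed(html_text)
    collector.close()
    return collector.properties


# Made-up sample page with the tags placed in <body>, as in the report above.
sample = (
    "<html><head><title>demo</title></head>"
    '<body><meta property="og:site_name" content="YouTube">'
    '<meta property="og:type" content="video.other"></body></html>'
)
print(extract_open_graph(sample))
```

Because the collector never checks which element the <meta> tag is nested in, it behaves the same whether the tags are in the head or in the body.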