CSS selector cannot select the p tag under h3 #203

Open
xujiang1 opened this issue Oct 26, 2020 · 3 comments
xujiang1 commented Oct 26, 2020

from parsel import Selector

html = "<h3>吉林大学社会科学学报<p>Jilin University Journal Social Sciences Edition</p></h3>"

sel = Selector(html)

print(sel.css("h3"))

print(sel.css("h3 > p::text").getall())

When I use a CSS selector, I cannot get the p tag under h3. The result is as follows:


[<Selector xpath='descendant-or-self::h3' data='<h3>吉林大学社会科学学报</h3>'>]

[]

When I replace the p tag with another tag, it is selected as expected:

from parsel import Selector

html = "<h3>吉林大学社会科学学报<em>Jilin University Journal Social Sciences Edition</em></h3>"

sel = Selector(html)

print(sel.css("h3"))

print(sel.css("h3 > em::text").getall())

Result:

[<Selector xpath='descendant-or-self::h3' data='<h3>吉林大学社会科学学报<em>Jilin University Jo...'>]

['Jilin University Journal Social Sciences Edition']
Gallaecio commented Oct 30, 2020

Interesting. I’m marking it as a bug, although I am not 100% sure it is one, and even if it is, it is probably an upstream issue in lxml.

Gallaecio added the bug label Oct 30, 2020
felipeboffnunes commented Nov 1, 2022

Gallaecio commented

I don’t think Parsel intends to require that input HTML is standard-compliant. Ideally, anything that a browser accepts we should accept as well, because HTML documents in the wild care about browser support more than they care about standard compliance.

Browsers seem to accept this syntax.
