These results are as expected: when you run this selector on an element that is not the root, the default prefix (descendant-or-self::) limits the results to that element and its descendants.
Everything else is in lxml and libxml2, not cssselect. What matters for XPath is not whether the original HTML source is valid, but what the parsed tree looks like. Here you’re using libxml2’s HTML parser, which gives you a tree in a weird state.
>>> d
<Element html at 0x7f8bcc0e5b90>
>>> list(d)
[]
>>> d.getparent() is None
True
>>> d.getnext()
<Element html at 0x7f8bcc110350>
>>> list(d.getnext())
[<Element body at 0x7f8bcc110050>]
d is the root element of the tree (it has no parent, as expected), but it also has a sibling! (Very much unexpected.) This looks like a bug in libxml2’s parser.
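To check whether a parsed tree is in this state, something like the following could be used. This is only a sketch: root_element_siblings is a hypothetical helper name, not part of lxml, and it is shown here on a well-formed document rather than on the markup from this issue:

```python
from lxml import etree


def root_element_siblings(element):
    """Return the element siblings of the top-most ancestor of `element`.

    Hypothetical helper: a non-empty result means the tree has several
    root-level elements, as in the session above.
    """
    # Walk up to the element that has no parent.
    top = element
    while top.getparent() is not None:
        top = top.getparent()

    # Collect siblings on both sides, in document order.
    siblings = list(top.itersiblings(preceding=True))
    siblings.reverse()
    siblings.extend(top.itersiblings())

    # Comments and processing instructions may legitimately sit next to
    # the root; only real elements (string tags) indicate the odd state.
    return [s for s in siblings if isinstance(s.tag, str)]


well_formed = etree.fromstring("<html><body/></html>")
print(root_element_siblings(well_formed.find("body")))  # []
```

Any XPath query evaluated from d cannot reach content that hangs off such a sibling, since the descendant-or-self axis never leaves d's own subtree.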
In the meantime, try using the html5lib parser instead:
import html5lib

d = html5lib.parse('''
<!DOCTYPE html>
<html>
<body></body>
''', treebuilder='lxml', namespaceHTMLElements=False).getroot()
(I’m disabling namespaces here because of cssselect bug #9.)
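As a rough check that the html5lib tree does not have the problem, the sketch below repeats the parse and then runs the XPath that cssselect generates for the selector body with the default prefix. The variable names document, root and matches are chosen here for illustration:

```python
import html5lib

document = """
<!DOCTYPE html>
<html>
<body></body>
"""

# Same call as above: lxml tree builder, HTML namespace disabled.
root = html5lib.parse(
    document, treebuilder="lxml", namespaceHTMLElements=False
).getroot()

# The root has no parent and, unlike before, no sibling.
print(root.getparent() is None)  # True
print(root.getnext())            # None

# XPath equivalent of the CSS selector "body" with the default prefix.
matches = root.xpath("descendant-or-self::body")
print(len(matches))  # 1
```

Here body is a descendant of the root, so a query started from the root finds it.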
Since this example has invalid HTML, feel free to ignore this issue.
Anyway, here it is (simplified from http://www.weheart.co.uk/2013/02/18/alley-oop-design-exhibition/):
Just a bit unexpected that the first XPath query doesn't find anything.