There should be an option to use html5lib instead of lxml.html in DjangoClient (chokes on some html5 input) #441

Open
frankier opened this issue Oct 17, 2015 · 2 comments

Comments

@frankier

It looks like libxml2's HTML parser doesn't produce a proper HTML5 DOM, and it sometimes chokes on valid HTML5 even when run in tolerant mode, which can result in errors like "XMLSyntaxError: ... Tag footer invalid". The solution is probably to allow html5lib to be used instead. One hitch is that the methods from lxml.html's HtmlMixin are not available on elements built by html5lib, so Splinter's dependence on them would have to be removed.
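A minimal sketch of what such an option might look like, assuming a hypothetical helper `parse_html` with a hypothetical `backend` argument (neither is part of Splinter's API). The html5lib branch uses html5lib's default ElementTree builder, so the elements it returns are plain `xml.etree.ElementTree` elements without HtmlMixin helpers such as `text_content()`:

```python
def parse_html(markup, backend="lxml"):
    """Return the root <html> element of `markup`.

    Hypothetical helper for illustration only; the names `parse_html`
    and `backend` are assumptions, not existing Splinter API.
    """
    if backend == "lxml":
        # Imported lazily so that only the chosen backend must be installed.
        import lxml.html
        return lxml.html.document_fromstring(markup)
    if backend == "html5lib":
        import html5lib
        # namespaceHTMLElements=False keeps tag names plain ("footer")
        # instead of "{http://www.w3.org/1999/xhtml}footer".
        return html5lib.parse(markup, namespaceHTMLElements=False)
    raise ValueError("unknown backend: %r" % (backend,))
```

Code that only relies on the common ElementTree interface, for example `root.find(".//footer")`, should work with either backend; anything that calls HtmlMixin methods would need a replacement.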

@andrewsmedina
Member

+1 to use html5lib

@adamlwgriffiths

I've found lxml's HTML parser unable to handle any real-world HTML.
However, I found that html5lib has a habit of closing parent tags early, causing the children to become siblings.
I personally found Python's built-in parser superior to html5lib.
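For reference, a small sketch of how nesting can be checked with the standard library's `html.parser`. Note that `HTMLParser` is an event-based tokenizer and does not build a tree or apply HTML5's implied end-tag rules, so the stack handling below is a simplification of my own; `NestingRecorder` and `element_paths` are hypothetical names for illustration:

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not stay on the stack.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class NestingRecorder(HTMLParser):
    """Record each start tag together with its chain of open ancestors."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements, outermost first
        self.paths = []   # one "a/b/c" path per start tag seen

    def handle_starttag(self, tag, attrs):
        self.paths.append("/".join(self.stack + [tag]))
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def element_paths(markup):
    recorder = NestingRecorder()
    recorder.feed(markup)
    recorder.close()
    return recorder.paths


paths = element_paths(
    "<body><footer><p>Contact <a href='/about'>us</a></p></footer></body>"
)
print(paths)
```

Feeding the same markup to another parser and comparing the resulting parent/child chains is one way to see whether children are being turned into siblings.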
