Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix index out of range when <body> or <head> is missing #272

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

FVolral
Copy link

@FVolral FVolral commented Sep 2, 2018

If a HTML file that doesn't contain a or part is parsed,
it can lead to a index out of range error when you try
getattr(lxmlElement, "body")

If a HTML file that doesn't contain <body> is parsed, it can lead
to a index out of range error when you try getattr(lxmlElement, "body")
for example.
Copy link
Member

@scoder scoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally a good idea, but the implementation needs fixing. Would have been obvious with some tests (hint, hint).

return self.xpath('//head|//x:head', namespaces={'x':XHTML_NAMESPACE})[0]
result = self.xpath('//head|//x:head', namespaces={'x':XHTML_NAMESPACE})
if len(result) > 1:
return result[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise … do what? Return None? Would be good to say so if that's what you want.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've some difficulty to run lxml tests, it's a bit new for me, and that pull request is a kind of experience. It should return None indeed, I can correct that.


@property
def head(self):
"""
Returns the <head> element. Can be called from a child
element to get the document's head.
"""
return self.xpath('//head|//x:head', namespaces={'x':XHTML_NAMESPACE})[0]
result = self.xpath('//head|//x:head', namespaces={'x':XHTML_NAMESPACE})
if len(result) > 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why > 1 ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"> 0", because I'm stupid :p

@scoder
Copy link
Member

scoder commented Sep 2, 2018

No problem. To start the tests, it should be enough to run python test.py. The relevant ones are in src/lxml/html/tests/test_basic.txt. Doctests.

@lxml lxml deleted a comment from charshack Jul 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants