Fix index out of range when <body> or <head> is missing #272
base: master
If an HTML file that doesn't contain <body> is parsed, it can lead to an index out of range error when you try getattr(lxmlElement, "body"), for example.
Generally a good idea, but the implementation needs fixing. Would have been obvious with some tests (hint, hint).
-        return self.xpath('//head|//x:head', namespaces={'x':XHTML_NAMESPACE})[0]
+        result = self.xpath('//head|//x:head', namespaces={'x':XHTML_NAMESPACE})
+        if len(result) > 1:
+            return result[0]
Otherwise … do what? Return None? Would be good to say so if that's what you want.
I'm having some difficulty running the lxml tests; it's all a bit new to me, and this pull request is something of an experiment. It should indeed return None; I can correct that.
     @property
     def head(self):
         """
         Returns the <head> element.  Can be called from a child
         element to get the document's head.
         """
-        return self.xpath('//head|//x:head', namespaces={'x':XHTML_NAMESPACE})[0]
+        result = self.xpath('//head|//x:head', namespaces={'x':XHTML_NAMESPACE})
+        if len(result) > 1:
Why > 1?
"> 0", because I'm stupid :p
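For reference, a minimal sketch of the guard as discussed in this thread, assuming the intended behaviour is to return None when nothing matches. `first_or_none` is a hypothetical helper, not part of lxml, and plain lists stand in for the list that `self.xpath(...)` would return:

```python
def first_or_none(result):
    """Return the first item of an XPath-style result list, or None if empty."""
    # Compare against 0, not 1: a single match is the common case,
    # and "> 1" would skip it and fall through to None.
    if len(result) > 0:
        return result[0]
    # State the fallback explicitly instead of relying on an implicit return.
    return None


# Plain lists stand in for the list that self.xpath(...) would return.
no_match = first_or_none([])
one_match = first_or_none(["<head>"])
two_matches = first_or_none(["<head>", "<x:head>"])
```

With `> 1` in place of `> 0`, the `one_match` case would also come back as None, which is the bug pointed out above.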
No problem. To start the tests, it should be enough to run |
If an HTML file that doesn't contain a <body> or <head> part is parsed,
it can lead to an index out of range error when you try
getattr(lxmlElement, "body")