Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Xpath with namespace and position #353

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

knit-bee
Copy link

I noticed that it is not possible to use elem.find or elem.findall with an xpath that contains position indices if the method is called with the namespaces argument.
This behavior has also been reported in Bug #1873886.

It appears that during the tokenization of the xpath, the numbers are treated as tags, i.e. they are concatenated with the default namespace (during function calls with namespaces). This results in a wrong path imo.
For example:

>>> from lxml import etree
>>> doc = etree.XML("""
      <foo xmlns="http://example.com/foo">
        <bar>baz</bar>
      </foo>""")
>>> path = "./bar[1]"
>>> doc.find(path, namespaces={None:"http://example.com/foo"})
None

The target element is not found here because the path that is used is effectively:
./{http://example.com/foo}bar[{http://example.com/foo}1]

Changes:

  • I added a check during the tokenization of the xpath to determine whether the processed tag is a number to avoid concatenation with the namespace.

Copy link
Member

@scoder scoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think we should reduce the amount of code that we add here, though.

Comment on lines +377 to +380
self.assertEqual(
summarize_list(elem.findall("tag", namespaces=namespaces)),
["{nsnone}tag", "{nsnone}tag"],
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given how many examples you add here, this seems worth its own custom assert method: .assertFindallEqual(element, path, expected, namespaces=None).

Also, do we actually need to add these tests? ISTM that we could get away with running the existing tests three times, once without namespaces dict, once with an empty one, and once with a non-empty one.

src/lxml/_elementpath.py Outdated Show resolved Hide resolved
@knit-bee
Copy link
Author

Thank you for the review, I will try to include your suggestions next week.

src/lxml/_elementpath.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants