Xpath with namespace and position #353

knit-bee · 2022-10-13T08:39:00Z

I noticed that elem.find and elem.findall do not match an XPath that contains a position index when the method is called with a namespaces argument that defines a default namespace.
This behavior has also been reported in Bug #1873886.

It appears that during tokenization of the XPath, numbers are treated as tags, i.e. they are prefixed with the default namespace (when the method is called with namespaces). This results in an incorrect path.
For example:

>>> from lxml import etree
>>> doc = etree.XML("""
      <foo xmlns="http://example.com/foo">
        <bar>baz</bar>
      </foo>""")
>>> path = "./bar[1]"
>>> print(doc.find(path, namespaces={None:"http://example.com/foo"}))
None

The target element is not found here because the path that is used is effectively:
./{http://example.com/foo}bar[{http://example.com/foo}1]
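For comparison, this is roughly what the lookup is expected to resolve to: the same path with the tag written out in Clark notation and the position index left untouched. The sketch below uses the standard library's xml.etree.ElementTree only so that it runs without lxml installed (an assumption made for this example); the names NS, doc and match are illustrative.

```python
import xml.etree.ElementTree as ET

NS = "http://example.com/foo"
doc = ET.XML('<foo xmlns="%s"><bar>baz</bar></foo>' % NS)

# The tag is namespace-qualified by hand; the position index stays a plain number.
match = doc.find("./{%s}bar[1]" % NS)
print(match.text)
```

No namespaces argument is passed here, so the tokenizer has no default namespace to prepend to the index.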

Changes:

I added a check to the tokenization of the XPath: if the token being processed is a number, it is no longer concatenated with the namespace.
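The idea can be sketched with a toy tokenizer. This is not the code in src/lxml/_elementpath.py: the regular expression TOKEN_RE, the function qualify_path and its skip_numbers flag are hypothetical names made up for illustration, and the real tokenizer handles more cases (prefixes, attributes, quoted strings).

```python
import re

# Hypothetical, simplified token pattern: group 1 captures operators,
# group 2 captures names (and, as in the report, bare numbers).
TOKEN_RE = re.compile(r"(//?|\.\.|[/.*\[\]@=])|([^/\[\]@=\s]+)|\s+")


def qualify_path(path, default_ns, skip_numbers):
    """Rebuild `path`, prefixing name tokens with `default_ns`.

    With skip_numbers=False every name token is prefixed, including
    position indices. With skip_numbers=True purely numeric tokens
    are passed through unchanged.
    """
    parts = []
    for op, name in TOKEN_RE.findall(path):
        if name and not (skip_numbers and name.isdigit()):
            parts.append("{%s}%s" % (default_ns, name))
        else:
            parts.append(op or name)
    return "".join(parts)


ns = "http://example.com/foo"
print(qualify_path("./bar[1]", ns, skip_numbers=False))
print(qualify_path("./bar[1]", ns, skip_numbers=True))
```

The first call reproduces the broken path shown above; the second leaves the index as a plain 1.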

scoder

Thanks. I think we should reduce the amount of code that we add here, though.

scoder · 2022-12-01T08:17:00Z

src/lxml/tests/test_elementpath.py

+        self.assertEqual(
+            summarize_list(elem.findall("tag", namespaces=namespaces)),
+            ["{nsnone}tag", "{nsnone}tag"],
+        )


Given how many examples you add here, this seems worth its own custom assert method: .assertFindallEqual(element, path, expected, namespaces=None).

Also, do we actually need to add these tests? ISTM that we could get away with running the existing tests three times, once without namespaces dict, once with an empty one, and once with a non-empty one.
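A rough sketch of how both suggestions could fit together, assuming the proposed signature. It uses the standard library's xml.etree.ElementTree so that it runs without lxml (its findall also accepts a namespaces argument); the class names, NAMESPACE_VARIANTS and the sample document are hypothetical and not taken from the lxml test suite.

```python
import unittest
import xml.etree.ElementTree as ET

# Hypothetical variants: no dict, an empty dict, and a non-empty dict
# whose prefix is not used by the paths under test.
NAMESPACE_VARIANTS = (None, {}, {"p": "http://example.com/unused"})


class FindallTestBase(unittest.TestCase):
    def assertFindallEqual(self, element, path, expected, namespaces=None):
        # Compare the tags of all matches against the expected list.
        found = [e.tag for e in element.findall(path, namespaces=namespaces)]
        self.assertEqual(found, expected)


class FindallTest(FindallTestBase):
    def test_paths_under_all_namespace_variants(self):
        elem = ET.XML("<root><tag/><other/><tag/></root>")
        for namespaces in NAMESPACE_VARIANTS:
            # subTest reports which variant failed without stopping the loop.
            with self.subTest(namespaces=namespaces):
                self.assertFindallEqual(elem, "tag", ["tag", "tag"], namespaces)
                self.assertFindallEqual(elem, "tag[2]", ["tag"], namespaces)
```

Looping over the variants inside one test keeps the existing path cases in a single place, so a new case is automatically checked under every namespaces setting.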

knit-bee · 2022-12-13T08:46:57Z

Thank you for the review, I will try to include your suggestions next week.

Luise Koehler added 2 commits October 13, 2022 09:34

Avoid adding namespace to position during tokenization

8f34425

Add more tests for find and findall with position index and namespace

6c10a8c

scoder reviewed Dec 1, 2022

scoder reviewed Jan 2, 2024

Simplify implementation

71c8107
