'str' object has no attribute '__name__' error on some xpath filters #2318

Open
dgtlmoon opened this issue Apr 18, 2024 · 17 comments · May be fixed by #2351

@dgtlmoon
Owner

dgtlmoon commented Apr 18, 2024

All versions?

Using this shared watch: https://changedetection.io/share/QtZ-94DW41sa

I get a 'str' object has no attribute '__name__' error. I tried different lxml library versions, but that made no difference.

The watch is for https://www.depinte.be/werken with the filter

//div[1]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/div[1]

The error seems to come from here:

r = elementpath.select(tree, xpath_filter.strip(), namespaces={'re': 'http://exslt.org/regular-expressions'}, parser=XPath3Parser)

so it is likely elementpath related.

@dgtlmoon
Owner Author

Tried the latest elementpath, 4.4.0: same result.

@Constantin1489

This comment was marked as resolved.

@dgtlmoon
Owner Author

The error comes from elementpath. I tried different versions, with the same outcome.

@Constantin1489

This comment was marked as resolved.

@dgtlmoon
Owner Author

This is the pip package version in my custom 45.13 container.

Are you saying you can't reproduce the issue?

@Constantin1489
Contributor

I can reproduce the problem, but it is quite weird.
With "Playwright Chromium/Javascript via 'ws://127.0.0.1:3000/?stealth=1&--disable-web-security=true'", elementpath works.
With "Basic fast Plaintext/HTTP Client", it fails with 'str' object has no attribute '__name__'.

?????

@dgtlmoon
Owner Author

You then need to compare the HTML from both fetches: the Chrome JS-rendered version and the one retrieved with curl.
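One way to make that comparison concrete is a line-based diff of the two snapshots. This is a minimal sketch using only the standard library; `diff_html` and the two file labels are hypothetical names, and it assumes both versions of the page have already been saved as strings.

```python
import difflib


def diff_html(rendered_html, raw_html):
    """Return unified-diff lines from the raw fetch to the rendered page.

    Hypothetical helper for illustration: raw_html is what the plain HTTP
    client (or curl) returned, rendered_html is what the browser produced.
    """
    return list(
        difflib.unified_diff(
            raw_html.splitlines(),
            rendered_html.splitlines(),
            fromfile="http-client.html",
            tofile="browser-rendered.html",
            lineterm="",
        )
    )


if __name__ == "__main__":
    raw = "<html>\n<body>\n<div>jobs</div>\n</body>\n</html>"
    rendered = "<html>\n<body>\n<div>jobs</div>\n<div>extra</div>\n</body>\n</html>"
    # Lines starting with "+" exist only in the browser-rendered version.
    print("\n".join(diff_html(rendered, raw)))
```

Real pages will produce a large diff, so it may help to pretty-print or normalise both documents first.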

@Constantin1489
Contributor

Constantin1489 commented Apr 29, 2024

Hi,
I believe the bug originates in libxml2. See also https://gitlab.gnome.org/GNOME/libxml2/-/issues/716

@Constantin1489
Contributor

I found a solution, but I need time to verify it.

@ezalenski

ezalenski commented May 7, 2024

I took a look at this just to try and brush up on my pdb skills.

The issue here is that lxml considers the HTML from that site invalid. elementpath.select() assumes it is given a non-empty tree and does not handle the empty case correctly, which is where the exception comes from. One improvement changedetection.io could make is to check parser.error_log for errors, perhaps only for empty trees, since I'm not sure how noisy that error_log is or how often it is non-empty.

Here's where I attached pdb: [screenshot]
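The error_log check suggested in this comment could look roughly like the following. It is a minimal sketch that assumes the page is parsed with lxml's `etree.HTMLParser`; `parse_with_diagnostics` is a hypothetical name, and how noisy the log is on real pages has not been measured here.

```python
from lxml import etree


def parse_with_diagnostics(html_text):
    """Parse HTML and return (root, problems).

    Hypothetical helper: problems holds one readable string per entry in
    the parser's error_log, so a caller can surface them next to (or
    instead of) an obscure XPath failure.
    """
    parser = etree.HTMLParser()
    root = etree.fromstring(html_text.encode("utf-8"), parser)
    problems = [f"line {entry.line}: {entry.message}" for entry in parser.error_log]
    return root, problems


if __name__ == "__main__":
    root, problems = parse_with_diagnostics("<html><body><p>hi</p></body></html>")
    print(root.tag, problems)
```

Which messages appear depends on the installed libxml2 version, so the list is better treated as a hint to log than as a hard failure condition.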

@Constantin1489
Contributor

Constantin1489 commented May 7, 2024

@ezalenski, try python -m pdb -c 'b elementpath/tree_builders.py:229', then run p [e for e in elem.itersiblings()] in pdb. That is the problem. See also https://gitlab.gnome.org/GNOME/libxml2/-/issues/716

Also, please take a look at my test in the PR.

@amirt01

amirt01 commented May 13, 2024

I encountered the same issue. As a temporary workaround, I'm using XPath 1.0 by prepending xpath1: to the XPath rule.

@Constantin1489
Contributor

Hi @amirt01, I would be grateful if you could provide an example URL!

@amirt01

amirt01 commented May 13, 2024

Certainly @Constantin1489! I use changedetection.io to monitor company job sites like those hosted on Lever. I ran into this issue when filtering for the posting names: //*[contains(@data-qa, 'posting-name')]. I was able to remedy this by changing this filter to: xpath1://*[contains(@data-qa, 'posting-name')].

Here is an arbitrary example using Kinsta:
Here is a link to the broken watch config.
Here is a link to the fixed* watch config.
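The workaround above amounts to choosing the XPath engine from a prefix on the rule. A minimal sketch of that idea follows. `split_filter_rule` and the engine labels are hypothetical names for illustration, not changedetection.io's actual code, and treating a bare rule that starts with `/` as default XPath is an assumption.

```python
def split_filter_rule(rule):
    """Return (engine, expression) for a filter rule.

    Hypothetical routing for illustration only:
      "xpath1:<expr>"  -> XPath 1.0 evaluator
      "xpath:<expr>"   -> default XPath evaluator
      "/<...>"         -> default XPath evaluator (assumed convention)
      anything else    -> some other filter type (CSS, JSONPath, jq, ...)
    """
    rule = rule.strip()
    if rule.startswith("xpath1:"):
        return "xpath1", rule[len("xpath1:"):]
    if rule.startswith("xpath:"):
        return "xpath", rule[len("xpath:"):]
    if rule.startswith("/"):
        return "xpath", rule
    return "other", rule


if __name__ == "__main__":
    print(split_filter_rule("//*[contains(@data-qa, 'posting-name')]"))
    print(split_filter_rule("xpath1://*[contains(@data-qa, 'posting-name')]"))
```

Under this assumed routing, the prefixed rule would be tagged for the XPath 1.0 evaluator and never reach the XPath 3 parser that raises the error.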

@Constantin1489
Contributor

Constantin1489 commented May 13, 2024

@amirt01 Thank you! The case you reported will be fixed by #2351.

@leiless

leiless commented May 20, 2024

I also came across this issue; it's reproducible on my machine.
The changedetection.io version is v0.45.22.

The CSS/JSONPath/JQ/XPath Filters field is something like //*[@id="Foobar"]/div[1].

As a temporary workaround, I'm using XPath 1.0 by prepending xpath1: to the XPath rule, just as @amirt01 did.
So it becomes something like xpath1://*[@id="Foobar"]/div[1]

@Constantin1489
Contributor

Constantin1489 commented May 26, 2024

@leiless, could you run the following, with the URL changed to the page you are watching?

URL='https://jobs.lever.co/kinsta/'
curl -s "$URL" | xmllint --html - --debug 2> /dev/null | grep 'ELEMENT html'
