fix parse html RecursionError #486

Open: wants to merge 1 commit into master


521xueweihan commented Oct 20, 2021

fix parse html

Parsing https://db-engines.com/en/ranking raises RecursionError.
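For context, here is a minimal stdlib-only sketch of why a recursive tree conversion raises `RecursionError` on deeply nested input while an explicit-stack version does not. The dict-based tree and the functions `build_nested`, `convert_recursive`, `convert_iterative` and `depth_of` are hypothetical stand-ins; they only mimic the shape of a recursive converter such as lxml's `soupparser` `convert_tag`, which uses one Python frame per nesting level.

```python
import sys


def build_nested(depth: int) -> dict:
    """Build a chain of `depth` nested "div" nodes without recursion."""
    root = {"tag": "div", "children": []}
    node = root
    for _ in range(depth - 1):
        child = {"tag": "div", "children": []}
        node["children"].append(child)
        node = child
    return root


def convert_recursive(node: dict) -> dict:
    """Copy the tree using one Python stack frame per nesting level."""
    copy = {"tag": node["tag"], "children": []}
    for child in node["children"]:
        copy["children"].append(convert_recursive(child))
    return copy


def convert_iterative(root: dict) -> dict:
    """Copy the tree with an explicit stack, so depth is bounded by memory only."""
    new_root = {"tag": root["tag"], "children": []}
    stack = [(root, new_root)]
    while stack:
        src, dst = stack.pop()
        for child in src["children"]:
            new_child = {"tag": child["tag"], "children": []}
            dst["children"].append(new_child)  # keeps original child order
            stack.append((child, new_child))
    return new_root


def depth_of(node: dict) -> int:
    """Measure nesting depth iteratively."""
    best = 0
    stack = [(node, 1)]
    while stack:
        current, d = stack.pop()
        best = max(best, d)
        stack.extend((c, d + 1) for c in current["children"])
    return best


if __name__ == "__main__":
    deep = build_nested(2 * sys.getrecursionlimit())
    try:
        convert_recursive(deep)
    except RecursionError:
        print("recursive copy: RecursionError")
    print("iterative copy depth:", depth_of(convert_iterative(deep)))
```

Both copies produce the same structure on shallow input; only the recursive one is limited by `sys.getrecursionlimit()`.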

fix parse html RecursionError
surister (Contributor) commented Feb 26, 2023

To reproduce:

Python 3.10.9 (main, Dec 19 2022, 17:35:49) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from requests_html import HTMLSession
>>> session = HTMLSession()
>>> p = session.get('https://db-engines.com/en/ranking')
>>> p.html.text
Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 33, in fromstring
    return _parse(data, beautifulsoup, makeelement, **bsargs)
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 79, in _parse
    root = _convert_tree(tree, makeelement)
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 152, in _convert_tree
    res_root = convert_node(html_root)
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 216, in convert_node
    return handler(bs_node, parent)
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 255, in convert_tag
    handler(child, res)
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 255, in convert_tag
    handler(child, res)
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 255, in convert_tag
    handler(child, res)
  [Previous line repeated 985 more times]
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 242, in convert_tag
    res = etree.SubElement(parent, bs_node.name, attrib=attribs)
  File "src/lxml/etree.pyx", line 3156, in lxml.etree.SubElement
  File "src/lxml/apihelpers.pxi", line 199, in lxml.etree._makeSubElement
  File "src/lxml/apihelpers.pxi", line 195, in lxml.etree._makeSubElement
  File "src/lxml/etree.pyx", line 1630, in lxml.etree._elementFactory
  File "src/lxml/classlookup.pxi", line 403, in lxml.etree._parser_class_lookup
  File "src/lxml/classlookup.pxi", line 456, in lxml.etree._custom_class_lookup
  File "/usr/lib/python3.10/site-packages/lxml/html/__init__.py", line 734, in lookup
    if node_type == 'element':
RecursionError: maximum recursion depth exceeded in comparison
>>>
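The traceback shows `convert_tag` calling itself once per level of element nesting, repeating close to 1,000 times before the `RecursionError`, which suggests the tree built for this page is nested deeper than CPython's default recursion limit of 1000. A quick way to check that hypothesis for a given page is to measure its nesting depth with the standard library's `html.parser.HTMLParser`, which is event-driven rather than recursive. `DepthCounter` and `max_nesting_depth` are hypothetical helpers; the counter does not apply HTML's implicit end-tag rules (for example, unclosed `<p>` or `<li>`), so it treats such tags as nested and may overstate the depth relative to the tree BeautifulSoup builds.

```python
import sys
from html.parser import HTMLParser

# HTML void elements never have end tags, so they are not pushed on the stack.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
})


class DepthCounter(HTMLParser):
    """Stream through HTML and record the deepest open-element nesting."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        self.stack.append(tag)
        self.max_depth = max(self.max_depth, len(self.stack))

    def handle_endtag(self, tag):
        # Close everything up to the matching start tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def max_nesting_depth(html: str) -> int:
    """Return the deepest element nesting found in `html` (approximate)."""
    counter = DepthCounter()
    counter.feed(html)
    counter.close()
    return counter.max_depth


if __name__ == "__main__":
    sample = "<div>" * 1500 + "deep" + "</div>" * 1500
    print(max_nesting_depth(sample), "levels vs recursion limit", sys.getrecursionlimit())
```

If the reported depth is close to or above `sys.getrecursionlimit()`, a recursive converter is likely to fail. Raising the limit with `sys.setrecursionlimit()` is a possible stopgap, but very high limits can crash the interpreter with a C stack overflow, so catching the error and falling back is the safer option.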

surister (Contributor) commented:

@521xueweihan

I'd love to see a test for this. The proposed fix could also be slightly refactored, since several exceptions can be caught in a single `except` clause:

try:
    ...
except (Exception1, Exception2):
    pass

I reckon it's been a couple of years, so I understand if you are no longer interested or active in this repo. In a few days I will do it myself, and I will reference this PR to give you some credit.

