Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pypy2: EntryPoint object repr results in infinite recursion #97

Closed
jaraco opened this issue Oct 22, 2020 · 32 comments
Closed

pypy2: EntryPoint object repr results in infinite recursion #97

jaraco opened this issue Oct 22, 2020 · 32 comments

Comments

@jaraco
Copy link
Member

jaraco commented Oct 22, 2020

In GitLab by @asottile on Nov 29, 2019, 18:08

>>> from importlib_metadata import EntryPoint
>>> EntryPoint('foo', 'bar', None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>

...

  File "<string>", line 22, in __repr__
  File "<string>", line 22, in __repr__
  File "<string>", line 22, in __repr__
  File "<string>", line 22, in __repr__
  File "<string>", line 22, in __repr__
  File "/home/asottile/workspace/flake8/.tox/pypy/site-packages/importlib_metadata/__init__.py", line 125, in __iter__
    return iter((self.name, self))
RuntimeError: maximum recursion depth exceeded
@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @asottile on Nov 29, 2019, 18:17

mentioned in commit asottile/flake8@6b764d0

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @asottile on Nov 29, 2019, 18:18

mentioned in merge request pycqa/flake8!389

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @asottile on Nov 30, 2019, 02:33

I'd be interested in working on this, the root cause is the same as #96 -- generally when extending NamedTuple overriding __iter__ is a bad idea (since things expecting tuples will expect it to work in a certain way). In this case, one of the other tuple member functions makes assumptions about __iter__ (which are violated by the specification in importlib.metadatas EntryPoint

I'm wondering if at this point it's too late to back out __iter__ from EntryPoint (it's provisional in python3.8 right?)

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 10:24

mentioned in commit 10234d5

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 10:25

In the aforementioned commit, I added a test. The test only fails on PyPy 2, but passes on PyPy 3.

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 10:26

changed title from pypy: EntryPoint object repr results in infinite recursion to pypy{+2+}: EntryPoint object repr results in infinite recursion

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 10:31

In the bugfix/pickleable-ep branch, I have a fix for #96 (I'm just waiting on a fix for #100 before submitting). Given the limited impact (doesn't affect late PyPy 3) of this issue, I'm less inclined to re-design EntryPoint.__iter__ behavior, though I believe that's a possibility. It would be a 2.x release of the backport and might have to wait for Python 3.9 to change for stdlib Python.

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 10:38

These are the tested assertions that fail when removing __iter__:

importlib_metadata master $ git diff                                                                                                                                      
diff --git a/importlib_metadata/__init__.py b/importlib_metadata/__init__.py
index f0e7eba..ae78b9f 100644
--- a/importlib_metadata/__init__.py
+++ b/importlib_metadata/__init__.py
@@ -118,12 +118,6 @@ class EntryPoint(collections.namedtuple('EntryPointBase', 'name value group')):
             config.readfp(io.StringIO(text))
         return EntryPoint._from_config(config)
 
-    def __iter__(self):
-        """
-        Supply iter so one may construct dicts of EntryPoints easily.
-        """
-        return iter((self.name, self))
-
 
 class PackagePath(pathlib.PurePosixPath):
     """A reference to a path in a package"""
importlib_metadata master $ tox -e py37                                                                                                                                   
py37 develop-inst-noop: /Users/jaraco/code/public/importlib_metadata
py37 installed: -e git+gl://python-devs/importlib_metadata@c471831d2bdda601b32b38295fc893e908cc972a#egg=importlib_metadata,more-itertools==8.0.0,packaging==19.2,pyparsing==2.4.5,six==1.13.0,zipp==0.6.0
py37 run-test-pre: PYTHONHASHSEED='1602908799'
py37 run-test: commands[0] | python -m unittest discover
E..........................E.E........E.....E..
======================================================================
ERROR: test_entry_points (importlib_metadata.tests.test_api.APITests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jaraco/code/public/importlib_metadata/importlib_metadata/tests/test_api.py", line 57, in test_entry_points
    entries = dict(entry_points()['entries'])
ValueError: dictionary update sequence element #0 has length 3; 2 is required

======================================================================
ERROR: test_entrypoint_with_colon_in_name (importlib_metadata.tests.test_main.ImportTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jaraco/code/public/importlib_metadata/importlib_metadata/tests/test_main.py", line 53, in test_entrypoint_with_colon_in_name
    entries = dict(entry_points()['entries'])
ValueError: dictionary update sequence element #0 has length 3; 2 is required

======================================================================
ERROR: test_resolve (importlib_metadata.tests.test_main.ImportTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jaraco/code/public/importlib_metadata/importlib_metadata/tests/test_main.py", line 48, in test_resolve
    entries = dict(entry_points()['entries'])
ValueError: dictionary update sequence element #0 has length 3; 2 is required

======================================================================
ERROR: test_zip_entry_points (importlib_metadata.tests.test_zip.TestEgg)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jaraco/code/public/importlib_metadata/importlib_metadata/tests/test_zip.py", line 38, in test_zip_entry_points
    scripts = dict(entry_points()['console_scripts'])
ValueError: dictionary update sequence element #0 has length 3; 2 is required

======================================================================
ERROR: test_zip_entry_points (importlib_metadata.tests.test_zip.TestZip)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jaraco/code/public/importlib_metadata/importlib_metadata/tests/test_zip.py", line 38, in test_zip_entry_points
    scripts = dict(entry_points()['console_scripts'])
ValueError: dictionary update sequence element #0 has length 3; 2 is required

----------------------------------------------------------------------
Ran 47 tests in 0.203s

FAILED (errors=5)
ERROR: InvocationError for command /Users/jaraco/code/public/importlib_metadata/.tox/py37/bin/python -m unittest discover (exited with code 1)
________________________________________________________________________________ summary _________________________________________________________________________________
ERROR:   py37: commands failed

It feels to me like the value of constructing dicts of EntryPoints is more valuable than the edge cases that it trips up, especially if those edge cases can be addressed directly and cleanly.

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 10:53

mentioned in commit 7310c08

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 10:54

In the bugfix/97-entry-point-repr, I've pushed a patch that surgically addresses the issue. I'll plan to submit that for review after #100 is fixed, but I'm open to your thoughts anytime.

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @asottile on Nov 30, 2019, 15:29

Here's an example which is impossible to fix:

>>> entrypoint = EntryPoint('a', 'b', 'c')
>>> json.dumps(ep)
...
RecursionError

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @asottile on Nov 30, 2019, 15:31

(typed that on phone, here's the actual error):

>>> json.dumps(importlib_metadata.EntryPoint('a', 'b', 'c'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
ValueError: Circular reference detected

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 17:02

mentioned in commit 741ed9eeca75f288ee5a0bbbb341963aff4c6d9a

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 17:02

mentioned in commit 3b999e408591f26ee14fe9bacccd511a2784c46c

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 17:02

mentioned in commit b742833623ad7b9d42e65086d5731a6a39625ba5

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 17:30

mentioned in commit 645b25330dbb847da8a02de3b47ed6414daa605d

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 17:30

mentioned in commit a8b9a314120811b6ee48b53d9889d5900df91679

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 17:30

mentioned in commit 727922c65cf86578d10274c239d20dfaf477d23b

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 17:31

In the feature/entrypoints-not-tuple branch, I've explored this latest manifestation, assembling the various requirements from this an #96 and consolidated those tests.

EntryPoints were never intended to be serializable to JSON and the fact that collections.namedtuple objects are serialized as indistinguishable from an unnamed tuple is probably a bug (the round-trip doesn't restore the object), but then again, the JSON support has never been very friendly to anything but the most basic use cases.

In any case, I would expect if one wants to JSON-serialize an EntryPoint, they should have to write a custom encoder. The fact that a "Circular reference is detected" is as fine an error as any really.

The real question is - which is better - having a namedtuple that isn't iterable in the usual way, or having a custom object that requires separately implementing immutability and equality and repr (as this branch does)?

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @asottile on Nov 30, 2019, 17:47

json serialization is just one example, there's probably more if I were to look into it :S. as far as I can tell the __iter__ as currently implemented isn't documented -- I would think it'd be easier to just remove that 🤷

reimplemeting the immutability is probably the best approach (will probably also want __hash__ and __eq__ and perhaps __lt__ as well though)

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 18:24

It's not documented, but it is an intentional feature that's captured by 5 tests. I wanted it to be trivially easy for a consumer to find entry points by name (and to construct dictionaries of entry points by name). I'm seeking to avoid users having to import another helper function for this purpose (as they must with entrypoints) and the associated namespacing challenges that brings (do you put entry-point utility functions alongside other distribution functions or encapsulate them somewhere).

The goal was for users to have one function to retrieve entry points and be able to readily use the result for a variety of use-cases. I'm thinking we should document and advertise and encourage the dict([List of EntryPoint]) usage as a supported usage. And it is documented in the sense that the docstring explicitly calls out why it's there, so anybody invoking help(EntryPoint) would have that indication that it's supported. It might also be nice to have "porting from entrypoints" and "porting from pkg_resources" documentation, which we don't have yet, but which probably would illustrate the recommended usage.

I don't think it's enough to illustrate (increasingly obscure) uses that break this design. One also needs to present an alternative - how would you recommend satisfying both the list-of-entrypoints and dict-of-entrypoints-by-name use-cases?

Personally, I'm really proud of the existing implementation. It allows a dict-of-entrypoints-by-name to be constructed from a list-of-entrypoints by just applying a built-in dict, and it does that with two lines of code plus a docstring. It is a little magical and thus vulnerable to issues like you've raised, but I can't help but feel like there can't be that many lurking issues. In my experience, if your objects are immutable and support pickleability, there usually aren't many issues that remain.

Of the options we have, my preference would be to keep the namedtuple (so we get hashability, equality, repr, immutability for free) and apply the workarounds for use-cases that specifically need supported (pickleability for flake8 seems quite necessary, repr on pypy2 maybe less so, json support and other issues unlikely to be needed). My second preference would be to re-implement most of namedtuple to avoid those pitfalls, but retain the __iter__. I'd be willing to entertain a third option, which would be to replace the dict(EntryPoint) support with something else, but I'd like to see what that looks like. Would you be willing to put together a draft of how the tests change if support for dict construction is removed?

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @asottile on Nov 30, 2019, 18:28

yep, I'll take a stab at this -- thanks for the thorough writeup 👍 -- the iter thing is kind of nice though it's not much more work than documenting {ep.name: ep for ep in entry_points()['foo']}

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 18:37

Another thing I just realized - the order of the results from entry_points()[group] might be relevant, so if a client was concerned about the order, they could use OrderedDict(eps) to retain order. I'm not sure that's an important use-case, but it does illustrate the flexibility of the iter approach.

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Dec 1, 2019, 15:31

mentioned in commit cdae203

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Dec 1, 2019, 15:42

mentioned in commit 104b27c

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Dec 1, 2019, 15:42

mentioned in commit d7653d7

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Dec 1, 2019, 15:46

I've decided to go ahead and merge the surgical fixes for this and release it as 1.1.0, but that doesn't preclude taking another approach. Depending on what you find in your draft, feel free to re-open this issue or open another issue about the more general problem.

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Dec 1, 2019, 15:53

mentioned in commit 031c7f0

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Dec 1, 2019, 15:53

closed via commit 7310c08

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Dec 1, 2019, 15:53

closed via merge request !100

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Dec 1, 2019, 16:04

mentioned in commit c3f0769d6320e6f03c5bbcddefa6f36e7f087c2a

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @asottile on Dec 4, 2019, 17:03

yep, sounds good -- I think this approach is fine for now -- I'll do some more testing with other apis and see if there's something more useful than that.

I still think the __iter__ is more surprising than it is helpful (when the dict comprehension or OrderedDict((ep.name, ep) for ep in entry_points()['foo']) is a little easier to understand)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant