Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experimenting: Annotated[Any, pybind11.CppType("cpp_namespace::UserType")] #4888

Draft
wants to merge 49 commits into
base: master
Choose a base branch
from

Conversation

rwgk
Copy link
Collaborator

@rwgk rwgk commented Oct 19, 2023

Description

For context see #4876 (comment) and surrounding comments.

Note: PR #5073 was split out from this PR and released with pybind11 v2.12.0

TODO:

Suggested changelog entry:

…t`, etc.

This resolves all test failures except:

```
E         - create_rec_nested(arg0: int) -> numpy.ndarray[Annotated[Any, "NestedStruct"]]
...
FAILED test_numpy_dtypes.py::test_signature - assert --- actual / +++ expected
```
@rwgk
Copy link
Collaborator Author

rwgk commented Oct 19, 2023

@sizmailov The idea here is to not knowingly generate docstrings that stubgen cannot possibly process correctly.

stubgen 1.5.1 does not process the Annotated[Any, "cpp_namespace::UserType"] as desired.

What do you think, is this a useful direction? Could stubgen be updated to process Annotated as desired?

@rwgk
Copy link
Collaborator Author

rwgk commented Oct 20, 2023

Some new information:

I ran global testing with this PR. Through that I discovered 3 .pyi files that look better, for example (manipulated because it's closed source):

-    def foo(self, *args, **kwargs) -> Any: ...
+    def foo(self, value) -> None: ...

That makes me think stubgen already has some functionality for handling Annotated[Any, "cpp_namespace::UserType"], but it's not applied recursively.

@sizmailov
Copy link
Contributor

What do you think, is this a useful direction? Could stubgen be updated to process Annotated as desired?

What would be the desired output? The following should be easily achievable:

def values(self) -> Iterator[Annotated[Any, "cpp_namespace::UserType"]]: ...

As a user of a python extension I would prefer to have this:

def values(self) -> Iterator[py_module.UserType]: ...

It's possible but requires information on mapping C++ types to Python, which is unavailable for stubgen. The only place that has this mapping is the binding code (e.g. pybind11_protobuf). Not sure how this can be extracted and passed to stubgen...

If I would go with Annotated I'd put some "label" around type string to leave no room for mistake (for 3rd-party tools that make use of annotations), e.g.:

def values(self) -> Iterator[Annotated[Any, Pybind11FromCppType("cpp_namespace::UserType")]]: ...

@rwgk
Copy link
Collaborator Author

rwgk commented Oct 21, 2023

def values(self) -> Iterator[Annotated[Any, Pybind11FromCppType("cpp_namespace::UserType")]]: ...

I'll try that, thanks!

What I'm aiming for here is a last-ditch effort to produce a docstring that does not trip up stubgen. The current behavior is to simply let it fall on the floor.

As a user of a python extension I would prefer to have this:
def values(self) -> Iterator[py_module.UserType]: ...

That only works for py::class_- or py::enum_-wrapped types. The implementation is here (right above the code this PR is changing):

  • if (auto *tinfo = detail::get_type_info(*t)) {
    handle th((PyObject *) tinfo->type);
    signature += th.attr("__module__").cast<std::string>() + "."
    + th.attr("__qualname__").cast<std::string>();
    } else if (rec->is_new_style_constructor && arg_index == 0) {
    // A new-style `__init__` takes `self` as `value_and_holder`.
    // Rewrite it to the proper class type.
    signature += rec->scope.attr("__module__").cast<std::string>() + "."
    + rec->scope.attr("__qualname__").cast<std::string>();
    } else {

There is no existing mechanism for replacing the % placeholder (originating from here) at runtime with a fully-qualified Python name. Ideas that come to mind: maybe another registry; maybe a magic classmethod. But even if implemented, that can fail, too, so a robust fallback like what this PR is about will always have a value.

@rwgk
Copy link
Collaborator Author

rwgk commented Oct 21, 2023

I decided to go with

def values(self) -> Iterator[Annotated[Any, CppTypePybind11("cpp_namespace::UserType")]]: ...

For the 5 .pyi files that I hit on via global testing before, inserting CppTypePybind11() does not make a difference. I like the extra clarity though, therefore I initiated another global testing run with this updated PR.

Comment on lines 496 to 500
if (detail::cpp_name_needs_typing_annotated(tname.c_str())) {
signature += "Annotated[Any, CppTypePybind11(\"" + tname + "\")]";
} else {
signature += tname;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking, we should always wrap C++ types with Annotated at this point.
Being a top-level non-template type is not enough to assume 1-to-1 C++-Python type name correspondence, although it is the case for (some) built-in types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the following to cast.h allows us to remove the above condition without turning all tests red.

template <>
struct handle_type_name<str> {
    static constexpr auto name = const_name("str");
};
template <>
struct handle_type_name<set> {
    static constexpr auto name = const_name("set");
};
template <>
struct handle_type_name<anyset> {
    static constexpr auto name = const_name("set");
};
template <>
struct handle_type_name<tuple> {
    static constexpr auto name = const_name("tuple");
};
template <>
struct handle_type_name<frozenset> {
    static constexpr auto name = const_name("frozenset");
};
template <>
struct handle_type_name<list> {
    static constexpr auto name = const_name("list");
};
template <>
struct handle_type_name<dict> {
    static constexpr auto name = const_name("dict");
};
template <>
struct handle_type_name<object> {
    static constexpr auto name = const_name("object");
};

The only two failures probably indicate the error in tests themselves:

Pytest errors

====================================================================================================================== FAILURES ======================================================================================================================
___________________________________________________________________________________________________________________ test_signature ___________________________________________________________________________________________________________________

doc = <conftest.SanitizedString object at 0x7fafa7b3f8e0>

    def test_signature(doc):
>       assert (
            doc(m.create_rec_nested)
            == "create_rec_nested(arg0: int) -> numpy.ndarray[NestedStruct]"
        )
E       assert --- actual / +++ expected
E         - create_rec_nested(arg0: int) -> numpy.ndarray[Annotated[Any, CppTypePybind11("NestedStruct")]]
E         ?                                               --------------------------------            -- -
E         + create_rec_nested(arg0: int) -> numpy.ndarray[NestedStruct]

doc        = <conftest.SanitizedString object at 0x7fafa7b3f8e0>

../../tests/test_numpy_dtypes.py:344: AssertionError
______________________________________________________________________________________________________________________ test_set ______________________________________________________________________________________________________________________

capture = <conftest.Capture object at 0x7fafa82995a0>, doc = <conftest.SanitizedString object at 0x7fafa82991b0>

    def test_set(capture, doc):
        s = m.get_set()
        assert isinstance(s, set)
        assert s == {"key1", "key2", "key3"}
    
        s.add("key4")
        with capture:
            m.print_anyset(s)
        assert (
            capture.unordered
            == """
            key: key1
            key: key2
            key: key3
            key: key4
        """
        )
    
        m.set_add(s, "key5")
        assert m.anyset_size(s) == 5
    
        m.set_clear(s)
        assert m.anyset_empty(s)
    
        assert not m.anyset_contains(set(), 42)
        assert m.anyset_contains({42}, 42)
        assert m.anyset_contains({"foo"}, "foo")
    
        assert doc(m.get_set) == "get_set() -> set"
>       assert doc(m.print_anyset) == "print_anyset(arg0: anyset) -> None"
E       assert --- actual / +++ expected
E         - print_anyset(arg0: set) -> None
E         + print_anyset(arg0: anyset) -> None
E         ?                    +++

capture    = <conftest.Capture object at 0x7fafa82995a0>
doc        = <conftest.SanitizedString object at 0x7fafa82991b0>
s          = set()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't point out before, I got the same result already, it's in a commit here: 76b4a34

The only remaining errors are related to numpy NestedStruct. I tried to pin-point where the % originates, but I gave up after looking for ~5 minutes, reverted the commit, and then tried the cpp_name_needs_typing_annotated() heuristic instead to see how far we get with that.

I agree what you suggest here is best, but someone needs to figure out how to deal with the failing test, and then we have to decide if we need to roll out the behavior change in stages, maybe opt-in first in one release, then opt-out in the next.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right. You already tried that. I should have noticed.

I think we should apply the following diff to tests.

  • NestedStruct doesn't have a descriptive python counterpart, the CppTypePybind11("NestedType") is the best we can do.
  • There is no such thing as anyset in python.
template <>
struct handle_type_name<anyset> {
    static constexpr auto name = const_name("set");
};

This should resolve the troubles with broken tests.

diff

diff --git a/tests/test_numpy_dtypes.py b/tests/test_numpy_dtypes.py
index d10457ee..2fc3771c 100644
--- a/tests/test_numpy_dtypes.py
+++ b/tests/test_numpy_dtypes.py
@@ -343,7 +343,7 @@ def test_complex_array():
 def test_signature(doc):
     assert (
         doc(m.create_rec_nested)
-        == "create_rec_nested(arg0: int) -> numpy.ndarray[NestedStruct]"
+        == "create_rec_nested(arg0: int) -> numpy.ndarray[Annotated[Any, CppTypePybind11(\"NestedStruct\")]]"
     )
 
 
diff --git a/tests/test_pytypes.py b/tests/test_pytypes.py
index 2b202731..2824d29f 100644
--- a/tests/test_pytypes.py
+++ b/tests/test_pytypes.py
@@ -121,7 +121,7 @@ def test_set(capture, doc):
     assert m.anyset_contains({"foo"}, "foo")
 
     assert doc(m.get_set) == "get_set() -> set"
-    assert doc(m.print_anyset) == "print_anyset(arg0: anyset) -> None"
+    assert doc(m.print_anyset) == "print_anyset(arg0: set) -> None"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NestedStruct doesn't have a descriptive python counterpart

Nice, thanks! I didn't know / wasn't sure before. I'm glad I gave up before, finding out the hard way. :-)

There is no such thing as anyset in python.

Of course! :-) Thanks for catching that as well, I was just going through with rushed lawnmower mentality.

Please take another look, the cpp_name_needs_typing_annotated() heuristic is gone, all tests pass (locally at least).

@rwgk
Copy link
Collaborator Author

rwgk commented Oct 21, 2023

Another data point, result from globally testing this PR @ commit 272152e, which inserts CppTypePybind11() like this:

Annotated[Any, CppTypePybind11("cpp_namespace::UserType")]

There is only 1 new test failure, in tensorflow, a comparison with a golden file. That should be easy to deal with.

@rwgk
Copy link
Collaborator Author

rwgk commented Oct 21, 2023

@sizmailov do have connections to the mypy stubgen owners?

I feel it would be ideal to get their input before we make a move here. Concrete questions:

  • What's their overall high-level reaction to using Annotated[Any, CppTypePybind11("cpp_namespace::UserType")] (or similar)?

It seems like mypy does handle outer-level Annotated, but not nested ones, e.g.

Iterator[str, Annotated[Any, CppTypePybind11("cpp_namespace::UserType")]]
  • What are the chances that mypy will be changed to support such nested Annotated?

@rwgk rwgk changed the title Experimenting: Annotated[Any, "cpp_namespace::UserType"] Experimenting: Annotated[Any, CppTypePybind11("cpp_namespace::UserType")] Oct 21, 2023
};
template <>
struct handle_type_name<anyset> {
static constexpr auto name = const_name("set");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://docs.python.org/3/c-api/set.html anyset is an alias for "set or fronzenset".

The contexts where rendering it as set | frozenset might lead to trouble are pretty exotic, I think it's ok to render as union type:

Suggested change
static constexpr auto name = const_name("set");
static constexpr auto name = const_name("set | frozenset");

Copy link
Collaborator Author

@rwgk rwgk Oct 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: ace70b0

I see Union[...] is used in multiple places.

Could it be better to use Union[set, frozenset] for self-consistency?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's use Union for consistency

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: 7c8991a

@rwgk
Copy link
Collaborator Author

rwgk commented Oct 21, 2023

@sizmailov do have connections to the mypy stubgen owners?

I went ahead and created python/mypy#16306

@sizmailov
Copy link
Contributor

type, exception, memoryview and bytearray also miss the handle_type_name.

Not sure about dtype, module_, generic_type and exception, since they are not in pytypes.h.

The rest seems to be properly handled.

grep -Poh 'class \w+ : public object ' include/pybind11/*.h
class dtype : public object 
class module_ : public object 
class generic_type : public object 
class exception : public object 
class iterator : public object 
class type : public object 
class iterable : public object 
class str : public object 
class bytes : public object 
class bytearray : public object 
class none : public object 
class ellipsis : public object 
class bool_ : public object 
class int_ : public object 
class float_ : public object 
class weakref : public object 
class slice : public object 
class capsule : public object 
class tuple : public object 
class dict : public object 
class sequence : public object 
class list : public object 
class anyset : public object 
class function : public object 
class staticmethod : public object 
class buffer : public object 
class memoryview : public object 

@rwgk rwgk changed the title Experimenting: Annotated[Any, CppTypePybind11("cpp_namespace::UserType")] Experimenting: Annotated[Any, pybind11.CppType("cpp_namespace::UserType")] Nov 16, 2023
…lizations, adjust tests.

The primary change is:

```diff
 template <typename T>
-struct handle_type_name {
-    static constexpr auto name = const_name<T>();
-};
+struct handle_type_name;
+
```

All other changes are adjustments to restore successful build & test.
@rwgk
Copy link
Collaborator Author

rwgk commented Dec 17, 2023

@sizmailov I'm interested in your opinion regarding d14d91e, could you please take a look?

The main idea is to remove the handle_type_name default implementation, to enforce that all types are explicitly specialized, to guard against oversights.

That generates the question: What do we actually want in the added specializations? — I'm not sure, any advice is appreciated.

@rwgk
Copy link
Collaborator Author

rwgk commented Dec 29, 2023

Logging the results of a simple experiment: Google global testing with the current version of this PR:

(Internal test ID: OCL:593010609:BASE:593028799:1703228284011:fdcd5c9a)

22 build failures (22 unique)

$ grep 'cast.h:1208:32' * | cut -d"'" -f2 | sort | uniq -c
      6 pybind11::detail::handle_type_name<jax::(anonymous namespace)::PjitFunction::pyobject>
      6 pybind11::detail::handle_type_name<jax::PmapFunction::pyobject>
      6 pybind11::detail::handle_type_name<pybind11::enum_<xla::OpSharding_ShardGroupType>>
      6 pybind11::detail::handle_type_name<pybind11::enum_<xla::OpSharding_Type>>
     45 pybind11::detail::handle_type_name<xla::PyArray>

It would probably take a few hours of effort to get those fixed (by adding handle_type_name specializations).

It might be too disruptive to release pybind11 without the handle_type_name default implementation, at least without a transition plan (e.g. first a release with opt-in, later a release changing to opt-out).

@rwgk rwgk mentioned this pull request Mar 26, 2024
rwgk added a commit to rwgk/pybind11 that referenced this pull request Mar 27, 2024
rwgk added a commit that referenced this pull request Mar 27, 2024
* Transfer bug fixes from #4888 wholesale. Full test coverage for all fixes is still missing.

* Add cmake option(PYBIND11_DISABLE_HANDLE_TYPE_NAME_DEFAULT_IMPLEMENTATION) and use in some tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants