Add an Array Protocol & improve static typing support #589

Open
wants to merge 7 commits into
base: main

nstarman
Contributor

@nstarman nstarman commented Feb 5, 2023

Continuing from #584, I've done a quick refactor of the array (and dtype) objects to make them typing Protocols.
@rgommers, I agree that making Array generic wrt the dtype is good for a followup.

It's not perfect, but mypy doesn't complain a whole lot, so I think this is a good start. I'm happy to make refinements.

Member

@honno honno left a comment

Thanks for your work on this :)

The Array protocol stuff looks grand. And personally I like having a DType class just for consistency, even if practically type hinting that an object has __eq__ isn't useful.

On namespace type hints (e.g. _CreationFuncs, ArrayAPINamespace): IMO I'm not sure on this yet, as they make contributing to the spec less ergonomic given we repeat the signatures, although they are pretty nifty to have. Is there a world where these could be dynamically constructed from the stubs instead? We might want to put these in a follow-up PR if just to ship the concretely non-controversial stuff first. Interested in what others think.

src/array_api_stubs/_draft/_types.py (outdated)

  @property
- def ndim(self: array) -> int:
+ def ndim(self) -> int:
Member

So I see methods/properties which don't return arrays don't get any type hint for their self arg—could these be hinted with Array?

Contributor Author

Static type checkers don't like this because the TypeVar is not properly bound — see https://peps.python.org/pep-0484/#scoping-rules-for-type-variables ("Unbound type variables should not appear in the bodies of generic functions, or in the class bodies apart from method definitions:").
In general, there's no need to type hint the self arg unless the return type depends on the type of self — e.g. see https://peps.python.org/pep-0484/#user-defined-generic-types.

Contributor Author

A nice thing about using Protocol is that self is understood by static type checkers to be the Protocol.
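
A minimal sketch of the distinction discussed above, assuming a toy protocol (the names `ArrayLike`, `Scalar`, `negate`, and the TypeVar `A` are illustrative, not the spec's): `self` is left unannotated where the return type does not depend on it, and is annotated with a bound TypeVar only where the return type should follow the concrete class.

```python
from __future__ import annotations

from typing import Protocol, TypeVar

# Hypothetical names for illustration; not the actual stubs.
A = TypeVar("A", bound="ArrayLike")


class ArrayLike(Protocol):
    @property
    def ndim(self) -> int:
        # No annotation on self: checkers already treat it as the protocol.
        ...

    def __neg__(self: A) -> A:
        # Annotated self: the return type follows the concrete class.
        ...


class Scalar:
    """Toy implementation that structurally satisfies ArrayLike."""

    def __init__(self, value: float) -> None:
        self.value = value

    @property
    def ndim(self) -> int:
        return 0

    def __neg__(self) -> Scalar:
        return Scalar(-self.value)


def negate(x: A) -> A:
    # A type checker should infer Scalar -> Scalar here, not ArrayLike.
    return -x


result = negate(Scalar(2.0))
```

Here the TypeVar is bound through the `self` parameter of `__neg__`, which is the case where annotating `self` actually carries information.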

Member

Why is this only done for certain methods?

Contributor Author

I did it for any methods I looked at over the course of the PR. This is by no means comprehensive; the rest should be done, either here or elsewhere.

@nstarman
Contributor Author

nstarman commented Feb 6, 2023

We might want to put these in a follow-up PR if just to ship the concretely non-controversial stuff first—interested in what others think.

I was thinking the same thing last evening about moving this to a followup PR.

On namespace type hints (e.g. _CreationFuncs, ArrayAPINamespace): IMO I'm not sure on this yet, as they make contributing to the spec less ergonomic given we repeat the signatures, although they are pretty nifty to have. Is there a world where these could be dynamically constructed from the stubs instead?

Maybe? I haven't experimented yet but perhaps mypy will be satisfied with:

class ArrayAPINamespace(Protocol):

    zeros_like = staticmethod(creation_funcs.zeros_like)

Also for a future PR: zeros_like could be promoted to a Protocol,

class Array(Protocol):
    def __array_namespace__(self, ...) -> ArrayAPINamespace[Self]: ...


class ArrayAPINamespace(Protocol[Array]):

    zeros_like = staticmethod(zeros_like)


class zeros_like(Protocol[Array]):
    def __call__(self, x: Array, /, *, dtype: Optional[dtype] = None, device: Optional[device] = None) -> Array: ...

The advantage of this is that functions can be checked to see if their signature is compatible with zeros_like.
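
To illustrate that last point, here is a small runnable sketch of a callback protocol, assuming `Any` as a stand-in for the array, dtype, and device types (the class name `ZerosLike` and the list-based toy implementation are hypothetical):

```python
from __future__ import annotations

from typing import Any, Optional, Protocol


class ZerosLike(Protocol):
    # Hypothetical callback protocol; Any stands in for array/dtype/device.
    def __call__(
        self, x: Any, /, *, dtype: Optional[Any] = None, device: Optional[Any] = None
    ) -> Any: ...


def zeros_like(
    x: Any, /, *, dtype: Optional[Any] = None, device: Optional[Any] = None
) -> Any:
    """Toy implementation on flat lists, only to make the sketch runnable."""
    return [0 for _ in x]


# A static checker verifies the signature at this assignment; a function
# with, say, a missing keyword-only `device` would be rejected here.
checked: ZerosLike = zeros_like
```

At runtime the annotation on `checked` is not enforced; the compatibility check only happens in the type checker.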

@rgommers rgommers added the "RFC" and "topic: Static Typing" labels Feb 20, 2023
@rgommers rgommers changed the title from "Array Protocol" to "Add an Array Protocol & improve static typing support" Feb 20, 2023
Member

@rgommers rgommers left a comment

Thanks @nstarman, and apologies for the delay in reviewing (your PR came in right at the start of my 2 week holiday). This is looking quite good. I pushed a couple of small commits to resolve some Mypy and Sphinx complaints.

This is close to ready I'd say. One thing that isn't nice is that Self now shows up instead of array in the signatures of the html docs. I tried to fiddle with autodoc_type_aliases in src/_array_api_conf.py, but was not successful in mapping that back to array. Showing "Self" will be pretty confusing in docs for regular methods that return arrays. I think we'll have to figure that one out.

The DType protocol is fine with me - it won't add much to type-checking, but it should be fine to have it here for consistency.

The Array protocol looks good.

@rgommers
Member

@BvB93 if you have any feedback on this PR, that would be great to see.

@rgommers
Member

@nstarman there are 3 errors left when running mypy array_api_stubs/_draft/array_object.py (down from many more on main). There are more errors in other files - but if we are going to maintain static typing support in this repo, adding a CI job running Mypy seems feasible. We could start with this file, and expand it to all files later. WDYT?

@nstarman
Contributor Author

One thing that isn't nice is that Self now shows up instead of array in the signatures of the html docs.

If the array type is just text in the html docs and not 'clickable', then we can rename Self to array.

adding a CI job running Mypy seems feasible. We could start with this file, and expand it to all files later. WDYT?

That sounds like a good idea. I can try adding to the CI and create an ignore file that can later be whittled down as the package is made more type friendly.

@rgommers
Member

If the array type is just text in the html docs and not 'clickable', then we can rename Self to array.

That works. It's not clickable - there is no array object in the standard on purpose, because in the end we don't care whether it's named ndarray, Tensor, or something else.

@nstarman
Contributor Author

nstarman commented Aug 12, 2023

@nstarman nstarman marked this pull request as ready for review August 12, 2023 21:21
@asmeurer
Member

Sphinx is trying to create links for the type hints. I think to fix that you need to update the list(s) here https://github.com/data-apis/array-api/blob/main/src/_array_api_conf.py#L52-L77

@nstarman
Contributor Author

nstarman commented Aug 14, 2023

I can squash commits if you don't use Squash & Merge (assuming this is approved, of course :) ).
Thanks @asmeurer for the pointer.

Member

@honno honno left a comment

Nice work!


  @property
- def shape(self: array) -> Tuple[Optional[int], ...]:
+ def shape(self) -> tuple[int | None, ...]:
Member

Just to say more for myself than anyone else—I initially wasn't too sure if we wanted to use the py3.9+ type hint features, but:

  • I forgot that from __future__ import annotations makes this work for py3.8
  • py3.8 stops receiving updates at the end of next year
  • The scientific ecosystem is already adopting py3.9+
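
A short sketch of the first bullet, assuming a toy class (`Toy` is hypothetical): with the `__future__` import, annotations are stored as strings and never evaluated, so the py3.9+/py3.10+ spellings parse on older interpreters too.

```python
from __future__ import annotations


class Toy:
    """Hypothetical stand-in for an array class."""

    @property
    def shape(self) -> tuple[int | None, ...]:
        # The annotation above is kept as a string, so neither
        # `tuple[...]` nor `int | None` is evaluated at definition time.
        return (3, None)


# The stored return annotation is a plain string.
hint = Toy.shape.fget.__annotations__["return"]
```

Note that this only covers annotations; the same expressions used outside of annotations (for example in a type alias assignment) are still evaluated normally.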

src/array_api_stubs/_draft/array_object.py (outdated)
@nstarman nstarman requested a review from honno August 18, 2023 03:00
Member

@honno honno left a comment

Looks good on my end!

I think pre-commit is a good addition (would love to add hooks for other spec things myself), but probably not too useful until we get a workflow that runs it. That said, I'm not too sure if we'd want to do that just yet 🤔

@nstarman
Contributor Author

nstarman commented Aug 18, 2023

I think pre-commit is a good addition (would love to add hooks for other spec things myself), but probably not too useful until we get a workflow that runs it. That said, I'm not too sure if we'd want to do that just yet 🤔

pre-commit.ci can be turned on, sans additional GH workflows.
black and ruff would be wonderful as well!

@nstarman
Contributor Author

Looks good on my end!

Thanks @honno for the reviews and discussion.

@asmeurer
Member

Any reason to change array to Array? I think we might prefer to keep it as array like it is now.

@nstarman
Contributor Author

nstarman commented Aug 30, 2023

The class would have to be called array, which is problematic re PEP-8 and also conflicts name-wise with the array TypeVar. The latter can be changed, but for the former isn't it desirable to refer to array instances of some class Array?

@asmeurer
Member

The class name is specifically not part of the standard. I always felt that array made that a little clearer than Array, but maybe that's just me.

By the way, the protocol isn't going to require the class to be called Array is it? If it does, that's a problem.

@nstarman
Contributor Author

nstarman commented Aug 31, 2023

By the way, the protocol isn't going to require the class to be called Array is it? If it does, that's a problem.

Not specifically. The protocol is a class and may be named anything. If it's named the same thing as another variable, e.g. the array = TypeVar("array", bound=Array), then that can be problematic.

As a more zoomed-out view: the goal of this and subsequent PRs is to enable the spec to also be an installable package that can be used for run-time checks of adherence to this spec as well as for general type hinting.

>>> from data_apis.array import Array  # this spec
>>> from numpy import array, ndarray  # post v2.0

>>> issubclass(ndarray, Array)
True

>>> isinstance(array(1), Array)
True

To be clear, Array cannot be instantiated since it is a protocol without any implementation. Its use is entirely for structural subtyping (e.g. type hinting) and run-time checks.
At some level it's important that the spec adhere to Python standards, e.g. naming conventions.
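
A minimal runnable sketch of such a run-time check, assuming a two-member stand-in for the protocol (this `Array` and `MyTensor` are hypothetical, not the spec's definitions). It also shows that the check is structural, so the implementing class can have any name:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Array(Protocol):
    # Hypothetical two-member stand-in for the spec's protocol.
    @property
    def ndim(self) -> int: ...

    def __array_namespace__(self, *, api_version=None): ...


class MyTensor:
    """Unrelated class: no inheritance from Array, and a different name."""

    @property
    def ndim(self) -> int:
        return 1

    def __array_namespace__(self, *, api_version=None):
        return None


# isinstance only checks that the required members are present.
is_array = isinstance(MyTensor(), Array)
is_not_array = isinstance(object(), Array)
```

One caveat, as far as I'm aware: `issubclass` against a runtime-checkable protocol that has non-method members (such as properties) raises `TypeError`, so the `issubclass` form of the check may need extra care, whereas `isinstance` works as shown.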

The class name is specifically not part of the standard.

I'm not sure I understand then what's the problem with calling it Array?

src/array_api_stubs/_draft/array_object.py (outdated)

  @property
- def device(self: array) -> Device:
+ def device(self) -> "Device": # type: ignore[type-var]
Contributor

Suggested change
- def device(self) -> "Device": # type: ignore[type-var]
+ def device(self) -> Any:

Type variables (such as Device here) don't really make sense when they're only used as a return type, without also appearing in either a generic class or another parameter of the function; this is the source of mypy's error here and why you need the type: ignore. A type variable must always appear at least twice.

Instead, unless you're willing to make the protocol itself generic w.r.t. the device (which might be a bit excessive here), then it's customary to use Any. I'm not sure if this Device type is used anywhere else, but if not you could even consider just repurposing it as an Any alias in order to keep its more descriptive name.
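
A rough sketch of the alias idea, assuming a reduced protocol (the `Array` here, `CPUArray`, and `where_is` are hypothetical): `Device` keeps its descriptive name but is just `Any`, so no unbound type variable appears in the return position.

```python
from typing import Any, Protocol

# Hypothetical alias: keeps the descriptive name, behaves like Any for checkers.
Device = Any


class Array(Protocol):
    @property
    def device(self) -> Device: ...


class CPUArray:
    """Toy implementation whose device is a plain string."""

    @property
    def device(self) -> str:
        return "cpu"


def where_is(x: Array) -> Device:
    # No type: ignore needed, since Device is not a TypeVar here.
    return x.device


location = where_is(CPUArray())
```

The trade-off is that `Any` turns off checking on the returned value; making the protocol generic over the device type would preserve it, at the cost of a more complex signature.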

Contributor Author

@nstarman nstarman Sep 6, 2023

unless you're willing to make the protocol itself generic w.r.t. the device (which might be a bit excessive here),

This is planned, but for a followup PR. (See opening comment to this PR).

then it's customary to use Any

Alternatively, this PR introduces a Device Protocol which can be used instead of the TypeVar. No binding issues because it's a Protocol. When Array is made generic wrt Device we can change this back to a TypeVar.

Contributor

Alternatively, this PR introduces a Device Protocol which can be used instead of the TypeVar. No binding issues because it's a Protocol. When Array is made generic wrt Device we can change this back to a TypeVar.

I guess that would be sort of possible? Rereading the Device Support docs, concrete API implementations have a lot of freedom regarding the implementation of their Device type, to the point where only __eq__ (and __ne__) really matter. Honestly, with an interface that minimal it might just be worthwhile to stick to plain object rather than Any or a protocol, as there's very little (or nothing) to gain from the latter two.

Contributor Author

Yes, currently a Device Protocol has only __eq__ in it and as such is effectively equivalent to object.
But it's designed with the future in mind ("The choice to not include a standardized Device object may be revisited in a future revision of this standard.") and we'd need to alias object anyway to get the correct rendering in the docs...

@@ -39,6 +43,7 @@ def device(self: array) -> Device:
out: device
a ``device`` object (see :ref:`device-support`).
"""
...

@property
def mT(self: array) -> array:
Contributor

As a heads up: the PEP 673 Self type (a special type variable that's always bound to the respective class's self parameter) is now also supported by mypy and the like, so you could, if so desired, express this as:

from __future__ import annotations  # keeps Self unevaluated at runtime
from typing import TYPE_CHECKING, Protocol

if TYPE_CHECKING:
    from typing_extensions import Self

class Array(Protocol):
    @property
    def mT(self) -> Self: ...

Contributor Author

I have a note already

array = TypeVar("array", bound="Array")
# NOTE: when working with py3.11+ this can be ``typing.Self``.

But I'm happy to change it now. I was avoiding adding additional dependencies, even if they aren't runtime and
are made by the core python devs.

Contributor

Either way is perfectly fine in my opinion; at the end of the day it's just a nice bit of syntactic sugar.

Contributor Author

Then let's tackle this in a followup, since Self will need to be rendered in the docs as Array.

@@ -272,7 +272,7 @@ def matrix_norm(
  /,
  *,
  keepdims: bool = False,
- ord: Optional[Union[int, float, Literal[inf, -inf, "fro", "nuc"]]] = "fro",
+ ord: Optional[Union[int, float, Literal[inf, -inf, "fro", "nuc"]]] = "fro", # type: ignore
Contributor

Note that there's currently no way to express literal floats in the type system, hence the need for the # type: ignore comment. Could maybe consider adding two Inf and NInf types (or something along those lines) and declare them as float aliases?

Inf = float
NInf = float

Contributor Author

I agree it's annoying Python can't do Literal[<float>]. I considered doing this, but I think I prefer not to change the spec in this PR. Changing to alias values will require additional docs and explanation since Inf = float is not the same as math.inf, and the latter is part of the spec.

Contributor

Changing to alias values will require additional docs and explanation since Inf = float is not the same as math.inf, and the latter is part of the spec.

True, though the float already being present as a valid ord parameter means that the inclusion of math.inf is redundant either way (from a typing perspective at least), so to me this doesn't sound like such a change would violate the spec.
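
To make the redundancy argument concrete, a small sketch assuming an alias named `Ord` and a toy helper `describe_ord` (both hypothetical, not part of the spec): only the string options need `Literal`, while `inf` and `-inf` are already admitted by `float`.

```python
from __future__ import annotations

import math
from typing import Literal, Optional, Union

# Assumed spelling: string literals are expressible; inf is covered by float.
Ord = Optional[Union[int, float, Literal["fro", "nuc"]]]


def describe_ord(ord: Ord = "fro") -> str:
    """Toy helper that only classifies the argument; it computes no norm."""
    if ord is None or isinstance(ord, str):
        return f"named:{ord}"
    if math.isinf(ord):
        return "inf" if ord > 0 else "-inf"
    return f"p={ord}"
```

With this spelling no `# type: ignore` is needed, though the signature no longer documents `inf` and `-inf` explicitly, which is the concern raised earlier in this thread.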

Contributor Author

I added the type ignores only to make my local mypy happier (mypy isn't part of the CI on this repo).
I think making these changes should be done in a followup PR. I'm happy to make the changes in another PR.

src/array_api_stubs/_draft/array_object.py (outdated)

  def to_device(
-     self: array, device: Device, /, *, stream: Optional[Union[int, Any]] = None
+     self: array, device: "Device", /, *, stream: int | Any | None = None # type: ignore[type-var]
Contributor

There's been some minor discussion about this before (xref numpy/numpy#19083 (comment)), but considering this class is now becoming a fully fledged protocol, and not just a documentation device, the int type in the stream annotation is starting to become a bit problematic: it makes it mandatory for any (concrete) subtype to support it, while this is, in fact, optional.

Contributor Author

A good point. I'm trying to translate this class, not change the spec, so I think this is best done in another PR.

Contributor

IMO this would not be a spec violation as it'd merely remove an inconsistency between the type signature as implied by the annotations (..., stream: int | Any | None = ...) versus the one described in docstring (..., stream: None = ...).

As a little bit more background on the nitty gritty part of the world of typing that's going on here: Contrary to the return type of callables (which is covariant and concrete protocol implementations can thus return any sub-type), argument types are contravariant, meaning that concrete implementations will either have to exactly match it or accept a supertype thereof (in the most extreme scenario that'd mean that even plain object is acceptable, though whether or not that's an actual good practice is another discussion entirely...).

What this means is that in order to properly subtype something like def (stream: int | None = ...) -> object: ..., a concrete implementation must support both int and None, something that contradicts the actual docstring.
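
The variance rules above can be sketched as follows, assuming toy names (`Mover`, `Wide`, `Narrow`, and `run` are hypothetical): an implementation may widen the parameter type and narrow the return type, but not the other way around.

```python
from __future__ import annotations

from typing import Optional, Protocol


class Mover(Protocol):
    # Hypothetical protocol: promises callers that int and None both work.
    def move(self, *, stream: Optional[int] = None) -> object: ...


class Wide:
    # Accepts a supertype of the parameter and returns a subtype: compatible.
    def move(self, *, stream: object = None) -> str:
        return f"moved(stream={stream!r})"


class Narrow:
    # Accepts only None: a checker rejects this as a Mover, since a caller
    # holding a Mover is allowed to pass an int.
    def move(self, *, stream: None = None) -> str:
        return "moved(stream=None)"


def run(m: Mover) -> object:
    return m.move(stream=None)


wide_result = run(Wide())
# run(Narrow()) also works at runtime, but a static checker would flag it.
```

This is why an `int` in the protocol's `stream` annotation effectively obliges every implementation to accept integers.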

Contributor Author

Done!

@BvB93
Contributor

BvB93 commented Sep 12, 2023

LGTM, though it might be good to verify whether the currently available downstream implementations properly subtype the protocol, as this could potentially reveal some typing-related corner cases that we've missed thus far. Could be something for a follow-up though.

nstarman and others added 7 commits September 21, 2023 10:51
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
Sphinx doesn't set `TYPE_CHECKING`, but does use the type annotations.
`Self` is unknown to Sphinx, so should be filtered out to prevent lots
of errors.
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
@TomNicholas

It would be great to see this released. It looks very developed to me, and would be useful for us within xarray.
