API: Add npt.NDArray, a runtime-subscriptable alias for np.ndarray #18935

Merged
merged 9 commits into from May 17, 2021

Conversation

BvB93
Member

@BvB93 BvB93 commented May 7, 2021

This PR introduces a runtime-subscriptable alias for np.ndarray[Any, np.dtype[~Scalar]] by the name of npt.NDArray.

This is a follow-up to #17719, which made an initial attempt at introducing this type alias before it was deferred to a later PR.

Motivation for the introduction of this type alias is two-fold:

  • It allows one to annotate an array with a given dtype at runtime, i.e. it fills the same role that typing.Sequence fulfilled for collections.abc.Sequence prior to Python 3.9. While runtime subscription is already, more or less, possible with from __future__ import annotations, the latter only helps with annotations and not with the creation of type aliases. All in all, npt.NDArray would add a big convenience factor.
  • It provides a compact alias for an otherwise rather verbose expression, a convenience that was already extensively utilized in the numpy stubs via its private predecessor (c332c6a). Note that this alias cannot be used for custom dtypes that are not based on np.generic (see NEP 42), which will either have to define their own aliases or stick to the complete form.
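
The difference between annotations and alias creation in the first point can be seen without NumPy. Below is a minimal sketch, assuming a hypothetical class `Box` (not part of NumPy) that, like a class without runtime subscription support, defines no `__class_getitem__`:

```python
from __future__ import annotations


class Box:
    """Hypothetical stand-in for a class that cannot be subscripted at runtime."""


# Fine: with the `__future__` import the annotation is stored as a string
# and never evaluated, so `Box[int]` is not actually subscripted here.
def unwrap(x: Box[int]) -> int:
    return 0


# Not fine: a type alias is an ordinary expression and is evaluated eagerly.
try:
    BoxInt = Box[int]
except TypeError as exc:
    alias_error = exc
else:
    alias_error = None

print(unwrap.__annotations__["x"])
print(type(alias_error).__name__)
```

A runtime-subscriptable alias removes the second failure mode, which is the convenience the first point refers to.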

Future Work

Currently I can think of two other (potential) candidates:

  • An alias for np.dtype[~Scalar]. I'm not 100% sure whether the big dtype update (NEP 40-42) already has plans for adding a dtype.__class_getitem__ method, in which case adding a subscriptable alias to numpy.typing might be redundant. @seberg do you have any comments on this?
  • An alias for np.ndarray[~Shape, np.dtype[~Scalar]]. I'd say this would be worthwhile to introduce, once we have shape-typing sorted out (Typing support for shapes #16544), that is.
    There is an open question of what the best name for such a type alias would be. npt.Array perhaps? That would distinguish it from the arbitrarily shaped npt.NDArray introduced in this PR.

Implementation

Fortunately, PEP 585 has provided us with all the tools we need for constructing what are effectively subscriptable wrappers: types.GenericAlias. This class is designed such that (nearly) all operations wrap around the underlying type, e.g. np.ndarray behaves virtually the same as types.GenericAlias(np.ndarray, ()).

The only bad news is that GenericAlias is exclusive to python >= 3.9 (and written in C), hence the introduction of a (python-based) backport in this PR. The backport can of course be removed once we've dropped support for 3.8 at some point in the future.
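
The wrapping behaviour described above can be sketched with a plain Python class. This is only an illustration on Python >= 3.9, where `Grid` is a hypothetical stand-in for `np.ndarray`; it is not the backport added in this PR:

```python
import types


class Grid:
    """Hypothetical stand-in for the wrapped class."""

    default_order = "C"

    def __init__(self, values):
        self.values = list(values)


GridOfInt = types.GenericAlias(Grid, (int,))

# The alias records what it wraps and how it was parametrized ...
print(GridOfInt.__origin__ is Grid)
print(GridOfInt.__args__)

# ... while other attribute lookups and calls are forwarded to the origin.
print(GridOfInt.default_order)
grid = GridOfInt([1, 2, 3])
print(type(grid) is Grid)
```

Calling the alias produces an ordinary `Grid` instance, which is why such a wrapper can be used in most places where the class itself is expected.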

Examples

>>> from typing import Any
>>> import numpy as np
>>> import numpy.typing as npt

>>> print(npt.NDArray)
numpy.ndarray[typing.Any, numpy.dtype[~ScalarType]]

>>> print(npt.NDArray[np.float64])
numpy.ndarray[typing.Any, numpy.dtype[numpy.float64]]

>>> NDArrayInt = npt.NDArray[np.int_]
>>> a: NDArrayInt = np.arange(10)

>>> def func(a: npt.ArrayLike) -> npt.NDArray[Any]:
...     return np.array(a)

@BvB93 BvB93 added 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes 30 - API Static typing labels May 7, 2021
@seberg
Member

seberg commented May 7, 2021

I am happy to just add __class_getitem__, I have not done it yet, simply because it wasn't one of the "big things to worry about". I like the syntax (and it seems very reasonable for typing).

@BvB93 if it helps you, I can add this over the weekend; we ping the mailing list for comments (I doubt anyone will be bothered) and then you have it.

@BvB93
Member Author

BvB93 commented May 7, 2021

I am happy to just add __class_getitem__, I have not done it yet, simply because it wasn't one of the "big things to worry about". I like the syntax (and it seems very reasonable for typing).

That would be nice, yes; it's definitely one of the things on my wish lists.
Personally I wouldn't mind waiting for 1.22 either, so no hurry.

@BvB93 BvB93 removed the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label May 7, 2021
@seberg
Member

seberg commented May 7, 2021

@BvB93 what is the typical behaviour? Since np.dtype[scalar] would return subclasses of np.dtype, __class_getitem__ should refuse to work on subclasses (e.g. np.dtype[scalar][scalar] makes no sense), right?

EDIT: Or maybe, I could allow it, but raise an error unless: new = DType[something]; assert new != DType and issubclass(new, DType)

@BvB93
Member Author

BvB93 commented May 7, 2021

@BvB93 what is the typical behaviour? Since np.dtype[scalar] would return subclasses of np.dtype, __class_getitem__ should refuse to work on subclasses (e.g. np.dtype[scalar][scalar] makes no sense), right?

I would actually propose to just stick to the same semantics that CPython used in PEP 585 for the builtin types (python/cpython#18239). For example, for the likes of builtins.list a pure Python implementation would be equivalent to the code below:

import types

def __class_getitem__(cls, args):
    return types.GenericAlias(cls, args)

Or, a bit more fancy, with some basic error checking (i.e. this will raise for dtype[Any, Any] and dtype[()]):

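import types
from typing import TypeVar

import numpy as np

Scalar = TypeVar("Scalar")
# `np.dtype` is written out here because `cls` is not defined at module level
_ALIAS = types.GenericAlias(np.dtype, (Scalar,))

def __class_getitem__(cls, args):
    return _ALIAS[args]  # Let `GenericAlias` handle the error checking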

This would have a few consequences though:

  • types.GenericAlias is only available for python >= 3.9. So we could either limit __class_getitem__, or for example use the backport introduced in this PR. I'd be ok with either option.
  • The return type would, during runtime, be a wrapped version of the original class (so not an actual subclass).
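
Both consequences can be observed in a small self-contained sketch. It assumes Python >= 3.9 and uses a hypothetical class `DType` as a stand-in for `np.dtype`; it is not NumPy's implementation:

```python
import types
from typing import TypeVar

Scalar = TypeVar("Scalar")


class DType:
    """Hypothetical stand-in for `np.dtype`."""

    def __class_getitem__(cls, args):
        # Parametrize over one type variable, then substitute `args`;
        # `GenericAlias` rejects a wrong number of arguments.
        return types.GenericAlias(cls, (Scalar,))[args]


alias = DType[int]
print(type(alias) is types.GenericAlias)  # a wrapper object ...
print(alias.__origin__ is DType)          # ... around the original class

try:
    DType[int, int]
except TypeError as exc:
    too_many = exc
else:
    too_many = None
print(type(too_many).__name__)
```

The result of `DType[int]` is an instance of `types.GenericAlias` that merely points back at `DType`, so no new class is created at runtime.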

@seberg
Member

seberg commented May 7, 2021

The return type would, during runtime, be a wrapped version of the original class (so not an actual subclass).

Wait, but that sounds pretty much useless at run-time? I thought this would be useful semantics to get the actual subclass.

@shoyer
Member

shoyer commented May 7, 2021

No objections from me -- this looks very handy!

@BvB93
Member Author

BvB93 commented May 7, 2021

Wait, but that sounds pretty much useless at run-time? I thought this would be useful semantics to get the actual subclass.

Well... yes.

Going the __class_getitem__(cls, args) -> SubClass route would be an option, but I'd imagine it's more work to implement.
For example, we'd need a way of handling:

  • TypeVars: def func(dt: np.dtype[T]) -> T: ...
  • Literal strings: IntDtype: np.dtype["np.int_"]
  • Any: np.dtype[Any]

Then again, I'm not opposing either option.

@seberg
Member

seberg commented May 7, 2021

@BvB93 no, it's not really any work at all, because I already have that. The point is that this is what happens if you type np.array():

np.array([scalar])

must already be able to effectively do:

DType = np.dtype[type(scalar)]  # EDIT: added missing type.
descriptor = DType.get_descriptor_from_pyobject(scalar)
result = np.empty(1, dtype=descriptor)
result[0] = scalar

In other words: I already have a function that does DType = np.dtype[scalar] in NumPy. (Of course there could be DTypes without scalars in the future, but in that case the user would have to use that DTypes name directly in any case.)

@seberg
Member

seberg commented May 7, 2021

Sorry, I hadn't parsed the typing side of what it means to return subclasses... I am not too fond of strings at runtime, but if typing wants it, sure.

From the runtime side of things, I feel it would be natural to have:

SubClassDType = np.dtype[scalar_type]
SubClassDType.type is scalar_type  # we could also call it `scalar_type` or so...

(both of these are guaranteed. I am not 100% sure about the opposite direction that np.dtype[DType.type] must work, but the above must always work.)

That would allow doing things like:

isinstance(arr.dtype, np.dtype[np.float64])

But, of course we probably don't need that all that often...
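
For comparison with the wrapper-based approach, the subclass-returning semantics could look roughly like the sketch below. The class `DType`, its `_subclasses` cache, and the generated class names are all hypothetical; this is a pure Python illustration of the idea, not how NumPy implements it:

```python
class DType:
    """Hypothetical stand-in for `np.dtype`; not NumPy's implementation."""

    _subclasses = {}  # cache: scalar type -> DType subclass

    def __class_getitem__(cls, scalar_type):
        if cls is not DType:
            # e.g. `DType[float][float]` makes no sense
            raise TypeError(f"{cls.__name__} cannot be subscripted")
        if scalar_type not in cls._subclasses:
            name = f"DType[{scalar_type.__name__}]"
            cls._subclasses[scalar_type] = type(name, (cls,), {"type": scalar_type})
        return cls._subclasses[scalar_type]


Float64DType = DType[float]
descriptor = Float64DType()

print(Float64DType.type is float)
print(issubclass(Float64DType, DType))
print(isinstance(descriptor, DType[float]))
print(isinstance(descriptor, DType[int]))
```

Caching the generated classes is what makes the `isinstance` check meaningful here, since repeated subscription then returns the very same subclass.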

@BvB93 BvB93 force-pushed the generic-alias branch 3 times, most recently from 25c3425 to 04670a1 Compare May 12, 2021 09:39
@BvB93
Member Author

BvB93 commented May 14, 2021

FYI the discussion on dtype.__class_getitem__ from this week's community meeting seems to have steered towards the types.GenericAlias-based approach (xref python/cpython#18239). As a reminder, such an enhancement would still be beyond the scope of this particular PR and is reserved for future work.

Secondly, I'd like to point the attention of (potential) reviewers to the snippet below, as it would be useful to sort out some of these details already (most importantly: would the names make sense?).

Future Work

  • An alias for np.ndarray[~Shape, np.dtype[~Scalar]]. I'd say this would be worthwhile to introduce, once we have shape-typing sorted out (Typing support for shapes #16544), that is.
    There is an open question of what the best name for such a type alias would be. npt.Array perhaps? That would distinguish it from the arbitrarily shaped npt.NDArray introduced in this PR.

@BvB93 BvB93 added this to the 1.21.0 release milestone May 14, 2021
@charris charris merged commit 1fcded9 into numpy:main May 17, 2021
@charris
Member

charris commented May 17, 2021

Thanks Bas.

@BvB93 BvB93 deleted the generic-alias branch May 17, 2021 18:55