ENH: Implementation of the NEP 47 (adopting the array API standard) #18585

Merged
merged 152 commits into from Aug 24, 2021

asmeurer

This is an implementation of the array API standard, as described in NEP 47.

The implementation is a new submodule named numpy._array_api, which is independent of the rest of NumPy. The idea is to have a compliant implementation of the array API specification built on top of NumPy. Furthermore, the numpy._array_api namespace implements a highly restrictive version of the standard, meaning that only those things explicitly required by the standard are implemented. This is done so that third-party libraries can use the NumPy array API implementation and be confident that their code does not rely on functionality that other libraries may not implement.

The module is at present pure Python and serves primarily as a set of wrappers for the corresponding NumPy functions. One thing of note is that the module has its own ndarray object, which is a wrapper (NOT a subclass) of numpy.ndarray. This is done so that the methods on the array object are only those required by the spec. It is also done because some behavior in the spec deviates from NumPy: array scalars are not included (only shape () arrays), only a subset of indexing is implemented (see numpy._array_api.ndarray._validate_indices), and the casting rules are different (there is no value-based casting; note that this is still a TODO for me to implement here).
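To illustrate the wrapper-not-subclass idea described above, here is a minimal sketch. The class name Array and everything inside it are hypothetical and much smaller than the real object in this PR; the sketch only shows how composition keeps the public surface limited and how shape () arrays can stand in for array scalars.

```python
import numpy as np


class Array:
    """Hypothetical minimal wrapper; NOT the class added by this PR."""

    def __init__(self, data: np.ndarray) -> None:
        # Accept only real ndarrays, not array-likes such as lists.
        if not isinstance(data, np.ndarray):
            raise TypeError("Array wraps an np.ndarray, not an array-like")
        self._array = data

    @property
    def shape(self):
        return self._array.shape

    @property
    def dtype(self):
        return self._array.dtype

    def __add__(self, other: "Array") -> "Array":
        if not isinstance(other, Array):
            return NotImplemented
        # np.asarray turns a NumPy scalar result back into a shape () array.
        return Array(np.asarray(self._array + other._array))

    def __getitem__(self, key) -> "Array":
        # Selecting one element yields a shape () Array, never a NumPy scalar.
        return Array(np.asarray(self._array[key]))


x = Array(np.array([1, 2, 3]))
y = x + x
elem = y[0]
print(y.shape, elem.shape)    # (3,) ()
print(hasattr(y, "tolist"))   # False: ndarray methods are not exposed
```

Because the wrapper holds the ndarray as a private attribute instead of inheriting from it, any ndarray method that is not re-exposed explicitly is simply absent, which is what lets downstream code detect accidental use of NumPy-only functionality.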

Notes for reviewers:

  • Comments on NEP 47 itself can also be made on this mailing list thread.
  • Please direct feedback on the specifics of the array API specification to the data-apis organization here.
  • I have written a high-level overview of what is being done here in the docstring of numpy/_array_api/__init__.py.
  • I have added # Note: comments everywhere where the array API makes a significant deviation from standard NumPy.
  • There is a work-in-progress test suite for the array API specification here. This will serve as the primary test suite for this module. In addition, I intend to add certain tests to this module for things that are not covered by the array API test suite, for instance, tests that certain NumPy semantics that aren't covered by the spec are correctly disallowed (these are still TODO for me).
  • Device support and DLPack support are not implemented yet because they require implementation in NumPy itself first. This should be done in separate pull requests.

This is based on the function stubs from the array API test suite, and is
currently based on the assumption that NumPy already follows the array API
standard. Now it needs to be modified to fix it in the places where NumPy
deviates (for example, different function names for inverse trigonometric
functions).
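As a rough illustration of the kind of renaming mentioned above: the array API standard spells the inverse trigonometric functions acos, asin, atan, and atan2, whereas NumPy's long-standing names are arccos, arcsin, arctan, and arctan2. The wrappers below are a sketch that operates on plain np.ndarray inputs, which is an assumption made for brevity and not how the module in this PR is written.

```python
import numpy as np


def acos(x: np.ndarray) -> np.ndarray:
    """Spec-named wrapper around np.arccos (illustrative only)."""
    return np.arccos(x)


def atan2(x1: np.ndarray, x2: np.ndarray) -> np.ndarray:
    """Spec-named wrapper around np.arctan2 (illustrative only)."""
    return np.arctan2(x1, x2)


print(acos(np.array([1.0])))                      # [0.]
print(atan2(np.array([1.0]), np.array([1.0])))    # pi / 4, about 0.785
```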
…dual files

The specific submodule organization is an implementation detail and should not
be used. Only the top-level numpy._array_api namespace should be used.
This avoids importing everything inside the individual functions, but still is
preferred over importing the functions used explicitly, as most of them clash
with the wrapper function names.
The docstrings just point back to the functions they wrap for now. More
thought may need to be put into this for the future. Most functions can
actually perhaps inherit the docstring of the function they wrap directly, but
there are some functions that have differences (e.g., different names,
different keyword arguments, fewer keyword arguments, etc.). There's also the
question of how to handle cross-references/see alsos that point to functions
not in the API spec and behavior shown in docstring examples that isn't
required in the spec.
This is mostly aimed at any potential reviewers of the module for now.
Some stubs still need to be modified to properly pass mypy type checking.
Also, 'device' is just left as a TypeVar() for now.
…eturn a scalar

This is needed to pass mypy type checks for the given type annotations.
…bmodule

That way they only work on actual ndarray inputs, not array-like, which is
more in line with the spec.
So far, it just is a wrapper with all the methods defined in the spec, which
all pass through. The next step is to make it so that the methods that behave
differently actually work as the spec describes. We also still need to modify
all the array_api functions to return this wrapper object instead of
np.ndarray.
…ndarray class

These methods aren't required by the spec, but without them, the array object
is harder to use interactively.
@charris

charris commented Aug 18, 2021

I've dropped the Python 3.7 tests, let's try this again.

@asmeurer

Oh that simplifies things. Should I remove the 3.7 compatibility code here?

@charris

charris commented Aug 18, 2021

> Should I remove the 3.7 compatibility code here?

Yes.

@asmeurer

Will do. I am taking time off this week so I might not get to it until next week. Also I just realized that I forgot the changelog entry.

rgommers pushed a commit to data-apis/array-api that referenced this pull request Aug 19, 2021
* Fix alphabetical function ordering in the linear algebra extension

* Update some type hints in the spec

These come from comments on the NumPy pull request
numpy/numpy#18585.

* Clarify that the where() result type is only based on the last two arguments

* Use Literal[] for the qr() mode argument type hint

* Add bool and int to the asarray type hints

* Fix a typo in the __setitem__ annotations
asmeurer added a commit to data-apis/numpy that referenced this pull request Aug 23, 2021
The NEP has been updated and expanded to more accurately correspond to the
implementation in numpy#18585, as well as various wording and grammar fixes.
NumPy has dropped Python 3.7, so these are no longer necessary (and they
didn't completely work anyway).
@asmeurer

I removed the Python 3.7 checks here, and added a release notes entry. This should be ready for final review.

charris removed the "56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes" label Aug 23, 2021
numpy/_pytesttester.py (review thread, outdated and resolved)
There is a test that fails in the presence of simplefilter('ignore')
(test_warnings.py). catch_warnings(record=True) seems to be a way to get the
same behavior without failing the test.
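A small sketch of the pattern this commit message describes, using only the standard library. The function noisy_import is a hypothetical stand-in for code that warns. The simplefilter("always") call is an addition for this demo so the result does not depend on the filters already active in the interpreter; the commit message itself only mentions catch_warnings(record=True).

```python
import warnings


def noisy_import() -> str:
    # Hypothetical stand-in for an import that emits a warning.
    warnings.warn("experimental module", UserWarning)
    return "loaded"


# record=True collects warnings in a list instead of displaying them,
# and the previous filter state is restored when the block exits.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # demo-only: make recording deterministic
    result = noisy_import()

print(result)                    # loaded
print(len(caught))               # 1
print(caught[0].category.__name__)  # UserWarning
```

The warning is captured rather than shown to the user, and no simplefilter('ignore') call appears in the source.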
charris merged commit 098f874 into numpy:main Aug 24, 2021
@charris

charris commented Aug 24, 2021

Thanks @asmeurer. We may want to do more formatting in the future, but that can wait.

Comment on lines +6 to +8
if TYPE_CHECKING:
    from ._typing import (
        Array,

Hi @asmeurer, I am debugging a Sphinx issue and you might know the typing system better than I do: I thought that in a normal Python interpreter TYPE_CHECKING is False, meaning Array et al. are not imported. But Array is used throughout this module (e.g., Array._new()). How is that possible?

Elsewhere, Array is imported from _array_object directly, instead of from _typing. I suspect this leads to early expansion of the Array type (compare this function and this function: the former leaks the implementation detail), and I believe applying the treatment in this _creation_functions module would fix the problem. I just don't understand why we conditionally import here but things still work.


Sorry nvm, I figured out my silliness: Array is imported in every function 😅
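For readers following this exchange, the pattern can be sketched as below. It uses fractions.Fraction from the standard library as a stand-in for Array so the example is self-contained; that substitution, the function name half, and the string annotation are assumptions for illustration and not the module's actual code.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers; this branch is skipped at runtime.
    from fractions import Fraction


def half(n: int) -> "Fraction":
    # The runtime import lives inside the function, so the name exists
    # when the body executes even though the module-level import was skipped.
    from fractions import Fraction

    return Fraction(n, 2)


print(TYPE_CHECKING)  # False
print(half(3))        # 3/2
```

The annotation is a string, so it is never evaluated at definition time, and the function-local import supplies the name when the body actually runs.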

asmeurer added a commit to asmeurer/numpy that referenced this pull request Sep 23, 2021
This method is new since numpy#18585. It does nothing in NumPy since NumPy does not
support non-CPU devices.
charris pushed a commit that referenced this pull request Sep 25, 2021
* Add __index__ to array_api and update __int__, __bool__, and __float__

The spec specifies that they should only work on arrays with corresponding
dtypes. __index__ is new in the spec since the initial PR, and works
identically to np.ndarray.__index__.

* Add the to_device method to the array_api

This method is new since #18585. It does nothing in NumPy since NumPy does not
support non-CPU devices.

* Update transpose methods in the array_api

transpose() was renamed to matrix_transpose() and now operates on stacks of
matrices. A function to permute dimensions will be added once it is finalized
in the spec. The attribute mT was added and the T attribute was updated to
only operate on 2-dimensional arrays as per the spec.

* Restrict input dtypes in the array API statistical functions

* Add the dtype parameter to the array API sum() and prod()

* Add the function permute_dims() to the array_api namespace

permute_dims() is the replacement for transpose(), which was split into
permute_dims() and matrix_transpose().

* Add tril and triu to the array API namespace

* Fix the array_api Array.__repr__ to indent the array properly

* Make the Device type in the array_api just accept the string "cpu"
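The split of transpose() into permute_dims() and matrix_transpose() described in the commit message above can be sketched on plain NumPy arrays as follows. Operating on np.ndarray directly and the exact error message are assumptions for this illustration; the functions in the array_api namespace work on its own array object.

```python
import numpy as np


def permute_dims(x: np.ndarray, axes: tuple) -> np.ndarray:
    """Reorder all dimensions of x according to axes."""
    return np.transpose(x, axes)


def matrix_transpose(x: np.ndarray) -> np.ndarray:
    """Swap the last two dimensions, treating x as a stack of matrices."""
    if x.ndim < 2:
        raise ValueError("matrix_transpose requires at least 2 dimensions")
    return np.swapaxes(x, -1, -2)


stack = np.zeros((5, 2, 3))  # five 2x3 matrices
print(matrix_transpose(stack).shape)          # (5, 3, 2)
print(permute_dims(stack, (2, 0, 1)).shape)   # (3, 5, 2)
```

Keeping the two operations separate avoids the ambiguity of a single transpose() that reverses every axis for N-dimensional input.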
howjmay pushed a commit to howjmay/numpy that referenced this pull request Sep 29, 2021