ENH: numpy.typing for type checking, documentation and Numpy compilers #26380

Open
paugier opened this issue May 3, 2024 · 1 comment

paugier commented May 3, 2024

Proposed new feature or change:

Type annotations in code using Numpy (and other Python array libraries) can be used for 3 purposes:

  • Python-Numpy compilers (for example Cython or Pythran)
  • documentation
  • type checking

Currently numpy.typing is mostly oriented towards type checking. It would be nice if numpy.typing could also be used to easily add information useful for documentation and Python-Numpy compilers. It seems to me that these needs are a bit different from what Mypy currently supports.

For some projects (namely fluidsim, fluidfft, fluidimage), we already use type annotations so that Transonic can automatically produce Pythran, Numba and Cython code. For these projects, I now often feel the need to add type annotations only for documentation, even for functions/classes that are not compiled. Unfortunately, numpy.typing is really not yet suitable for these needs.

Moreover, I see some discussions about enhancing numpy.typing (for example #16544, see also https://github.com/ramonhagenaars/nptyping) and the solutions proposed seem quite complicated and not very suitable for documentation and Python-Numpy compilers. I mean I see nothing simple and short for something like Array["2d", Type(np.float32, np.float64), "C"] (I guess one can guess what it means).

For these purposes (doc and compilers), some very common type information that can be given are about

  1. the number of dimensions of an array (sometimes fused, i.e. ndim 2 or 3),
  2. dtypes (sometimes fused, i.e. float64 or complex128) and
  3. memory contiguity/strides.

It is also very useful and common to specify that a function is limited to some particular arrays (only C contiguous for example, or only ndim equal to 2 or 3).

Specifying the number of elements in one dimension (#16544) can also be useful, but it is less common than specifying the number of dimensions of an array.
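
To make the three kinds of information concrete, here is a minimal sketch of a runtime check for them. The helper `check_array` and its parameter names are hypothetical (they are not part of numpy.typing or Transonic); it relies only on the standard `ndim`, `dtype` and `flags.c_contiguous` attributes of `np.ndarray`.

```python
import numpy as np


def check_array(arr, ndims=(2, 3), dtypes=(np.float32, np.float64), c_contiguous=True):
    """Hypothetical helper: validate ndim, dtype and memory layout of an array."""
    # 1. number of dimensions, possibly "fused" (several allowed values)
    if arr.ndim not in ndims:
        raise TypeError(f"expected ndim in {ndims}, got {arr.ndim}")
    # 2. dtype, possibly "fused"
    allowed = [np.dtype(d) for d in dtypes]
    if arr.dtype not in allowed:
        raise TypeError(f"expected dtype in {allowed}, got {arr.dtype}")
    # 3. memory contiguity
    if c_contiguous and not arr.flags.c_contiguous:
        raise TypeError("expected a C-contiguous array")
    return arr


# A 2D float32 array in C order passes all three checks.
check_array(np.zeros((4, 5), dtype=np.float32))
```

An annotation such as Array[NDim(2, 3), Type(np.float32, np.float64), "C"] would carry the same three pieces of information statically instead of checking them at call time.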

Numpy compilers have their own ways of describing arrays, often inspired by C notation.

With Transonic, one can write annotations in either C or Python style:

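import numpy as np
from transonic import Array, Type, NDim

# C style: 2D array of float32
A2D = "float32[:,:]"
# equivalent, Python style
A2Dbis = Array["2d", np.float32]

# fused type: ndim 2 or 3, dtype float32 or float64
Afused = Array[NDim(2, 3), Type(np.float32, np.float64)]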
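
As an illustration of how little information the C-style strings encode, here is a small sketch that reads one as a dtype name plus a number of dimensions. The names `ArraySpec` and `parse_c_style` are hypothetical and this is not Transonic's actual parser; it only handles the simple `name[:,:]` form shown above.

```python
import re
from typing import NamedTuple


class ArraySpec(NamedTuple):
    """Hypothetical container for what a C-style annotation encodes."""

    dtype: str
    ndim: int


# A dtype name followed by brackets holding only colons separated by commas.
_PATTERN = re.compile(r"^\s*([A-Za-z_][\w.]*)\s*\[\s*(:(?:\s*,\s*:)*)\s*\]\s*$")


def parse_c_style(annotation: str) -> ArraySpec:
    """Split e.g. "float32[:,:]" into its dtype name and number of dimensions."""
    match = _PATTERN.match(annotation)
    if match is None:
        raise ValueError(f"not a C-style array annotation: {annotation!r}")
    dtype, slices = match.groups()
    # Each colon stands for one dimension.
    return ArraySpec(dtype=dtype, ndim=slices.count(":"))


spec = parse_c_style("float32[:,:]")
print(spec.dtype, spec.ndim)
```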

I'm not saying that numpy.typing should support such things, but it seems to me that it is important when designing numpy.typing to consider the different purposes of type annotations in code using Numpy, and not to focus mostly on what is currently supported by Mypy.

Simple things like specifying that an array is a one or two-dimensional array of float64 should be simple and short with numpy.typing.

Here is a short real-life example where annotations are used only for documentation. In Fluidimage, I recently wrote the following while rediscovering and refactoring code written by other developers:

from typing import List


class ThinPlateSplineSubdom:

    num_centers: int
    tps_matrices: List["float[:,:]"]
    norm_coefs: "float[:]"
    norm_coefs_domains: List["float[:]"]

    num_new_positions: int
    ind_new_positions_domains: List["np.int64[:]"]
    norm_coefs_new_pos: "float[:]"
    norm_coefs_new_pos_domains: List["float[:]"]

It would be nice if I could replace that with elegant annotations using numpy.typing.
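
For comparison, here is a sketch of what the dtype-only `npt.NDArray` alias can express for this class. Mapping the Transonic "float" to `np.float64` is an assumption, and the number of dimensions is lost: it survives only in comments.

```python
from typing import List

import numpy as np
import numpy.typing as npt


class ThinPlateSplineSubdom:
    # The dtype is expressed in the annotation; ndim is only noted in comments.

    num_centers: int
    tps_matrices: List[npt.NDArray[np.float64]]  # each 2D
    norm_coefs: npt.NDArray[np.float64]  # 1D
    norm_coefs_domains: List[npt.NDArray[np.float64]]  # each 1D

    num_new_positions: int
    ind_new_positions_domains: List[npt.NDArray[np.int64]]  # each 1D
    norm_coefs_new_pos: npt.NDArray[np.float64]  # 1D
    norm_coefs_new_pos_domains: List[npt.NDArray[np.float64]]  # each 1D
```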

@rgommers
Member

Thanks for the nice write-up and suggestions @paugier. I completely agree with the gist of what you wrote, and would like type annotations to be useful for documentation and Python compilers as well.

> I mean I see nothing simple and short for something like Array["2d", Type(np.float32, np.float64), "C"] (I guess one can guess what it means).

Dtype parametrization exists:

from typing import Any

import numpy as np
import numpy.typing as npt

def func_return_float64(x: npt.NDArray[Any]) -> npt.NDArray[np.float64]:
    # more complex combinations of allowed dtype inputs or outputs are also supported
    return x.astype(np.float64)

Shape support was blocked until very recently, and there's a lot of interest in (and relevant discussion on) gh-16544. So hopefully this will materialize soon.

Contiguity is very much a niche special case compared to shape and dtype, so let's leave that one aside for now. It shouldn't be hard, but also it's something that end user code shouldn't have to worry about in 99.x% of cases (yes, some compilers do, but that's internals).

> For these projects, I now often feel the need to add type annotations only for documentation even for functions/classes that are not compiled. Unfortunately, numpy.typing is really not yet suitable for these needs.

It should be, although the limitations are often on the Sphinx side. The typical problem is that for type annotations to be correct, they should contain unions and protocols that are complex. For documentation purposes, what is needed is really solid support for type aliases so that the ugliness of the large unions gets hidden correctly, and you could have things like x : ArrayLike | int | float render as the understandable type in html docs.
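
One existing knob on the Sphinx side is the `autodoc_type_aliases` option of `sphinx.ext.autodoc`, which keeps an alias name from being expanded in rendered signatures. A minimal `conf.py` sketch follows. It assumes the documented modules use `from __future__ import annotations`, which the option relies on according to the Sphinx documentation. The `myproject.typing.Array2D` alias is hypothetical.

```python
# conf.py (fragment)
extensions = ["sphinx.ext.autodoc"]

# Map the alias name as written in annotations to its fully qualified name,
# so autodoc shows the short name instead of the expanded union.
autodoc_type_aliases = {
    "ArrayLike": "numpy.typing.ArrayLike",
    "Array2D": "myproject.typing.Array2D",  # hypothetical project alias
}
```

Whether this is enough for large unions in practice depends on the Sphinx version and on how the annotations are written.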

> Simple things like specifying that an array is a one or two-dimensional array of float64 should be simple and short with numpy.typing.

Agreed. It'd be great if it looked something like:

  • NDArray[np.float64, Any]
  • NDArray[np.float64, npt.3D]
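
Until something that short exists, the shape parameter of `np.ndarray` can be spelled out by hand. This is only a sketch: the alias names `Float64Array2D` and `Float64Array3D` and the function `total` are hypothetical, subscripting `np.ndarray` at runtime needs Python 3.9+ and a reasonably recent NumPy, and how well a type checker understands the shape parameter depends on the NumPy version.

```python
import numpy as np

# Hypothetical aliases: the first parameter is the shape type, the second the dtype.
Float64Array2D = np.ndarray[tuple[int, int], np.dtype[np.float64]]
Float64Array3D = np.ndarray[tuple[int, int, int], np.dtype[np.float64]]


def total(field: Float64Array2D) -> float:
    """Sum a 2D float64 field; the annotation documents intent, it is not enforced at runtime."""
    return float(field.sum())


value = total(np.ones((2, 3)))
```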
