ENH: Implement the DLPack Array API protocols for ndarray. #19083
Conversation
Thanks Hameer, looks like a good start. Of the points @seberg brought up on the issue, adding a version attribute seems the most important, because checking for version will be needed before any backwards-incompatible change (such as adding an extra field) can be done. I asked about it on dmlc/dlpack#34, and the suggestion was to add it as an attribute on the
Sure -- I commented there instead of here, so more of the relevant parties can see it.
Hi Hameer, could you also add these two new methods to the ndarray stubs in numpy/__init__.pyi?
Their signatures are fairly simple, so that's a plus:
# NOTE: Every single import below is already present in `__init__.pyi`
# and doesn't have to be repeated
from typing import Any, Tuple
from typing_extensions import Literal as L
from numpy import number
from numpy.typing import NDArray
# `builtins.PyCapsule` unfortunately lacks annotations as of the moment;
# use `Any` as a stopgap measure
_PyCapsule = Any
def __dlpack__(self: NDArray[number[Any]], *, stream: None = ...) -> _PyCapsule: ...
def __dlpack_device__(self) -> Tuple[L[1], L[0]]: ...
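For reference, a quick runtime sketch of what these stubs describe, assuming a NumPy build that already ships the two methods (NumPy 1.22 or later):

```python
import numpy as np

arr = np.arange(6, dtype=np.float64)

# (1, 0) corresponds to (kDLCPU, device id 0) in the DLPack device enum,
# matching the `Tuple[L[1], L[0]]` return annotation above
assert arr.__dlpack_device__() == (1, 0)

# `__dlpack__` returns a one-shot capsule meant to be consumed by an importer
capsule = arr.__dlpack__()
assert type(capsule).__name__ == "PyCapsule"
```

This also shows why `_PyCapsule = Any` is needed: the capsule's type is not importable from `builtins` for annotation purposes.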
Had started looking at it, so a few small comments. It looks good; I have to think a bit more about the API, but that has nothing to do with the implementation.
I think if we put the header where you put it, it will be effectively public. I guess that is fine though? Otherwise I think placing it into common is good.
# `builtins.PyCapsule` unfortunately lacks annotations as of the moment;
# use `Any` as a stopgap measure
_PyCapsule = Any
Interestingly, if we could get builtins.PyCapsule annotated in typeshed as a parameterizable type, then it would in principle be possible for static type checkers to read, e.g., the underlying shape and dtype. The only thing that users or libraries would have to do here is declare the necessary annotations. Perhaps something to consider for the future?
Examples
from typing import TypeVar, Any, Generic, Tuple

import numpy as np

# Improvised `PyCapsule` annotation
_T = TypeVar("_T")

class PyCapsule(Generic[_T]): ...

# Construct a more compact `PyCapsule` alias; `Tuple` is used here to bundle
# the two parameters (there may be more appropriate types for this purpose)
_Shape = TypeVar("_Shape", bound=Any)  # TODO: wait for PEP 646's `TypeVarTuple`
_DType = TypeVar("_DType", bound=np.dtype[Any])
DLPackCapsule = PyCapsule[Tuple[_Shape, _DType]]

# A practical example
def from_dlpack(__x: DLPackCapsule[_Shape, _DType]) -> np.ndarray[_Shape, _DType]: ...
I'll consider this out of scope for this PR for now, but will leave the conversation unresolved for visibility.
There's a standardized name, see data-apis/array-api#186. I have modified the PR accordingly.
Unaligned is OK, see #19013 (comment). How does one check for native dtypes?
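As an aside, the native-dtype check asked about above can be sketched with public dtype attributes (`dtype.isnative` and `dtype.newbyteorder`):

```python
import numpy as np

# Pick the byte order that matches this platform, then its swapped twin
native = np.dtype("<f8") if np.little_endian else np.dtype(">f8")
swapped = native.newbyteorder()

# `isnative` is True when the dtype's byte order matches the platform
assert native.isnative
assert not swapped.isnative
```

An exporter could reject (or copy) any array whose `dtype.isnative` is False before handing it to DLPack, since DLPack has no byte-order field.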
Can we verify this?
Ignore, it does mean IEEE float. Does
I guess if longdouble is 64-bit, you can be sure it's just double. That we could check for if we want. There are a couple of macros in
Technically, most current hardware uses something like 80-bit (IEEE?) stored as 96-bit or 128-bit. But DLPack can't describe it (it's an old thing that our
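The size-based check described above can be sketched like this (the 8/12/16-byte outcomes depend on the platform):

```python
import numpy as np

ld = np.dtype(np.longdouble)

if ld.itemsize == np.dtype(np.double).itemsize:
    # longdouble is just an alias for IEEE binary64 here, safe to export
    print("longdouble is plain double on this platform")
else:
    # e.g. x86 80-bit extended precision stored as 12 or 16 bytes,
    # which DLPack has no way to describe
    print(f"extended precision, itemsize={ld.itemsize}")
```

An exporter could use this to accept longdouble arrays only when they are bit-identical to double.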
You can use
No, it has nothing to do with stream handling (which is done at the Python level during handshaking); it's about keeping the refcount of the underlying resource. Conceptually we could definitely use a capsule to back an ndarray, in practice though
Not really the right place to continue discussing. But no: the capsule owns the
numpy/core/src/multiarray/dlpack.c
"Offsets not clearly supported by this "
"version of DLPack.");
return NULL;
}
This code path can only be hit if we are re-wrapping a dlpack capsule with an offset. It is currently not tested, which should be done in a follow-on PR.
Yeah, don't worry about it. This is nonsense anyway, I deleted the whole thing (the logic was probably incomplete anyway) and replaced it with the following comment:
/*
* Note: the `dlpack.h` header suggests/standardizes that `data` must be
* 256-byte aligned. We ignore this intentionally, because `__dlpack__`
* standardizes that `byte_offset` must be 0 (for now) to not break pytorch:
* https://github.com/data-apis/array-api/issues/293#issuecomment-964111413
*
* We further assume that exporting fully unaligned data is OK even without
* `byte_offset` since the standard does not reject it.
* Presumably, pytorch will support importing `byte_offset != 0` and NumPy
* can choose to use it starting about 2023. At that point, it may be
* that NumPy MUST use `byte_offset` to adhere to the standard (as
* specified in the header)!
*/
If `byte_offset = 0` is forced anyway, there is no point in trying to preserve the previous `data` information from the capsule. (And it should probably have used a base array, not just a base DLPack capsule, anyway.)
I will put this in once the tests pass, but will then open a few issues about dlpack, with pretty harmless bugs or general follow-ups. The release notes get reviewed before the release anyway. Thanks everyone, and especially Matti for doing the follow-ups.
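To illustrate the comment above: since `byte_offset` is forced to 0, the offset of a sliced view gets folded into the `data` pointer itself, and a round-trip still aliases the same memory. (This sketch uses the public `np.from_dlpack`, available since NumPy 1.23; it was still underscored when this PR merged.)

```python
import numpy as np

base = np.arange(10, dtype=np.int64)
view = base[2:]  # data pointer is offset into the base allocation

# Export and re-import through the DLPack protocol; the exporter encodes
# the offset in `data` rather than in `byte_offset`
roundtrip = np.from_dlpack(view)

assert np.shares_memory(view, roundtrip)
assert (roundtrip == view).all()
```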
🎉 thanks @mattip, @hameerabbasi, @seberg, @leofang & all other reviewers!
Silly question: Can someone remind me why we renamed it to
This was pushed in about a week before branching, with not quite settled things in dlpack itself (as I remember). Adding the underscore was the way to avoid having to discuss all of that in a short time frame.
I see, thanks for the context @seberg. Sounds like a decision made offline (which I am fine with). Is there any plan to make it a public API (by removing the underscore)?
Yes, we can discuss this, obviously. It may have been discussed in a meeting, in which case it is publicly noted in the
The only mention of dlpack in the docs is in the release notes, so maybe we could discuss making it public together with some documentation.
Thanks, guys. I created an issue to track this change: #20743.
Fixes #19013

cc @rgommers @mattip @seberg

TODO
- from_dlpack function.
- __dlpack__

TODO items added/edited by seberg:
- methods.c
- dlpack.h that clarifies alignment requirements (and use of byte_offset). After such a PR exists, we can review the current rules for when to reject export. (If the rejection is not guaranteed to be fixed by arr.copy() we may have a problem.)
- from_dlpack shall not be exposed as new, public API. There will be no public API to convert to NumPy from __dlpack__. It would be OK to have an experimental/private, underscored API.
- dlpack.h or array-api that states that imported arrays shall be considered writable, and exporters like NumPy should avoid exporting readonly arrays (truly readonly only?). Ideally with a footnote that this means users of immutable array libraries may be in for a big surprise. (mattip says:) Not sure how to resolve this. The data-api issue is still open.
- The ->deleter == NULL case is settled (I think it is, but maybe there needs to be some clarification in the header as well).