
ENH: Implement the DLPack Array API protocols for ndarray. #19083

Merged
18 commits merged into numpy:main on Nov 9, 2021

Conversation

hameerabbasi
Contributor

@hameerabbasi hameerabbasi commented May 24, 2021

Fixes #19013

cc @rgommers @mattip @seberg

TODO

  • from_dlpack function.
  • dtype support in __dlpack__

TODO items added/edited by seberg:

  • Move all dlpack specific code to a single C file, rather than methods.c
  • I would feel much better if there is a PR to dlpack.h that clarifies alignment requirements (and use of byte_offset). After such a PR exists, we can review the current rules when to reject export. (if the rejection is not guaranteed to be fixed by arr.copy() we may have a problem.)
  • As discussed in the meeting from_dlpack shall not be exposed as new, public API. There will be no public API to convert to NumPy from __dlpack__. It would be ok to have experimental/private, underscored API.
  • (preferably) Make sure there is a PR to either dlpack.h or array-api that states that imported arrays shall be considered writable and exporters like NumPy should avoid exporting readonly arrays (truly readonly only?). Ideally, with a footnote that this means that users of immutable array libraries may be up for a big surprise.
    (mattip says:) Not sure how to resolve this. The data api issue is still open
  • I am not quite sure if the ->deleter == NULL case is settled (I think it is, but maybe there needs to be some clarification in the header as well).
  • Specify cleanup to know for a fact whether this implementation or the CuPy implementation is correct.

@hameerabbasi hameerabbasi changed the title from "Add the __dlpack__ and __dlpack_device__ methods to ndarray." to "Implement the DLPack Array API protocols for ndarray." on May 24, 2021
@hameerabbasi hameerabbasi force-pushed the dlpack branch 3 times, most recently from d111d23 to 248a695 on May 24, 2021 14:39
@rgommers
Member

Thanks Hameer, looks like a good start.

Of the points @seberg brought up on the issue, adding a version attribute seems the most important - because checking for version will be needed before any backwards-incompatible change (such as adding an extra field) can be done. I asked about it on dmlc/dlpack#34, and the suggestion was to add it as an attribute on the PyCapsule. Could you look into doing that?

@hameerabbasi hameerabbasi force-pushed the dlpack branch 2 times, most recently from ac6c005 to 5aa28d8 on May 24, 2021 15:50
@hameerabbasi
Contributor Author

> Could you look into doing that?

Sure -- I commented there instead of here, so more of the relevant parties can see it.

Member

@BvB93 BvB93 left a comment

Hi Hameer, could you also add these two new methods to the ndarray stubs in numpy/__init__.pyi?
Their signatures are fairly simple, so that's a plus:

# NOTE: Every single import below is already present in `__init__.pyi`
# and doesn't have to be repeated
from typing import Any, Tuple
from typing_extensions import Literal as L

from numpy import number
from numpy.typing import NDArray

# `builtins.PyCapsule` unfortunately lacks annotations as of the moment;
# use `Any` as a stopgap measure
_PyCapsule = Any

def __dlpack__(self: NDArray[number[Any]], *, stream: None = ...) -> _PyCapsule: ...
def __dlpack_device__(self) -> Tuple[L[1], L[0]]: ...
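
The two stub signatures above can be exercised directly. A quick sketch, assuming a NumPy build that already includes this PR's DLPack support:

```python
import numpy as np

# Sanity check of the two protocol methods on a plain CPU array.
x = np.arange(3.0)

# __dlpack_device__ returns (device_type, device_id); 1 is kDLCPU in the
# DLPack device enum, so an in-memory ndarray reports (1, 0).
assert x.__dlpack_device__() == (1, 0)

# __dlpack__ returns an opaque PyCapsule wrapping a DLManagedTensor,
# which is why the stubs fall back to `Any` for its annotation.
capsule = x.__dlpack__()
print(type(capsule).__name__)  # PyCapsule
```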

numpy/core/src/multiarray/methods.c: 2 review threads (outdated, resolved)
@hameerabbasi hameerabbasi force-pushed the dlpack branch 2 times, most recently from 77e49cc to 830903a on May 24, 2021 16:12
Member

@seberg seberg left a comment

Had started looking at it, so a few small comments. It looks good, I have to think a bit more about the API, but that has nothing to do with the implementation.

I think if we put the header where you put it, it will be effectively public. I guess that is fine though? Otherwise I think placing it into common is good.

numpy/core/src/multiarray/methods.c: 8 review threads (outdated, resolved)
Comment on lines +1647 to +1439
# `builtins.PyCapsule` unfortunately lacks annotations as of the moment;
# use `Any` as a stopgap measure
_PyCapsule = Any

Member

@BvB93 BvB93 May 24, 2021

Interestingly, if we could get builtins.PyCapsule in typeshed annotated as a parameterizable type,
then it would in principle be possible for static type checkers to read, e.g., the underlying shape and dtype.

The only thing that users or libraries would have to do here is declare the necessary annotations.

Perhaps something to consider for the future?

Examples

from typing import TypeVar, Any, Generic, Tuple
import numpy as np

# Improvised `PyCapsule` annotation
_T = TypeVar("_T")
class PyCapsule(Generic[_T]): ...

# Construct a more compact `PyCapsule` alias; `Tuple` used herein to introduce 2 parameters 
# (there may be more appropriate types that can fulfill this functionality)
_Shape = TypeVar("_Shape", bound=Any)  # TODO: Wait for PEP 646's TypeVarTuple
_DType = TypeVar("_DType", bound=np.dtype[Any])
DLPackCapsule = PyCapsule[Tuple[_Shape, _DType]]

# A practical example
def from_dlpack(__x: DLPackCapsule[_Shape, _DType]) -> np.ndarray[_Shape, _DType]: ...

Contributor Author

I'll consider this as out of scope of this PR for now, but will leave the conversation unresolved for visibility.

@hameerabbasi hameerabbasi force-pushed the dlpack branch 2 times, most recently from a5d1adf to becdf4d on May 25, 2021 11:24
@hameerabbasi
Contributor Author

hameerabbasi commented May 25, 2021

> This name should be some standardized DLPack name? Or is this intentional for now, to keep it effectively private right now?
>
> Well, we have to be able to delete it to free the memory, but maybe we need some state as well, to ensure that we don't delete it twice if the consumer already deletes it without deleting the capsule (might be good for non-reference-counted Pythons also?).
>
> I thought I read that there was some capsule renaming involved (to signal that it's "invalid" after deletion). I guess setting the data to NULL or so would be just as well. (If this is standardized, do that. If not, set data to NULL to guard against deleting twice: once from Python, once from the consumer.)

There's a standardized name, see data-apis/array-api#186. I have modified the PR accordingly.

> We need to also check for the dtype being native. I assume unaligned is OK?

Unaligned is OK, see #19013 (comment). How does one check for native dtypes?

> Maybe we should just explicitly list them? We cannot include longdouble here, since the C99 standard doesn't specify it (I am not even sure IBM double-double is strictly standard-conforming). And I assume Float in DLPack means an IEEE-compatible float format of defined size. float16, float32, and float64 are fine; for the others, maybe DLPack should add an extended float field or so, if we want it?
>
> I just don't see longdouble to be much good, considering it's already system-dependent on the CPU alone...

Can we verify this?

@hameerabbasi hameerabbasi force-pushed the dlpack branch 4 times, most recently from 835c638 to e6d2195 on May 25, 2021 11:41
@hameerabbasi
Contributor Author

hameerabbasi commented May 25, 2021

> Can we verify this?

Ignore that; it does mean IEEE float. Do np.float80 or np.float128 not respect the IEEE convention? In that case, it makes sense to limit both complex and float types. Otherwise, the receiving library can reject the incoming datatype by reading the bits field.

@hameerabbasi hameerabbasi requested a review from seberg May 25, 2021 12:33
@eric-wieser
Member

np.float80 etc. are all just aliases for np.longdouble, based on what C produces for sizeof(long double).

@seberg
Member

seberg commented May 25, 2021

I guess if longdouble is 64-bit, you can be sure it's just double. That we could check for if we want. There are a couple of macros in numpy/core/src/common/npy_fpmath.h, such as HAVE_LDOUBLE_IEEE_QUAD_BE. I don't know how well they work or how well tested they are, but technically you could check those to catch the rare cases where we have proper quad precision.

Technically, most current hardware uses something like an 80-bit (IEEE?) extended format stored as 96 or 128 bits. But DLPack can't describe it (it's an old thing that our float128 should really be float128_80 or something other than float128...). Something that will eventually be important if quad precision becomes a real thing.
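
The 64-bit case mentioned above can be detected cheaply from Python. The helper below is hypothetical (not NumPy's actual code): it tests whether longdouble is secretly just IEEE double on this platform, in which case it could safely be exported as a 64-bit float.

```python
import numpy as np

def longdouble_is_plain_double():
    # Compare the IEEE format parameters rather than just itemsize:
    # same mantissa width and exponent range means the same format.
    ld = np.finfo(np.longdouble)
    f64 = np.finfo(np.float64)
    return ld.nmant == f64.nmant and ld.maxexp == f64.maxexp

# True where long double aliases double (e.g. MSVC), False with 80-bit x87
# extended precision or true quad precision.
print(longdouble_is_plain_double())
```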

@seberg
Member

seberg commented May 25, 2021

> How does one check for native dtypes?

You can use PyDataType_ISNOTSWAPPED.
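
At the Python level, roughly the same property that PyDataType_ISNOTSWAPPED checks in C is exposed as `dtype.isnative`. A small sketch of why the check matters for DLPack export:

```python
import sys
import numpy as np

# DLPack has no byte-order field, so only native-byte-order dtypes can be
# exported; a byte-swapped dtype must be rejected (or copied first).
native = np.dtype("f8")  # native order, exportable

# Construct an explicitly non-native dtype for whichever endianness we run on.
swapped_code = ">f8" if sys.byteorder == "little" else "<f8"
swapped = np.dtype(swapped_code)

assert native.isnative
assert not swapped.isnative
```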

@leofang
Contributor

leofang commented Nov 3, 2021

> clear hierarchical structure (ndarray <- DLPackMemory <- capsule)
>
> I guess you can see it that way, but I see no reason for DLPackMemory to hold on to the original capsule! At least unless there is some reason due to stream handling.

No, it has nothing to do with stream handling (which is done at the Python level during handshaking); it's about keeping the underlying resource alive via reference counting. Conceptually we could definitely use a capsule to back an ndarray; in practice, though, cupy.ndarray expects a few extra methods from its data member to return the device, size, pointer, etc., hence the capsule is wrapped in a very thin layer (DLPackMemory).

@mattip mattip added the "triage review" label (Issue/PR to be discussed at the next triage meeting) on Nov 3, 2021
@seberg
Member

seberg commented Nov 3, 2021

Not really the right place to continue discussing, but no: the capsule owns the DLManagedTensor, which owns its own memory and is not reference counted (there is exactly one reference, which cannot be shared). Renaming the capsule steals ownership from it (and you must rename it).
At that point the capsule does not own anything (check the capsule destructor: it does nothing after renaming). So you can simply delete the `cdef object dltensor` and `self.dltensor = dltensor` lines. The only tricky thing is memory handling (i.e. renaming the capsule at the right time), and I think cupy should move that to directly after the line getting the pointer from the capsule (at that point DLPackMemory has ownership). As it stands it looks like a bug, for example if you are confronted with an unknown dtype.

"Offsets not clearly supported by this "
"version of DLPack.");
return NULL;
}
Member

This code path can only be hit if we are re-wrapping a dlpack capsule with an offset. It is currently not tested, which should be done in a follow-on PR.

Member

Yeah, don't worry about it. This is nonsense anyway; I deleted the whole thing (the logic was probably incomplete anyway) and replaced it with the following comment:

    /*
     * Note: the `dlpack.h` header suggests/standardizes that `data` must be
     * 256-byte aligned.  We ignore this intentionally, because `__dlpack__`
     * standardizes that `byte_offset` must be 0 (for now) to not break pytorch:
     * https://github.com/data-apis/array-api/issues/293#issuecomment-964111413
     *
     * We further assume that exporting fully unaligned data is OK even without
     * `byte_offset` since the standard does not reject it.
     * Presumably, pytorch will support importing `byte_offset != 0` and NumPy
     * can choose to use it starting about 2023.  At that point, it may be
     * that NumPy MUST use `byte_offset` to adhere to the standard (as
     * specified in the header)!
     */

If `byte_offset = 0` is forced anyway, there is no point in trying
to preserve a previous `data` information from the capsule.
(And probably it should have used a base array also, and not just
a base DLPack capsule, anyway.)
@seberg
Member

seberg commented Nov 9, 2021

I will put this in once the tests pass, and will then open a few issues about DLPack for the pretty harmless bugs and general follow-ups. The release notes get reviewed before the release anyway.

Thanks everyone and especially Matti for doing the follow-ups.

@seberg seberg merged commit 0de29c5 into numpy:main Nov 9, 2021
@rgommers
Member

rgommers commented Nov 9, 2021

🎉 thanks @mattip, @hameerabbasi, @seberg, @leofang & all other reviewers!

@leofang
Contributor

leofang commented Jan 3, 2022

Silly question: Can someone remind me why we renamed it to np._from_dlpack() (prefixed with an underscore)?

@seberg
Member

seberg commented Jan 4, 2022

This was pushed in about a week before branching, with some not-quite-settled things in DLPack itself (as I remember). Adding the underscore was a way to avoid having to discuss all of that in a short time frame.

@leofang
Contributor

leofang commented Jan 4, 2022

I see, thanks for the context @seberg. Sounds like a decision made offline (which I am fine with). Is there any plan to make it a public API (by removing the underscore)?

@seberg
Member

seberg commented Jan 4, 2022

Yes, we can discuss this, obviously. It may have been discussed in a meeting, in which case it is publicly noted in the numpy/archive repo. I believe the PR had the underscore initially because the first step was to make it available for the _array_api namespace.

@mattip
Member

mattip commented Jan 4, 2022

The only mention of dlpack in the docs is in the release notes, so maybe we could discuss making it public together with some documentation.

@leofang
Contributor

leofang commented Jan 5, 2022

Thanks, guys. I created an issue to track this change: #20743.

Labels: 01 - Enhancement, component: numpy._core, triage review (Issue/PR to be discussed at the next triage meeting)
Linked issue: DLPack support for NumPy (#19013)
8 participants