Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: Reduce allocation size of empty (0 size) arrays to 1 byte #21477

Merged
merged 9 commits into from
May 17, 2022

Conversation

mattip
Copy link
Member

@mattip mattip commented May 9, 2022

Restart #15788 from @eric-wieser.

*/
if (nbytes == 0) {
nbytes = descr->elsize ? descr->elsize : 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we ensure all the strides are set to zero to avoid any concerns of undefined behavior?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but array_strides_set will allow resetting the strides to any values.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the following behavior when a 0-size array is reshaped in NumPy 1.23:

>>> import numpy as np
>>> a = np.ones((2, 0, 3))
>>> a.strides
(0, 0, 0)
>>> b = a.reshape((3, 0, 2))
>>> b.strides
(16, 16, 8)

Do you think reshape should be also fixed to return b with strides (0, 0, 0) here, @eric-wieser @mattip ? Thanks!

ref. cupy/cupy#6807

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does look like a bug. Please open a new issue for this, the PR here was merged and further discussion will get lost.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I opened #21918.

@seberg
Copy link
Member

seberg commented May 9, 2022

I am wondering if we should worry about limiting the total size also in this case. But I cannot think of any place where it would actually be used (i.e. we assume that a subset of axis cannot overflow when combined).

So, I would feel a bit safer to keep that check unmodified, but I am OK to not worry about it now.

@mattip
Copy link
Member Author

mattip commented May 9, 2022

this case, that check

which ones?

@seberg
Copy link
Member

seberg commented May 9, 2022

Sorry, the current loop ignores 0s, so if you create an array of shape (0, 2**60, 2**60) it currently fails just like (2**60, 2**60, 0) fails.
With the change, I think this will not happen anymore? That should probably be OK, but I am wondering if there could be operations that merge the nonzero dimensions and then overflow.

@mattip
Copy link
Member Author

mattip commented May 10, 2022

Sorry, the current loop ignores 0s, so if you create an array of shape (0, 2**60, 2**60) it currently fails just like (2**60, 2**60, 0) fails.

I restored this behaviour.

There were a few places in the code that assumed strides[-1] == elsize, so I wonder if this change will break downstream code.

@eric-wieser
Copy link
Member

There were a few places in the code that assumed strides[-1] == elsize, so I wonder if this change will break downstream code.

I'd hoped this was the type of thing that RELAXED_STRIDES_CHECKING was supposed to transition downstream code away from.

@seberg
Copy link
Member

seberg commented May 10, 2022

Thanks for restoring the max-size thing!

Yeah, RELAXED_STRIDES should have caught most of such issues. In this case, the array is empty, so using its stride is pretty unlikely. In practice, I expect it only affects thingse if you later grow the array and assume the strides are still compatible.
Growing an array from empty to non-empty is hopefully exceedingly rare or non-existing outside of NumPy proper.

@seberg
Copy link
Member

seberg commented May 11, 2022

@mattip I just realized, I had that PyArray_NBYTES_ALLOCATED helper for the reduction code. I am a bit surprised the reduce tests don't print the mismatch stuff now due to that not also being updated?
Not worried though, but it would still be good to update that inline-function.

EDIT: I may just push that update here later though.

We have to reset all strides in the case of a subarray dtype which
may have caused the result array to not be 1-D.  Since the logic
is a bit complicated now, restore the logic to only do this when it
really is necessary.
@mattip
Copy link
Member Author

mattip commented May 11, 2022

I think the windows error path is

import numpy.core.multiarray as _nx, numpy as np
def ndim(a):
    return np.asarray(a).ndim

dt = np.float64
delta = 1.0
start = 0.0
num = 0

y = _nx.arange(0, num, dtype=dt).reshape((-1,) + (1,) * ndim(delta))
y = y * delta
y += start

which gives "RuntimeWarning: invalid value encountered in add". After arange, y.strides == (8,). After y = y * delta, y.strides == (0,)

@seberg
Copy link
Member

seberg commented May 11, 2022

Ah, this is probably an invalid access in our sum implementation. The reduce path would be taken here with n == 0 (unless we prevent that earlier, and I do not think we do):

NPY_NO_EXPORT void NPY_CPU_DISPATCH_CURFX(@TYPE@_@kind@)
(char **args, npy_intp const *dimensions, npy_intp const *steps, void *NPY_UNUSED(func))
{
if (IS_BINARY_REDUCE) {
#if @PW@
@type@ * iop1 = (@type@ *)args[0];
npy_intp n = dimensions[0];
*iop1 @OP@= @TYPE@_pairwise_sum(args[1], n, steps[1]);
#else

That path clearly always loads one data-point from the output (even for n == 0), and will do an operation for said data point. Even if that operation doesn't generate a NaN (which seems unlikely), that would give a warning if the loaded garbage happened to be a signalling NaN.

EDIT: Maybe all reduce checks should have an N >= 1 check?

@seberg
Copy link
Member

seberg commented May 11, 2022

So for the windows thing, I think the correct fix (which is a bug-fix in any case), is to never do an empty reduce-loop.

There are two ways to do that:

  1. Modify the IS_BINARY_REDUCE macro to reject dimensions == 0 (or dimensions <= 1)
  2. Guarantee that we never call a ufunc loop on 0 elements to begin with?

Since I do not think complex iteration paths can generate length 0 inner-loop calls, both seem very feasible to me, and I think I will make a PR for it. But I would also be happy with the second approach.

@mattip
Copy link
Member Author

mattip commented May 11, 2022

I think this PR is growing too large. The change to prevent ufuncs looping when PyArray_SIZE() == 0 should be merged first, then we can come back to this PR.

@mattip
Copy link
Member Author

mattip commented May 12, 2022

After a one-line check for 0-length loops in try_trivial_single_output_loop, the PR is passing. Maybe more is needed, would a valgrind check catch illegal indexing into 0-length arrays? If we still have doubts, it may be safer to allocate elemsize bytes rather than 1 byte.

@seberg
Copy link
Member

seberg commented May 12, 2022

If we fix it in the ufunc setup code, I think we should add an assert to IS_BINARY_REDUCE (which also documents the guarantee a bit). Otherwise, I don't think more is needed as such. I was trying to think of a good tests (and maybe have a quick look at accumulate).
Did you want to split out that change, anyway (Or I can do)?

@mattip
Copy link
Member Author

mattip commented May 12, 2022

I think we should add an assert to IS_BINARY_REDUCE

Is there precedence for this in the code? The macros are used like if (IS_BINARY_REDUCE()) so I am not sure how to add an assert to the macro itself

Did you want to split out that change, anyway (Or I can do)?

I originally did, but it turned out to be simpler than I thought, so I am not sure it is necessary as a separate PR.

@seberg
Copy link
Member

seberg commented May 12, 2022

Is there precedence for this in the code? The macros are used like if (IS_BINARY_REDUCE()) so I am not sure how to add an assert to the macro itself

Ah, I had not thought about it, but this should work (just tested briefly and seems good):

#define IS_BINARY_REDUCE (assert(*dimensions != 0),  \
        (args[0] == args[2])  \
        && (steps[0] == steps[2])  \
        && (steps[0] == 0))

Maybe a comment makes sense, that it uses the (expr, result) style to get to result only.


One thing here is that we still need to figure out the arr.dtype = ... code, that might get a bit messy. But lets just keep it for now, and I might add test in a followup.

EDIT: Well, unfortunatley, the assert is hit quickly even with the fix, so we probably need a few more smaller tweaks in the ufunc machinery to do this.

@mattip
Copy link
Member Author

mattip commented May 13, 2022

#21499 was merged, Is there any outstanding review for this PR (pushing on it a bit since it removes the annoying "uh oh" output when testing allocation policy changes)?

@seberg
Copy link
Member

seberg commented May 13, 2022

@mattip unfortunatley, I don't trust the arr.dtype = ... assignment code at all, this is the issue that the current version runs into, I believe:

arr = np.ones(1, dtype="i8")
arr = np.broadcast_to(arr, 10)
arr.dtype = "i1"
print(arr)  # oops

Unfortunately, I don't have a clear idea of how to fix that. Right now, I mainly see the option of making a "deep clean" here and rethinking the code almost as a whole, and I didn't try to think it through yet.
(I had briefly enterntained whether allocating a single element could help, but I don't think so.)

@mattip
Copy link
Member Author

mattip commented May 13, 2022

I pushed a fix which will allow a view of an empty array (np.prod(arr.shape) == 0), but restores the failure to set a dtype on a non-empty array when the strides do not match the elemsize. The check now requires np.prod, is that too expensive?

@@ -513,6 +516,7 @@ array_descr_set(PyArrayObject *self, PyObject *arg, void *NPY_UNUSED(ignored))
/* resize on last axis only */
int axis = PyArray_NDIM(self) - 1;
if (PyArray_DIMS(self)[axis] != 1 &&
PyArray_SIZE(self) != 0 &&
Copy link
Member

@seberg seberg May 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, this should work, yeah. So the one thing is that if the stride was previously 0, in theory it should remain 0 in this case (but gets replaced with elsize). I think I am OK with ignoring the potential UB problem that may follow that for now.

If we really want to fight it, I guess we would have to add asserts into array_dealloc to flush out places that need adjustment.

@eric-wieser how worried are you actually about the UB? If not very, I am tempted to suggest accepting that this path might lead to this UB.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattip if you really care about the prod, you cold order it to the end, since it will be hit very rarely then?
But honestly, this is still a Python "call", so I really doubt it matters at all.

@seberg
Copy link
Member

seberg commented May 17, 2022

OK, I am going to put this in a bit optimistically, since I think Matti was hoping to get this into the next release. If we got holes in this logic, it seems OK to fix it in bug-fix releases as well.

But if Eric (or anyone else) is a bit worried about the changes, we can revert again as well. However, right now, I also think that downstream testing might be good here, and merging will get us at least a bit of testing.

@seberg seberg changed the title Reduce empty allocation MAINT: Reduce allocation size of empty (0 size) arrays to 1 byte May 17, 2022
@seberg seberg merged commit dd5ab7b into numpy:main May 17, 2022
@mattip
Copy link
Member Author

mattip commented May 17, 2022

🚀

gmarkall added a commit to gmarkall/numba that referenced this pull request Aug 12, 2022
Since numpy/numpy#21477 the strides of a 0-sized array are 0 instead of
the itemsize. This causes `array_core()` to erroneously attempt to index
the array with numeric indices instead of empty slices.

Since this is a special case (empty arrays), a simple solution seems to
be for `array_core()` to return the array itself when the array is
0-sized. In order to maintain consistency of the `array_core()` function
between the simulator and hardware targets, we make this change to both
implementations.
stuartarchibald pushed a commit to stuartarchibald/numba that referenced this pull request Aug 15, 2022
Since numpy/numpy#21477 the strides of a 0-sized array are 0 instead of
the itemsize. This causes `array_core()` to erroneously attempt to index
the array with numeric indices instead of empty slices.

Since this is a special case (empty arrays), a simple solution seems to
be for `array_core()` to return the array itself when the array is
0-sized. In order to maintain consistency of the `array_core()` function
between the simulator and hardware targets, we make this change to both
implementations.
stuartarchibald pushed a commit to stuartarchibald/numba that referenced this pull request Aug 16, 2022
Since numpy/numpy#21477 the strides of a 0-sized array are 0 instead of
the itemsize. This causes `array_core()` to erroneously attempt to index
the array with numeric indices instead of empty slices.

Since this is a special case (empty arrays), a simple solution seems to
be for `array_core()` to return the array itself when the array is
0-sized. In order to maintain consistency of the `array_core()` function
between the simulator and hardware targets, we make this change to both
implementations.
stuartarchibald pushed a commit to stuartarchibald/numba that referenced this pull request Aug 22, 2022
Since numpy/numpy#21477 the strides of a 0-sized array are 0 instead of
the itemsize. This causes `array_core()` to erroneously attempt to index
the array with numeric indices instead of empty slices.

Since this is a special case (empty arrays), a simple solution seems to
be for `array_core()` to return the array itself when the array is
0-sized. In order to maintain consistency of the `array_core()` function
between the simulator and hardware targets, we make this change to both
implementations.
gmarkall added a commit to gmarkall/distributed that referenced this pull request Sep 29, 2022
In NumPy 1.23, the strides of empty arrays are 0 instead of the item
size, due to numpy/numpy#21477 - however,
`np.empty_like` seems to create non-zero-strided arrays from a
zero-strided empty array, and copying to the host from a device array
with zero strides fails a compatibility check in Numba.

This commit works around the issue by calling `copy_to_host()` with no
arguments, allowing Numba to create an array on the host that is
compatible with the device array - the resulting implementation is
functionally equivalent and slightly simpler, so I believe this change
could remain permanant rather than requiring a revert later.
gmarkall added a commit to gmarkall/distributed that referenced this pull request Sep 29, 2022
In NumPy 1.23, the strides of empty arrays are 0 instead of the item
size, due to numpy/numpy#21477 - however,
`np.empty_like` seems to create non-zero-strided arrays from a
zero-strided empty array, and copying to the host from a device array
with zero strides fails a compatibility check in Numba.

This commit works around the issue by calling `copy_to_host()` with no
arguments, allowing Numba to create an array on the host that is
compatible with the device array - the resulting implementation is
functionally equivalent and slightly simpler, so I believe this change
could remain permanant rather than requiring a revert later.
quasiben pushed a commit to dask/distributed that referenced this pull request Sep 30, 2022
…7089)

In NumPy 1.23, the strides of empty arrays are 0 instead of the item
size, due to numpy/numpy#21477 - however,
`np.empty_like` seems to create non-zero-strided arrays from a
zero-strided empty array, and copying to the host from a device array
with zero strides fails a compatibility check in Numba.

This commit works around the issue by calling `copy_to_host()` with no
arguments, allowing Numba to create an array on the host that is
compatible with the device array - the resulting implementation is
functionally equivalent and slightly simpler, so I believe this change
could remain permanant rather than requiring a revert later.
gjoseph92 pushed a commit to gjoseph92/distributed that referenced this pull request Oct 31, 2022
…ask#7089)

In NumPy 1.23, the strides of empty arrays are 0 instead of the item
size, due to numpy/numpy#21477 - however,
`np.empty_like` seems to create non-zero-strided arrays from a
zero-strided empty array, and copying to the host from a device array
with zero strides fails a compatibility check in Numba.

This commit works around the issue by calling `copy_to_host()` with no
arguments, allowing Numba to create an array on the host that is
compatible with the device array - the resulting implementation is
functionally equivalent and slightly simpler, so I believe this change
could remain permanant rather than requiring a revert later.
@mattip mattip deleted the reduce-empty-allocation branch December 27, 2022 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

empty, zeros, ones, full allocate product of non-zero size elements, even if final array is zero-sized
4 participants