From ac624d012cc0c3f90da4593e7bb8d9d335fa9696 Mon Sep 17 00:00:00 2001 From: Sebastian Berg Date: Thu, 27 Jan 2022 13:20:54 -0600 Subject: [PATCH 1/3] MAINT: Simplify element setting and use it for filling This slightly modifies the behaviour of `arr.fill()` to be `arr.fill(scalar)`, i.e. match `arr1d[0] = scalar`, rather than `arr.fill(np.asarray(scalar))`, which subtely different! (Note that object was already special cased to have the scalar logic.) Otherwise, `PyArray_Pack` is now the actual, full featured, "scalar" assignment logic. It is a bit strange due to that quantity/masked array issue, but there is nothing to be done about it. The simplifications in `PyArray_AssignFromCache` should not cause any change in practice, because non 0-D arrays would have been rejected earlier on in that path. (Basically, it does not need the full `PyArray_Pack` logic, but that is fine, I intially split the two, but consolidated them again.) --- .../upcoming_changes/20924.compatibility.rst | 23 +++ numpy/core/src/multiarray/array_coercion.c | 112 ++++++++---- numpy/core/src/multiarray/convert.c | 166 +++--------------- numpy/core/src/multiarray/ctors.c | 65 ++----- numpy/core/src/multiarray/dtypemeta.c | 1 + numpy/core/src/multiarray/iterators.c | 3 +- numpy/core/tests/test_array_coercion.py | 4 +- numpy/core/tests/test_multiarray.py | 2 +- 8 files changed, 146 insertions(+), 230 deletions(-) create mode 100644 doc/release/upcoming_changes/20924.compatibility.rst diff --git a/doc/release/upcoming_changes/20924.compatibility.rst b/doc/release/upcoming_changes/20924.compatibility.rst new file mode 100644 index 000000000000..68b7e48927b4 --- /dev/null +++ b/doc/release/upcoming_changes/20924.compatibility.rst @@ -0,0 +1,23 @@ +``array.fill(scalar)`` may behave slightly different +---------------------------------------------------- +`~numpy.ndarray.fill` may in some cases behave slightly different +now due to the fact that the logic is aligned with item assignment:: + + arr = np.array([1]) # with any dtype/value + arr.fill(scalar) + # is now identical to: + arr[0] = scalar + +When previously, effectively ``arr.fill(np.asarray(scalar))`` was used +in many cases. +The typical possible change is that errors will be given for example for +``np.array([0]).fill(1e300)`` or ``np.array([0]).fill(np.nan)``. +Previously, ``arr.fill()`` would unpack 0-D object arrays if the input +array was also of object dtype, the 0-D array will now be stored as is:: + + >>> arr = np.array([0], dtype=object) + >>> arr.fill(np.array(None, dtype=object)) + >>> arr + array([array(None, dtype=object)], dtype=object) + +Rather than ``array([None], dtype=object)``. diff --git a/numpy/core/src/multiarray/array_coercion.c b/numpy/core/src/multiarray/array_coercion.c index 1559f348539e..0886d2d0b403 100644 --- a/numpy/core/src/multiarray/array_coercion.c +++ b/numpy/core/src/multiarray/array_coercion.c @@ -378,6 +378,34 @@ find_scalar_descriptor( } +/* + * Helper function for casting a raw value from one descriptor to another. + * This helper uses the normal casting machinery, but e.g. does not care about + * checking cast safety. + */ +static int +cast_raw_scalar_item( + PyArray_Descr *from_descr, char *from_item, + PyArray_Descr *to_descr, char *to_item) +{ + int needs_api = 0; + NPY_cast_info cast_info; + if (PyArray_GetDTypeTransferFunction( + 0, 0, 0, from_descr, to_descr, 0, &cast_info, + &needs_api) == NPY_FAIL) { + return -1; + } + char *args[2] = {from_item, to_item}; + const npy_intp strides[2] = {0, 0}; + const npy_intp length = 1; + int res = cast_info.func(&cast_info.context, + args, &length, strides, cast_info.auxdata); + + NPY_cast_info_xfree(&cast_info); + return res; +} + + /** * Assign a single element in an array from a python value. * @@ -388,26 +416,35 @@ find_scalar_descriptor( * This function handles the cast, which is for example hit when assigning * a float128 to complex128. * - * At this time, this function does not support arrays (historically we - * mainly supported arrays through `__float__()`, etc.). Such support should - * possibly be added (although when called from `PyArray_AssignFromCache` - * the input cannot be an array). - * Note that this is also problematic for some array-likes, such as - * `astropy.units.Quantity` and `np.ma.masked`. These are used to us calling - * `__float__`/`__int__` for 0-D instances in many cases. - * Eventually, we may want to define this as wrong: They must use DTypes - * instead of (only) subclasses. Until then, here as well as in - * `PyArray_AssignFromCache` (which already does this), we need to special - * case 0-D array-likes to behave like arbitrary (unknown!) Python objects. + * TODO: This function probably needs to be passed an "owner" for the sake of + * future HPy (non CPython) support + * + * NOTE: We do support 0-D exact NumPy arrays correctly via casting here. + * There be dragons, because we must NOT support generic array-likes. + * The problem is that some (e.g. astropy's Quantity and our masked + * arrays) have divergent behaviour for `__array__` as opposed to + * `__float__`. And they rely on that. + * That is arguably bad as it limits the things that work seamlessly + * because `__float__`, etc. cannot even begin to cover all of casting. + * However, we have no choice. We simply CANNOT support array-likes + * here without finding a solution for this first. + * And the only plausible one I see currently, is expanding protocols + * in some form, either to indicate that we want a scalar or to indicate + * that we want the unsafe version that `__array__` currently gives + * for both objects. + * + * If we ever figure out how to expand this to other array-likes, care + * may need to be taken. `PyArray_FromAny`/`PyArray_AssignFromCache` + * uses this function but know if the input is an array, array-like, + * or scalar. Relaxing things here should be OK, but looks a bit + * like possible recursion, so it may make sense to make a "scalars only" + * version of this function. * * @param descr * @param item * @param value * @return 0 on success -1 on failure. */ -/* - * TODO: This function should possibly be public API. - */ NPY_NO_EXPORT int PyArray_Pack(PyArray_Descr *descr, char *item, PyObject *value) { @@ -433,6 +470,29 @@ PyArray_Pack(PyArray_Descr *descr, char *item, PyObject *value) if (DType == NULL) { return -1; } + if (DType == (PyArray_DTypeMeta *)Py_None && PyArray_CheckExact(value) + && PyArray_NDIM((PyArrayObject *)value) == 0) { + /* + * WARNING: Do NOT relax the above `PyArray_CheckExact`, unless you + * read the function doc NOTE carefully and understood it. + * + * NOTE: The ndim == 0 check should probably be an error, but + * unfortunately. `arr.__float__()` works for 1 element arrays + * so in some contexts we need to let it handled like a scalar. + * (If we manage to deprecate the above, we can do that.) + */ + Py_DECREF(DType); + + PyArrayObject *arr = (PyArrayObject *)value; + if (PyArray_DESCR(arr) == descr && !PyDataType_REFCHK(descr)) { + /* light-weight fast-path for when the descrs obviously matches */ + memcpy(item, PyArray_BYTES(arr), descr->elsize); + return 0; /* success (it was an array-like) */ + } + return cast_raw_scalar_item( + PyArray_DESCR(arr), PyArray_BYTES(arr), descr, item); + + } if (DType == NPY_DTYPE(descr) || DType == (PyArray_DTypeMeta *)Py_None) { /* We can set the element directly (or at least will try to) */ Py_XDECREF(DType); @@ -461,30 +521,8 @@ PyArray_Pack(PyArray_Descr *descr, char *item, PyObject *value) Py_DECREF(tmp_descr); return -1; } - if (PyDataType_REFCHK(tmp_descr)) { - /* We could probably use move-references above */ - PyArray_Item_INCREF(data, tmp_descr); - } - - int res = 0; - int needs_api = 0; - NPY_cast_info cast_info; - if (PyArray_GetDTypeTransferFunction( - 0, 0, 0, tmp_descr, descr, 0, &cast_info, - &needs_api) == NPY_FAIL) { - res = -1; - goto finish; - } - char *args[2] = {data, item}; - const npy_intp strides[2] = {0, 0}; - const npy_intp length = 1; - if (cast_info.func(&cast_info.context, - args, &length, strides, cast_info.auxdata) < 0) { - res = -1; - } - NPY_cast_info_xfree(&cast_info); + int res = cast_raw_scalar_item(tmp_descr, data, descr, item); - finish: if (PyDataType_REFCHK(tmp_descr)) { /* We could probably use move-references above */ PyArray_Item_XDECREF(data, tmp_descr); diff --git a/numpy/core/src/multiarray/convert.c b/numpy/core/src/multiarray/convert.c index 630253e38b64..2aed0bbb4986 100644 --- a/numpy/core/src/multiarray/convert.c +++ b/numpy/core/src/multiarray/convert.c @@ -20,6 +20,7 @@ #include "array_assign.h" #include "convert.h" +#include "array_coercion.h" int fallocate(int fd, int mode, off_t offset, off_t len); @@ -358,151 +359,42 @@ PyArray_ToString(PyArrayObject *self, NPY_ORDER order) NPY_NO_EXPORT int PyArray_FillWithScalar(PyArrayObject *arr, PyObject *obj) { - PyArray_Descr *dtype = NULL; - npy_longlong value_buffer[4]; - char *value = NULL; - int retcode = 0; - /* - * If 'arr' is an object array, copy the object as is unless - * 'obj' is a zero-dimensional array, in which case we copy - * the element in that array instead. + * If we knew that the output array has at least one element, we would + * not actually need a helping buffer, we always null it, just in case. + * + * (The longlong here should help with alignment.) */ - if (PyArray_DESCR(arr)->type_num == NPY_OBJECT && - !(PyArray_Check(obj) && - PyArray_NDIM((PyArrayObject *)obj) == 0)) { - value = (char *)&obj; - - dtype = PyArray_DescrFromType(NPY_OBJECT); - if (dtype == NULL) { - return -1; - } - } - /* NumPy scalar */ - else if (PyArray_IsScalar(obj, Generic)) { - dtype = PyArray_DescrFromScalar(obj); - if (dtype == NULL) { - return -1; - } - value = scalar_value(obj, dtype); - if (value == NULL) { - Py_DECREF(dtype); - return -1; - } - } - /* Python boolean */ - else if (PyBool_Check(obj)) { - value = (char *)value_buffer; - *value = (obj == Py_True); - - dtype = PyArray_DescrFromType(NPY_BOOL); - if (dtype == NULL) { - return -1; - } - } - /* Python integer */ - else if (PyLong_Check(obj)) { - /* Try long long before unsigned long long */ - npy_longlong ll_v = PyLong_AsLongLong(obj); - if (error_converting(ll_v)) { - /* Long long failed, try unsigned long long */ - npy_ulonglong ull_v; - PyErr_Clear(); - ull_v = PyLong_AsUnsignedLongLong(obj); - if (ull_v == (unsigned long long)-1 && PyErr_Occurred()) { - return -1; - } - value = (char *)value_buffer; - *(npy_ulonglong *)value = ull_v; - - dtype = PyArray_DescrFromType(NPY_ULONGLONG); - if (dtype == NULL) { - return -1; - } - } - else { - /* Long long succeeded */ - value = (char *)value_buffer; - *(npy_longlong *)value = ll_v; - - dtype = PyArray_DescrFromType(NPY_LONGLONG); - if (dtype == NULL) { - return -1; - } - } - } - /* Python float */ - else if (PyFloat_Check(obj)) { - npy_double v = PyFloat_AsDouble(obj); - if (error_converting(v)) { - return -1; - } - value = (char *)value_buffer; - *(npy_double *)value = v; - - dtype = PyArray_DescrFromType(NPY_DOUBLE); - if (dtype == NULL) { + npy_longlong value_buffer_stack[4] = {0}; + char *value_buffer_heap = NULL; + char *value = (char *)value_buffer_stack; + PyArray_Descr *descr = PyArray_DESCR(arr); + + if (descr->elsize > sizeof(value_buffer_stack)) { + /* We need a large temporary buffer... */ + value_buffer_heap = PyObject_Calloc(1, descr->elsize); + if (value_buffer_heap == NULL) { + PyErr_NoMemory(); return -1; } + value = value_buffer_heap; } - /* Python complex */ - else if (PyComplex_Check(obj)) { - npy_double re, im; - - re = PyComplex_RealAsDouble(obj); - if (error_converting(re)) { - return -1; - } - im = PyComplex_ImagAsDouble(obj); - if (error_converting(im)) { - return -1; - } - value = (char *)value_buffer; - ((npy_double *)value)[0] = re; - ((npy_double *)value)[1] = im; - - dtype = PyArray_DescrFromType(NPY_CDOUBLE); - if (dtype == NULL) { - return -1; - } - } - - /* Use the value pointer we got if possible */ - if (value != NULL) { - /* TODO: switch to SAME_KIND casting */ - retcode = PyArray_AssignRawScalar(arr, dtype, value, - NULL, NPY_UNSAFE_CASTING); - Py_DECREF(dtype); - return retcode; + if (PyArray_Pack(descr, value, obj) < 0) { + PyMem_FREE(value_buffer_heap); + return -1; } - /* Otherwise convert to an array to do the assignment */ - else { - PyArrayObject *src_arr; - /** - * The dtype of the destination is used when converting - * from the pyobject, so that for example a tuple gets - * recognized as a struct scalar of the required type. - */ - Py_INCREF(PyArray_DTYPE(arr)); - src_arr = (PyArrayObject *)PyArray_FromAny(obj, - PyArray_DTYPE(arr), 0, 0, 0, NULL); - if (src_arr == NULL) { - return -1; - } - - if (PyArray_NDIM(src_arr) != 0) { - PyErr_SetString(PyExc_ValueError, - "Input object to FillWithScalar is not a scalar"); - Py_DECREF(src_arr); - return -1; - } - - retcode = PyArray_CopyInto(arr, src_arr); + /* + * There is no cast anymore, the above already coerced using scalar + * coercion rules + */ + int retcode = raw_array_assign_scalar( + PyArray_NDIM(arr), PyArray_DIMS(arr), descr, + PyArray_BYTES(arr), PyArray_STRIDES(arr), + descr, value); - Py_DECREF(src_arr); - return retcode; - } + PyMem_FREE(value_buffer_heap); + return retcode; } /* diff --git a/numpy/core/src/multiarray/ctors.c b/numpy/core/src/multiarray/ctors.c index c780f4b2bb12..95a3bd84d82b 100644 --- a/numpy/core/src/multiarray/ctors.c +++ b/numpy/core/src/multiarray/ctors.c @@ -465,55 +465,12 @@ PyArray_AssignFromCache_Recursive( PyArrayObject *self, const int ndim, coercion_cache_obj **cache) { /* Consume first cache element by extracting information and freeing it */ - PyObject *original_obj = (*cache)->converted_obj; PyObject *obj = (*cache)->arr_or_sequence; Py_INCREF(obj); npy_bool sequence = (*cache)->sequence; int depth = (*cache)->depth; *cache = npy_unlink_coercion_cache(*cache); - /* - * The maximum depth is special (specifically for objects), but usually - * unrolled in the sequence branch below. - */ - if (NPY_UNLIKELY(depth == ndim)) { - /* - * We have reached the maximum depth. We should simply assign to the - * element in principle. There is one exception. If this is a 0-D - * array being stored into a 0-D array (but we do not reach here then). - */ - if (PyArray_ISOBJECT(self)) { - assert(ndim != 0); /* guaranteed by PyArray_AssignFromCache */ - assert(PyArray_NDIM(self) == 0); - Py_DECREF(obj); - return PyArray_Pack(PyArray_DESCR(self), PyArray_BYTES(self), - original_obj); - } - if (sequence) { - /* - * Sanity check which may be removed, the error is raised already - * in `PyArray_DiscoverDTypeAndShape`. - */ - assert(0); - PyErr_SetString(PyExc_RuntimeError, - "setting an array element with a sequence"); - goto fail; - } - else if (original_obj != obj || !PyArray_CheckExact(obj)) { - /* - * If the leave node is an array-like, but not a numpy array, - * we pretend it is an arbitrary scalar. This means that in - * most cases (where the dtype is int or float), we will end - * up using float(array-like), or int(array-like). That does - * not support general casting, but helps Quantity and masked - * arrays, because it allows them to raise an error when - * `__float__()` or `__int__()` is called. - */ - Py_DECREF(obj); - return PyArray_SETITEM(self, PyArray_BYTES(self), original_obj); - } - } - /* The element is either a sequence, or an array */ if (!sequence) { /* Straight forward array assignment */ @@ -535,20 +492,24 @@ PyArray_AssignFromCache_Recursive( for (npy_intp i = 0; i < length; i++) { PyObject *value = PySequence_Fast_GET_ITEM(obj, i); - if (*cache == NULL || (*cache)->converted_obj != value || - (*cache)->depth != depth + 1) { - if (ndim != depth + 1) { - PyErr_SetString(PyExc_RuntimeError, - "Inconsistent object during array creation? " - "Content of sequences changed (now too shallow)."); - goto fail; - } - /* Straight forward assignment of elements */ + if (ndim == depth + 1) { + /* + * Straight forward assignment of elements. Note that it is + * possible for such an element to be a 0-D array or array-like. + * `PyArray_Pack` supports arrays as well as we want: We + * support exact NumPy arrays, but at this point ignore others. + * (Please see the `PyArray_Pack` function comment if this + * rightly confuses you.) + */ char *item; item = (PyArray_BYTES(self) + i * PyArray_STRIDES(self)[0]); if (PyArray_Pack(PyArray_DESCR(self), item, value) < 0) { goto fail; } + /* If this was an array(-like) we still need to unlike int: */ + if (*cache != NULL && (*cache)->converted_obj == value) { + *cache = npy_unlink_coercion_cache(*cache); + } } else { PyArrayObject *view; diff --git a/numpy/core/src/multiarray/dtypemeta.c b/numpy/core/src/multiarray/dtypemeta.c index 577478d2a1a0..cc99a3ecafc8 100644 --- a/numpy/core/src/multiarray/dtypemeta.c +++ b/numpy/core/src/multiarray/dtypemeta.c @@ -613,6 +613,7 @@ string_unicode_common_dtype(PyArray_DTypeMeta *cls, PyArray_DTypeMeta *other) return cls; } + static PyArray_DTypeMeta * datetime_common_dtype(PyArray_DTypeMeta *cls, PyArray_DTypeMeta *other) { diff --git a/numpy/core/src/multiarray/iterators.c b/numpy/core/src/multiarray/iterators.c index f959162fd015..95aa11d2ded7 100644 --- a/numpy/core/src/multiarray/iterators.c +++ b/numpy/core/src/multiarray/iterators.c @@ -827,7 +827,8 @@ iter_ass_subscript(PyArrayIterObject *self, PyObject *ind, PyObject *val) if (PyBool_Check(ind)) { retval = 0; if (PyObject_IsTrue(ind)) { - retval = PyArray_Pack(PyArray_DESCR(self->ao), self->dataptr, val); + retval = PyArray_Pack( + PyArray_DESCR(self->ao), self->dataptr, val); } goto finish; } diff --git a/numpy/core/tests/test_array_coercion.py b/numpy/core/tests/test_array_coercion.py index e858cd8b6385..b8b8ef187668 100644 --- a/numpy/core/tests/test_array_coercion.py +++ b/numpy/core/tests/test_array_coercion.py @@ -614,8 +614,8 @@ def __len__(self): obj.append([2, 3]) obj.append(mylist([1, 2])) - with pytest.raises(RuntimeError): - np.array(obj) + # Does not crash: + np.array(obj) def test_replace_0d_array(self): # List to coerce, `mylist` will mutate the first element diff --git a/numpy/core/tests/test_multiarray.py b/numpy/core/tests/test_multiarray.py index f4454130d723..7e86db8b64d2 100644 --- a/numpy/core/tests/test_multiarray.py +++ b/numpy/core/tests/test_multiarray.py @@ -68,8 +68,8 @@ def _aligned_zeros(shape, dtype=float, order="C", align=None): # Note: slices producing 0-size arrays do not necessarily change # data pointer --- so we use and allocate size+1 buf = buf[offset:offset+size+1][:-1] + buf.fill(0) data = np.ndarray(shape, dtype, buf, order=order) - data.fill(0) return data From 88b9fc328741dd24f285fa25ae8c7eaa57bf2e90 Mon Sep 17 00:00:00 2001 From: Sebastian Berg Date: Sun, 12 Jun 2022 13:50:24 -0700 Subject: [PATCH 2/3] DOC: Shorten release note about `arr.fill()` change Co-authored-by: Matti Picus --- .../upcoming_changes/20924.compatibility.rst | 14 +------------- 1 file changed, 1 insertion(+), 13 deletions(-) diff --git a/doc/release/upcoming_changes/20924.compatibility.rst b/doc/release/upcoming_changes/20924.compatibility.rst index 68b7e48927b4..0446b714fe94 100644 --- a/doc/release/upcoming_changes/20924.compatibility.rst +++ b/doc/release/upcoming_changes/20924.compatibility.rst @@ -8,16 +8,4 @@ now due to the fact that the logic is aligned with item assignment:: # is now identical to: arr[0] = scalar -When previously, effectively ``arr.fill(np.asarray(scalar))`` was used -in many cases. -The typical possible change is that errors will be given for example for -``np.array([0]).fill(1e300)`` or ``np.array([0]).fill(np.nan)``. -Previously, ``arr.fill()`` would unpack 0-D object arrays if the input -array was also of object dtype, the 0-D array will now be stored as is:: - - >>> arr = np.array([0], dtype=object) - >>> arr.fill(np.array(None, dtype=object)) - >>> arr - array([array(None, dtype=object)], dtype=object) - -Rather than ``array([None], dtype=object)``. +Previously casting may have produced slightly different answers when using values that could not be represented in the target `dtype` or when using `object`s. From 20b75ea429686fa8c0e08f38c5671e4cba5771ff Mon Sep 17 00:00:00 2001 From: Sebastian Berg Date: Sun, 12 Jun 2022 13:58:49 -0700 Subject: [PATCH 3/3] DOC: Add example assigning an array using fill and tweak release note --- .../upcoming_changes/20924.compatibility.rst | 4 +++- numpy/core/_add_newdocs.py | 18 ++++++++++++++++++ 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/doc/release/upcoming_changes/20924.compatibility.rst b/doc/release/upcoming_changes/20924.compatibility.rst index 0446b714fe94..02f3e9fdbace 100644 --- a/doc/release/upcoming_changes/20924.compatibility.rst +++ b/doc/release/upcoming_changes/20924.compatibility.rst @@ -8,4 +8,6 @@ now due to the fact that the logic is aligned with item assignment:: # is now identical to: arr[0] = scalar -Previously casting may have produced slightly different answers when using values that could not be represented in the target `dtype` or when using `object`s. +Previously casting may have produced slightly different answers when using +values that could not be represented in the target ``dtype`` or when the +target had ``object`` dtype. diff --git a/numpy/core/_add_newdocs.py b/numpy/core/_add_newdocs.py index fb9c30d93080..3e8df6d46825 100644 --- a/numpy/core/_add_newdocs.py +++ b/numpy/core/_add_newdocs.py @@ -3437,6 +3437,24 @@ >>> a array([1., 1.]) + Fill expects a scalar value and always behaves the same as assigning + to a single array element. The following is a rare example where this + distinction is important: + + >>> a = np.array([None, None], dtype=object) + >>> a[0] = np.array(3) + >>> a + array([array(3), None], dtype=object) + >>> a.fill(np.array(3)) + >>> a + array([array(3), array(3)], dtype=object) + + Where other forms of assignments will unpack the array being assigned: + + >>> a[...] = np.array(3) + >>> a + array([3, 3], dtype=object) + """))