BUG: arr.fill() does not work with USE_SETITEM scalars #19553
Comments
This issue corresponds to the user-type fix stack-of-tasks/eigenpy#240
I don't understand; that seems correct and continues calculating the right offset. Or is that logic wrong? Can you first check that
Lucky us :)! One less weird function to worry about... That said, the fill function does look like a train-wreck that does not properly support

One thing that I must make sure of, though: are you aware of my work on NEPs 40-43? I know it is a bit slower than we hoped, but it is getting close and the last big steps are about to be finalized (there are quite a few small ones after that, but still). If you are interested, I would be happy to talk about it, or show you how to experiment with the new DType system.
The memory manipulation that is being carried out is:

Input scalar:
Our CustomDouble: alignment = 8
Numpy rational: alignment = 4

In both cases, the memory manipulation is working in the same manner. However, the desired location for the data pointer in our CustomDouble is
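For reference, the address arithmetic being discussed is the one scalar_value() in scalarapi.c applies to user scalars: it assumes the payload starts right after PyObject_HEAD, rounded up to the dtype's alignment. A rough Python sketch of that arithmetic (not NumPy's actual C code; assumes a 64-bit, non-debug CPython where sizeof(PyObject) is 16):

```python
SIZEOF_PYOBJECT = 16  # assumption: 64-bit CPython, non-debug build


def assumed_data_offset(alignment):
    """Offset from the scalar object's base address to where
    scalar_value() assumes the data lives."""
    memloc = SIZEOF_PYOBJECT
    if alignment > 1:
        # round up to the next multiple of the dtype alignment
        memloc = ((memloc + alignment - 1) // alignment) * alignment
    return memloc


print(assumed_data_offset(4))  # 16
print(assumed_data_offset(8))  # 16
```

Note that alignment 4 and alignment 8 both round 16 up to 16, which is why the manipulation "works in the same manner" for CustomDouble and for rational; the problem is not the rounding but that a boost::python object does not store its data at that offset at all.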
Without inheriting from PyGenericArrType_Type, the command

However, since the Generic implementation is not used in such a case, the

If you could propose a way to register the new user-type as a scalar without using np.generic, we might be able to avoid the check here (numpy/numpy/core/src/multiarray/convert.c, Line 381 in c3d1107) and proceed to fill the scalar properly.
I would be very happy to talk about extending our work using the new features. Would it work with the currently available numpy releases as well?
Good point, the new API does this correctly, but the old API doesn't have the right hooks. (This is even inside NumPy already, but not public API...)
I don't understand how this is possible. That memory location seems completely different; does your object store a pointer to the memory?
No, probably next release, although realistically in some "experimental" form. But getting buy-in might help make sure it happens more smoothly...
@proyan, or am I just misunderstanding here: your dtype actually does set
I see. I assumed as much from the addresses that I obtained. The Python class is defined using boost::python here:

I may be mistaken, but I am guessing that boost::python uses the structure of PyArrayObject instead of an aligned C-struct. But perhaps you would know better.
Yes, we define

However, when using mat.fill() (numpy/numpy/core/src/multiarray/convert.c, Line 381 in c3d1107) and a few other calls, the control passes to

with the wrong source pointer address.

In [1]: mat.dtype.flags
Out[1]: 112
By the way, thanks a lot for the quick responses! It is quite refreshing to see real-time activity on such a giant project.
The important part here is the flags, and they indicate

So it is indeed a bug in
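For context, the reported flags value of 112 decodes into the following dtype flag bits (values taken from numpy/core/include/numpy/ndarraytypes.h); in particular NPY_USE_SETITEM is set, matching the issue title:

```python
# dtype flag bits, as defined in ndarraytypes.h
NPY_NEEDS_INIT  = 0x08  # not set in 112
NPY_NEEDS_PYAPI = 0x10
NPY_USE_GETITEM = 0x20
NPY_USE_SETITEM = 0x40

flags = 112  # the value reported by mat.dtype.flags
assert flags == NPY_NEEDS_PYAPI | NPY_USE_GETITEM | NPY_USE_SETITEM
```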
Currently, there are two ways forward that I have seen: not inheriting from
Note that array creation will work as long as you use
No, I don't see a way right now. Aside from making sure the data is stored directly on the scalar in a C-struct manner (because, why not?). There is internal API to allow customizing this in the future, but it is not public and intentionally a bit limited.
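Storing the data "directly on the scalar in a C-struct manner" means a layout like that of numpy's rational test scalar: the payload sits inline, immediately after PyObject_HEAD. A hypothetical ctypes illustration of such a layout (the struct and field names here are invented for illustration, assuming a non-debug CPython build):

```python
import ctypes


class InlineScalar(ctypes.Structure):
    # PyObject_HEAD expands to a refcount and a type pointer
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("value", ctypes.c_double),  # payload stored inline after the head
    ]


# The payload offset is exactly where scalar_value() looks for the data
# (typically 16 on 64-bit builds).
print(InlineScalar.value.offset)
```

A boost::python wrapper object does not follow this layout, which is why the pointer that fill() computes points at garbage.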
Thanks for the clarification and the quick response @seberg.
I really would like to apply the diff below. But it would change two things: First

(Basically, this would align

It is also interesting that the current code actually uses safe casting for the case of

diff --git a/numpy/core/src/multiarray/convert.c b/numpy/core/src/multiarray/convert.c
index 29a2bb0e8..30af810d7 100644
--- a/numpy/core/src/multiarray/convert.c
+++ b/numpy/core/src/multiarray/convert.c
@@ -19,6 +19,7 @@
#include "array_assign.h"
#include "convert.h"
+#include "array_coercion.h"
int
fallocate(int fd, int mode, off_t offset, off_t len);
@@ -357,151 +358,28 @@ PyArray_ToString(PyArrayObject *self, NPY_ORDER order)
NPY_NO_EXPORT int
PyArray_FillWithScalar(PyArrayObject *arr, PyObject *obj)
{
- PyArray_Descr *dtype = NULL;
- npy_longlong value_buffer[4];
- char *value = NULL;
- int retcode = 0;
+ npy_longlong value_buffer_stack[4];
+ char *value_buffer_heap = NULL;
+ char *buf = (char *)value_buffer_stack;
- /*
- * If 'arr' is an object array, copy the object as is unless
- * 'obj' is a zero-dimensional array, in which case we copy
- * the element in that array instead.
- */
- if (PyArray_DESCR(arr)->type_num == NPY_OBJECT &&
- !(PyArray_Check(obj) &&
- PyArray_NDIM((PyArrayObject *)obj) == 0)) {
- value = (char *)&obj;
-
- dtype = PyArray_DescrFromType(NPY_OBJECT);
- if (dtype == NULL) {
- return -1;
- }
+ if ((size_t)PyArray_ITEMSIZE(arr) > sizeof(value_buffer_stack)) {
+ value_buffer_heap = PyMem_MALLOC(PyArray_ITEMSIZE(arr));
+ buf = value_buffer_heap;
}
- /* NumPy scalar */
- else if (PyArray_IsScalar(obj, Generic)) {
- dtype = PyArray_DescrFromScalar(obj);
- if (dtype == NULL) {
- return -1;
- }
- value = scalar_value(obj, dtype);
- if (value == NULL) {
- Py_DECREF(dtype);
- return -1;
- }
- }
- /* Python boolean */
- else if (PyBool_Check(obj)) {
- value = (char *)value_buffer;
- *value = (obj == Py_True);
-
- dtype = PyArray_DescrFromType(NPY_BOOL);
- if (dtype == NULL) {
- return -1;
- }
+ if (PyArray_DESCR(arr)->flags & NPY_NEEDS_INIT) {
+ memset(buf, '\0', PyArray_DESCR(arr)->elsize);
}
- /* Python integer */
- else if (PyLong_Check(obj)) {
- /* Try long long before unsigned long long */
- npy_longlong ll_v = PyLong_AsLongLong(obj);
- if (error_converting(ll_v)) {
- /* Long long failed, try unsigned long long */
- npy_ulonglong ull_v;
- PyErr_Clear();
- ull_v = PyLong_AsUnsignedLongLong(obj);
- if (ull_v == (unsigned long long)-1 && PyErr_Occurred()) {
- return -1;
- }
- value = (char *)value_buffer;
- *(npy_ulonglong *)value = ull_v;
-
- dtype = PyArray_DescrFromType(NPY_ULONGLONG);
- if (dtype == NULL) {
- return -1;
- }
- }
- else {
- /* Long long succeeded */
- value = (char *)value_buffer;
- *(npy_longlong *)value = ll_v;
- dtype = PyArray_DescrFromType(NPY_LONGLONG);
- if (dtype == NULL) {
- return -1;
- }
- }
- }
- /* Python float */
- else if (PyFloat_Check(obj)) {
- npy_double v = PyFloat_AsDouble(obj);
- if (error_converting(v)) {
- return -1;
- }
- value = (char *)value_buffer;
- *(npy_double *)value = v;
-
- dtype = PyArray_DescrFromType(NPY_DOUBLE);
- if (dtype == NULL) {
- return -1;
- }
- }
- /* Python complex */
- else if (PyComplex_Check(obj)) {
- npy_double re, im;
-
- re = PyComplex_RealAsDouble(obj);
- if (error_converting(re)) {
- return -1;
- }
- im = PyComplex_ImagAsDouble(obj);
- if (error_converting(im)) {
- return -1;
- }
- value = (char *)value_buffer;
- ((npy_double *)value)[0] = re;
- ((npy_double *)value)[1] = im;
-
- dtype = PyArray_DescrFromType(NPY_CDOUBLE);
- if (dtype == NULL) {
- return -1;
- }
- }
-
- /* Use the value pointer we got if possible */
- if (value != NULL) {
- /* TODO: switch to SAME_KIND casting */
- retcode = PyArray_AssignRawScalar(arr, dtype, value,
- NULL, NPY_UNSAFE_CASTING);
- Py_DECREF(dtype);
- return retcode;
+ /* NOTE: PyArray_Pack effectively uses unsafe casting. */
+ if (PyArray_Pack(PyArray_DESCR(arr), buf, obj) < 0) {
+ PyMem_FREE(value_buffer_heap);
+ return -1;
}
- /* Otherwise convert to an array to do the assignment */
- else {
- PyArrayObject *src_arr;
-
- /**
- * The dtype of the destination is used when converting
- * from the pyobject, so that for example a tuple gets
- * recognized as a struct scalar of the required type.
- */
- Py_INCREF(PyArray_DTYPE(arr));
- src_arr = (PyArrayObject *)PyArray_FromAny(obj,
- PyArray_DTYPE(arr), 0, 0, 0, NULL);
- if (src_arr == NULL) {
- return -1;
- }
-
- if (PyArray_NDIM(src_arr) != 0) {
- PyErr_SetString(PyExc_ValueError,
- "Input object to FillWithScalar is not a scalar");
- Py_DECREF(src_arr);
- return -1;
- }
-
- retcode = PyArray_CopyInto(arr, src_arr);
- Py_DECREF(src_arr);
- return retcode;
- }
+ int res = PyArray_AssignRawScalar(
+ arr, PyArray_DESCR(arr), buf, NULL, NPY_UNSAFE_CASTING);
+ PyMem_FREE(value_buffer_heap);
+ return res;
}
/*
diff --git a/numpy/core/tests/test_multiarray.py b/numpy/core/tests/test_multiarray.py
index 5f0a725d2..fc2358f33 100644
--- a/numpy/core/tests/test_multiarray.py
+++ b/numpy/core/tests/test_multiarray.py
@@ -67,7 +67,7 @@ def _aligned_zeros(shape, dtype=float, order="C", align=None):
# data pointer --- so we use and allocate size+1
buf = buf[offset:offset+size+1][:-1]
data = np.ndarray(shape, dtype, buf, order=order)
- data.fill(0)
+ data.fill(np.uint8(0))
return data
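The test tweak above swaps a Python int for a NumPy scalar; both still zero-fill an ordinary array, they just enter PyArray_FillWithScalar through different branches in the pre-patch code (the PyLong fast path vs. the Generic-scalar path through scalar_value()). A quick sanity check that both forms behave identically for a builtin dtype:

```python
import numpy as np

a = np.empty(4, dtype=np.uint8)

a.fill(0)            # Python int: PyLong branch in the old code
assert (a == 0).all()

a.fill(np.uint8(1))  # NumPy scalar: Generic-scalar branch in the old code
assert (a == 1).all()
```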
Thanks for the fix @seberg. I agree
ref @jcarpent |
TLDR:
When creating a user-defined data-type using boost-python, np.fill() uses a memory manipulation scheme which doesn't correspond to the PyObject ABI created by boost-python. On the other hand, direct assignment using mat[:] accesses the data using PyArray_Data, and returns the correct pointer.
As a result, there is garbage value assignment in fill(), and sane value assignment in direct access.
Detailed:
Hello NumPy developers,
First of all, I want to thank you all for your hard work.
We are a team of robotics engineers/researchers, and one of our packages is an extension of the Eigen3 library to Python using numpy.ndarray as a container (see the eigenpy library).
One of the recent features proposed in this library is the definition of user-types as numpy scalars (https://github.com/jcarpent/eigenpy/blob/devel/include/eigenpy/user-type.hpp#L235). The unit test for this feature (https://github.com/jcarpent/eigenpy/blob/devel/unittest/python/test_user_type.py) does not correctly perform some allocation in the fill() instruction, and thus the test fails at

On the other hand,

succeeds without any issue.
I have tried to compare the behaviour of the mat.fill() command with mat[:] and the numpy user-defined dtype numpy.core._rational_tests.rational in gdb, to understand better how the wrong address is being propagated, using the following sample script:

The output of this script is:
In all three tests, source variables (a and x) are being copied to destination variables (mat and y).

In the sample run that I did,
First Test
The variable a is copied to the destination pointer through the copyswapn function defined here, with the arguments

The backtrace shows that the correct data pointer inside a is accessed through PyArray_Data here:
numpy/numpy/core/src/multiarray/array_assign_array.c, Line 272 in 3409bd3
This test thus passes.
Second Test
mat.fill, however, is not able to correctly acquire the pointer for the location of the source data. The arguments given to the copyswapn function are:

which don't correspond to the correct location of the source data.
The backtrace shows that this data pointer is created here:
numpy/numpy/core/src/multiarray/scalarapi.c, Line 171 in 3409bd3
which assumes that the data begins immediately after PyObject_HEAD. This is obviously not the case.
This command fails.
Third Test
The numpy example data-type rational works, since it follows the ABI that is assumed by np.fill(). In this case, the data pointer created by both PyArray_Data and the memory manipulation matches.

Surprisingly, the fillwithscalar function, which is defined in both our custom user type and numpy's rational type, is never used.

Could you please explain how boost-python classes could be used as numpy user-defined data types, and how to fix the support of ndarray.fill?
Thanks,
Rohan