Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Distinguish exact vs. equivalent dtype for C type aliases. #21995

Merged
merged 11 commits into from Sep 7, 2022
21 changes: 21 additions & 0 deletions doc/release/upcoming_changes/21995.compatibility.rst
@@ -0,0 +1,21 @@
Returned arrays respect uniqueness of dtype kwarg objects
---------------------------------------------------------
When the ``dtype`` keyword argument is used with :py:func:`np.array()`
or :py:func:`asarray()`, the dtype of the returned array now
always exactly matches the dtype provided by the caller.

In some cases this change means that a *view* rather than the
input array is returned.
The following is an example for this on 64bit Linux where ``long``
and ``longlong`` are the same precision::

>>> arr = np.array([1, 2, 3], dtype="long")
>>> new_dtype = np.dtype("longlong")
>>> new = np.asarray(arr, dtype=new_dtype)
>>> new.dtype is dtype
rgommers marked this conversation as resolved.
Show resolved Hide resolved
True
>>> new is arr
False

Before the change, the ``dtype`` did not match because ``new is arr``
was true.
64 changes: 64 additions & 0 deletions numpy/array_api/tests/test_asarray.py
@@ -0,0 +1,64 @@
import itertools

import numpy as np


def test_dtype_identity():
"""Confirm the intended behavior for ``asarray`` results.

The result of ``asarray()`` should have the dtype provided through the
keyword argument, when used. This forces unique array handles to be
produced for unique np.dtype objects, but (for equivalent dtypes), the
underlying data (the base object) is shared with the original array object.

Ref https://github.com/numpy/numpy/issues/1468
"""
int_array = np.array([1, 2, 3], dtype='i')
assert np.asarray(int_array) is int_array

# The character code resolves to the singleton dtype object provided
# by the numpy package.
assert np.asarray(int_array, dtype='i') is int_array

# Derive a dtype from n.dtype('i'), but add a metadata object to force
# the dtype to be distinct.
unequal_type = np.dtype('i', metadata={'spam': True})
annotated_int_array = np.asarray(int_array, dtype=unequal_type)
assert annotated_int_array is not int_array
assert annotated_int_array.base is int_array
# Create an equivalent descriptor with a new and distinct dtype instance.
equivalent_requirement = np.dtype('i', metadata={'spam': True})
annotated_int_array_alt = np.asarray(annotated_int_array,
dtype=equivalent_requirement)
assert unequal_type == equivalent_requirement
assert unequal_type is not equivalent_requirement
assert annotated_int_array_alt is not annotated_int_array
assert annotated_int_array_alt.dtype is equivalent_requirement

# Check the same logic for a pair of C types whose equivalence may vary
# between computing environments.
# Find an equivalent pair.
integer_type_codes = ('i', 'l', 'q')
integer_dtypes = [np.dtype(code) for code in integer_type_codes]
typeA = None
typeB = None
for typeA, typeB in itertools.permutations(integer_dtypes, r=2):
if typeA == typeB:
assert typeA is not typeB
break
assert isinstance(typeA, np.dtype) and isinstance(typeB, np.dtype)

# These ``asarray()`` calls may produce a new view or a copy,
# but never the same object.
long_int_array = np.asarray(int_array, dtype='l')
long_long_int_array = np.asarray(int_array, dtype='q')
assert long_int_array is not int_array
assert long_long_int_array is not int_array
assert np.asarray(long_int_array, dtype='q') is not long_int_array
array_a = np.asarray(int_array, dtype=typeA)
assert typeA == typeB
assert typeA is not typeB
assert array_a.dtype is typeA
assert array_a is not np.asarray(array_a, dtype=typeB)
assert np.asarray(array_a, dtype=typeB).dtype is typeB
assert array_a is np.asarray(array_a, dtype=typeB).base
23 changes: 21 additions & 2 deletions numpy/core/src/multiarray/multiarraymodule.c
Expand Up @@ -1627,12 +1627,31 @@ _array_fromobject_generic(
goto finish;
}
}
/* One more chance */
/* One more chance for faster exit if user specified the dtype. */
eirrgang marked this conversation as resolved.
Show resolved Hide resolved
oldtype = PyArray_DESCR(oparr);
if (PyArray_EquivTypes(oldtype, type)) {
if (copy != NPY_COPY_ALWAYS && STRIDING_OK(oparr, order)) {
Py_INCREF(op);
ret = oparr;
if (oldtype == type) {
eirrgang marked this conversation as resolved.
Show resolved Hide resolved
ret = oparr;
}
else {
/* Create a new PyArrayObject from the caller's
* PyArray_Descr. Use the reference `op` as the base
* object. */
Py_INCREF(type);
ret = (PyArrayObject *)PyArray_NewFromDescrAndBase(
Py_TYPE(op),
type,
PyArray_NDIM(oparr),
PyArray_DIMS(oparr),
PyArray_STRIDES(oparr),
PyArray_DATA(oparr),
PyArray_FLAGS(oparr),
eirrgang marked this conversation as resolved.
Show resolved Hide resolved
op,
op
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, either I just didn't think of it, or expected that this steals a reference to op for the base. We are missing a DECREF, I will make a PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think I originally tried PyArray_NewFromDescr with op for *data and nullptr for *obj, and PyArray_SetBaseObject or something that steals the reference for the base. Anyway, I guess we were sloppy when switching to PyArray_NewFromDescrAndBase. Thanks for the quick catch!

}
goto finish;
}
else {
Expand Down
2 changes: 1 addition & 1 deletion numpy/ma/tests/test_core.py
Expand Up @@ -5387,7 +5387,7 @@ def test_ufunc_with_out_varied():


def test_astype_mask_ordering():
descr = [('v', int, 3), ('x', [('y', float)])]
descr = np.dtype([('v', int, 3), ('x', [('y', float)])])
eirrgang marked this conversation as resolved.
Show resolved Hide resolved
x = array([
[([1, 2, 3], (1.0,)), ([1, 2, 3], (2.0,))],
[([1, 2, 3], (3.0,)), ([1, 2, 3], (4.0,))]], dtype=descr)
Expand Down