Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Distinguish exact vs. equivalent dtype for C type aliases. #21995

Merged
merged 11 commits into from Sep 7, 2022
46 changes: 46 additions & 0 deletions doc/release/upcoming_changes/21995.compatibility.rst
@@ -0,0 +1,46 @@
Returned arrays respect uniqueness of dtype kwarg objects
---------------------------------------------------------
When ``dtype`` keyword argument is used with :py:func:`np.array()`
or :py:func:`asarray()`, the dtype of the returned array has
the same dtype *instance* as provided by the caller.

If the provided dtype is compatible, but not identically the same
:py:class:`dtype` object, a new array handle is always created with
a reference to the user-provided dtype instance.
If the data type is compatible, and copying is not required, the new
`ndarray` uses the original array as its
`base <https://numpy.org/doc/stable/reference/generated/numpy.ndarray.base.html>`__.

Before this change, for two equivalent but non-identical dtypes,

assert isinstance(typeA, np.dtype) and isinstance(typeB, np.dtype)
assert typeA == typeB
assert typeA is not typeB
if my_array.dtype is typeA:
assert my_array is np.asarray(my_array, dtype=typeB)
assert np.asarray(my_array, dtype=typeB).dtype is not typeB

This change allows programs to be able to reliably get the exact dtype
representation they request, regardless of possibly aliased types on the
calling platform.

However, identity semantics for array results and their
dtype members may require minor updates to calling code.

After this change, on a system where C ``int`` and C ``long`` are the same
precision, ``np.dtype('i') == np.dtype('l')``,
but ``np.dtype('i') is not np.dtype('l')``.

assert int_array.dtype is np.dtype('i')
long_int_array = np.asarray(int_array, dtype='l')
assert long_int_array is not int_array
if np.dtype('i') == np.dtype('l'):
assert int_array is long_int_array.base

New array views are created with each call to `asarray` with non-identical
dtype kwarg, but the underlying data is the same.

assert int_array.dtype is np.dtype('i')
long_int_array = np.asarray(int_array, dtype='l')
assert long_int_array is not np.asarray(int_array, dtype='l')
assert long_int_array.base is np.asarray(int_array, dtype='l').base
rgommers marked this conversation as resolved.
Show resolved Hide resolved
64 changes: 64 additions & 0 deletions numpy/array_api/tests/test_asarray.py
@@ -0,0 +1,64 @@
import itertools

import numpy as np


def test_dtype_identity():
"""Confirm the intended behavior for ``asarray`` results.

The result of ``asarray()`` should have the dtype provided through the
keyword argument, when used. This forces unique array handles to be
produced for unique np.dtype objects, but (for equivalent dtypes), the
underlying data (the base object) is shared with the original array object.

Ref https://github.com/numpy/numpy/issues/1468
"""
int_array = np.array([1, 2, 3], dtype='i')
assert np.asarray(int_array) is int_array

# The character code resolves to the singleton dtype object provided
# by the numpy package.
assert np.asarray(int_array, dtype='i') is int_array

# Derive a dtype from n.dtype('i'), but add a metadata object to force
# the dtype to be distinct.
unequal_type = np.dtype('i', metadata={'spam': True})
annotated_int_array = np.asarray(int_array, dtype=unequal_type)
assert annotated_int_array is not int_array
assert annotated_int_array.base is int_array
# Create an equivalent descriptor with a new and distinct dtype instance.
equivalent_requirement = np.dtype('i', metadata={'spam': True})
annotated_int_array_alt = np.asarray(annotated_int_array,
dtype=equivalent_requirement)
assert unequal_type == equivalent_requirement
assert unequal_type is not equivalent_requirement
assert annotated_int_array_alt is not annotated_int_array
assert annotated_int_array_alt.dtype is equivalent_requirement

# Check the same logic for a pair of C types whose equivalence may vary
# between computing environments.
# Find an equivalent pair.
integer_type_codes = ('i', 'l', 'q')
integer_dtypes = [np.dtype(code) for code in integer_type_codes]
typeA = None
typeB = None
for typeA, typeB in itertools.permutations(integer_dtypes, r=2):
if typeA == typeB:
assert typeA is not typeB
break
assert isinstance(typeA, np.dtype) and isinstance(typeB, np.dtype)

# These ``asarray()`` calls may produce a new view or a copy,
# but never the same object.
long_int_array = np.asarray(int_array, dtype='l')
long_long_int_array = np.asarray(int_array, dtype='q')
assert long_int_array is not int_array
assert long_long_int_array is not int_array
assert np.asarray(long_int_array, dtype='q') is not long_int_array
array_a = np.asarray(int_array, dtype=typeA)
assert typeA == typeB
assert typeA is not typeB
assert array_a.dtype is typeA
assert array_a is not np.asarray(array_a, dtype=typeB)
assert np.asarray(array_a, dtype=typeB).dtype is typeB
assert array_a is np.asarray(array_a, dtype=typeB).base
23 changes: 21 additions & 2 deletions numpy/core/src/multiarray/multiarraymodule.c
Expand Up @@ -1627,12 +1627,31 @@ _array_fromobject_generic(
goto finish;
}
}
/* One more chance */
/* One more chance for faster exit if user specified the dtype. */
eirrgang marked this conversation as resolved.
Show resolved Hide resolved
oldtype = PyArray_DESCR(oparr);
if (PyArray_EquivTypes(oldtype, type)) {
if (copy != NPY_COPY_ALWAYS && STRIDING_OK(oparr, order)) {
Py_INCREF(op);
ret = oparr;
if (oldtype == type) {
eirrgang marked this conversation as resolved.
Show resolved Hide resolved
ret = oparr;
}
else {
/* Create a new PyArrayObject from the caller's
* PyArray_Descr. Use the reference `op` as the base
* object. */
Py_INCREF(type);
ret = (PyArrayObject *)PyArray_NewFromDescrAndBase(
Py_TYPE(op),
type,
PyArray_NDIM(oparr),
PyArray_DIMS(oparr),
PyArray_STRIDES(oparr),
PyArray_DATA(oparr),
PyArray_FLAGS(oparr),
eirrgang marked this conversation as resolved.
Show resolved Hide resolved
op,
op
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, either I just didn't think of it, or expected that this steals a reference to op for the base. We are missing a DECREF, I will make a PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think I originally tried PyArray_NewFromDescr with op for *data and nullptr for *obj, and PyArray_SetBaseObject or something that steals the reference for the base. Anyway, I guess we were sloppy when switching to PyArray_NewFromDescrAndBase. Thanks for the quick catch!

}
goto finish;
}
else {
Expand Down
2 changes: 1 addition & 1 deletion numpy/ma/tests/test_core.py
Expand Up @@ -5387,7 +5387,7 @@ def test_ufunc_with_out_varied():


def test_astype_mask_ordering():
descr = [('v', int, 3), ('x', [('y', float)])]
descr = np.dtype([('v', int, 3), ('x', [('y', float)])])
eirrgang marked this conversation as resolved.
Show resolved Hide resolved
x = array([
[([1, 2, 3], (1.0,)), ([1, 2, 3], (2.0,))],
[([1, 2, 3], (3.0,)), ([1, 2, 3], (4.0,))]], dtype=descr)
Expand Down