deprecate scalar conversions for rank>0 arrays #10404

Closed
nschloe opened this issue Jan 14, 2018 · 36 comments · Fixed by #10615
Labels: 07 - Deprecation · 62 - Python API

Comments


nschloe commented Jan 14, 2018

NumPy allows conversion of arrays to Python scalars as long as they have size 1, i.e.,

```python
float(numpy.array(1.0))          # 1.0
float(numpy.array([1.0]))        # 1.0
float(numpy.array([[[[1.0]]]]))  # 1.0

# TypeError: only size-1 arrays can be converted to Python scalars
float(numpy.array([1.0, 1.0]))
```

I'm fine with rank-0 conversions, but I found that making the conversion conditional on array size can easily lead to inconsistencies downstream. See, for example, sympy/sympy#13924, where multiplication suddenly behaves differently depending on the length of the input array.

Moreover, this feature is actually redundant for rank>0 arrays since the value can simply be retrieved from the array via x[0].
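
(For illustration: the value of a size-1 array can always be retrieved explicitly; `x.item()` is the most general spelling, since `x[0]` still returns an array for rank>1 inputs.)

```python
import numpy

x = numpy.array([1.0])
x[0]      # 1.0 as numpy.float64 -- explicit indexing
x.item()  # 1.0 as a Python float -- works for size-1 arrays of any rank
numpy.array([[[[1.0]]]]).item()  # 1.0
```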

When grepping for "converted to Python scalar" in the numpy sources, one finds that numpy/lib/user_array.py has

"only rank-0 arrays can be converted to Python scalars."

which seems to be the more consistent solution.

I would hence like to suggest deprecating scalar conversion for rank>0, size-1 arrays.


seberg commented Jan 14, 2018

Personally, I agree with you (there is a reason we got rid of the operator.index equivalent, which was causing obvious problems), and I would encourage anyone to try it.
But… I expect there are some annoying cases to solve, and it might have a pretty large downstream impact, which would make it a very slow change indeed (I am not sure; I think I might have tried it at some point way back, heh).


eric-wieser commented Jan 15, 2018

> or rank-1 and size-1

Not even - size == 1 is all that is checked:

```python
>>> float(np.array([[[1]]]))
1.0
```

> Personally agree with you

Me too. The deprecation warning could stick around for a very long time, and making noise downstream seems pretty reasonable if that noise identifies their bugs.


njsmith commented Jan 15, 2018

I agree with @seberg -- this change would indeed make the API cleaner, but I'm dubious about whether we can actually pull it off, and whether the pain would be worth the benefit. The next step would be to try making the change and see how much stuff breaks (e.g. by running the test suites of big downstream projects like scipy, scikit-learn, astropy, ...). That'd at least give us some data to discuss.

@nschloe nschloe changed the title deprecate scalar conversions for rank-1 arrays deprecate scalar conversions for rank>0 arrays Jan 15, 2018

nschloe commented Jan 15, 2018

Thanks for the comments. I've adapted the original post to reflect the fact that arrays of larger rank can be converted to floats as well.

I understand that it can be painful to deprecate things, particularly if it's not clear how much a piece of functionality is actually used downstream. I would suggest that I make the change locally, compile scipy, sympy, pandas, scikit-learn, and astropy (any other suggestions?) against it, and report back with the experience.


nschloe commented Jan 16, 2018

Alright, I've now dug into scipy and this is what I found:

  • The use of explicit int/float conversions on size-1 arrays is not that common. In replacing them with the proper expressions, I even discovered a small bug in scipy.

  • When disallowing float conversions on size-1 arrays, the following code becomes illegal:

    ```python
    a = numpy.random.rand(10, 1)
    a[0] = numpy.array([1.0])
    # ValueError: setting an array element with a sequence.
    ```

    While arguably unclean (you should just write a[0] = 1.0), this usage is more common. It probably constitutes a roadblock for the suggested change.

I'll leave it up to you to comment and/or close this issue.


shoyer commented Jan 22, 2018

@nschloe I don't follow why that second case becomes illegal. It seems like it should be well defined, since the array a[0, :] has shape (1,). Maybe that was a bug in your implementation, possibly due to preexisting assumptions in NumPy?


nschloe commented Jan 22, 2018

> I don't follow why that second case becomes illegal.

@shoyer I'll try to explain why I think it failed. The array element a[0] in

```python
a = numpy.random.rand(10, 1)
a[0] = numpy.array([1.0])
```

is a scalar (i.e., it has the empty shape ()), and you're trying to set it with an object of shape (1,). For this to work, the rhs needs to be converted to a scalar automatically. If scalar conversion is disabled, you get an error.


eric-wieser commented Jan 22, 2018

@nschloe: That might be considered a sort of reverse broadcasting, which currently succeeds in some cases but not others:

```python
a = np.zeros((1, 3))
b = np.empty((3,))
b[...] = a   # leading 1-dimensions are removed
b[...] += a  # but this fails!
```

Thinking again, allowing a size == 1 array to decay to a scalar seems pretty consistent with broadcasting rules in general, to me - if () can magically promote to (1, 1, 1, 1), then is it any worse to allow (1, 1, 1, 1) to decay to ()?
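
For reference, the forward direction and the explicit spelling of the reverse "decay" look like this (plain NumPy, shown here for illustration):

```python
import numpy as np

s = np.array(2.0)                        # shape ()
np.broadcast_to(s, (1, 1, 1, 1)).shape   # (1, 1, 1, 1) -- forward promotion
a = np.ones((1, 1, 1, 1))
a.reshape(()).shape                      # () -- the reverse, made explicit
float(a)                                 # 1.0 -- the implicit decay under discussion
```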

@eric-wieser

@nschloe: in your branch, does `a[0, ...] = np.array([0])` succeed?


seberg commented Jan 22, 2018 via email


eric-wieser commented Jan 22, 2018

> There may be a bit annoyance with object arrays

I think that collectively, we've hit the nail on the head there - reverse broadcasting has no place in scalar assignments, because it just doesn't generalize to object arrays. I think that

```python
a[0] = some_1d_len1_array
```

ought to be forbidden, since its semantics are dtype-dependent - code written this way for numeric arrays will already fail in unexpected ways for object arrays. If the reverse broadcasting is desired by the user, then they should be forced to write

```python
a[0, ...] = some_1d_len1_array
```

which does the same thing for all dtypes. Making them write it this way eliminates a class of bugs, which is exactly what we should be aiming for with DeprecationWarnings.


Whether reverse broadcasting is actually desirable is also perhaps questionable, but you've made it clear to me that it's an orthogonal issue.
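
To make the dtype dependence concrete, a small illustration (some_1d_len1_array stands in for any length-1 1-d array; the implicit decay in the numeric case is what this thread proposes to deprecate):

```python
import numpy as np

some_1d_len1_array = np.array([1.0])

num = np.zeros(3)                  # float dtype
num[0] = some_1d_len1_array        # decays to the scalar 1.0 (the implicit conversion)

obj = np.empty(3, dtype=object)
obj[0] = some_1d_len1_array        # stores the array itself as the element

type(num[0]), type(obj[0])         # (numpy.float64, numpy.ndarray)
```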


seberg commented Jan 22, 2018

Yeah, it's orthogonal. If you attempt this, you basically have to put in place the machinery that allows deprecating the other too. But that is probably all there is to it.

EDIT: OK, it is not quite orthogonal, since for a while the behaviour between some array-likes would differ.


njsmith commented Jan 22, 2018

I think @shoyer's point was that if a has shape (10, 1), then a[0] should expand to a[0, ...] and have shape (1,), so it's not a scalar.

@eric-wieser

@njsmith: I mistakenly read that as shape (10,)


nschloe commented Feb 16, 2018

I've finally gotten around to setting up a branch for this, master...nschloe:bug-10404. Here are the results:

```python
import numpy

a = numpy.random.rand(10, 1)
a[0] = numpy.array([1.0])       # okay
a[0] = numpy.array(1.0)         # okay
a[0] = 1.0                      # okay

b = numpy.random.rand(10)
b[0] = numpy.array([1.0])       # ValueError: setting an array element with a sequence.
b[0, ...] = numpy.array([1.0])  # okay
b[0] = numpy.array(1.0)         # okay
b[0] = 1.0                      # okay
```

This all makes sense, and the error message is accurate too. Instead of recommending the insertion of `...` (as @eric-wieser suggests), I find the simple `b[0] = 1.0` more appropriate. Perhaps there are cases where `...` is actually needed, though.

Yeah, so this is a breaking change that would need a long deprecation period first. Should I turn the ValueError into a deprecation warning and submit a PR where we can discuss further, or are there other suggestions?


seberg commented Feb 16, 2018

Nothing wrong with PRs in any case, we can always close them ;). It might be a very long and possibly painful deprecation, unfortunately. About the assignment thing, currently that type of broadcasting is also supported for non-scalar LHS, which may or may not be something to look into/synchronize.

The problem with these things is that you often don't know how bad this kind of change is for downstream until they start seriously using it.

@eric-wieser

If you make a PR we can at least identify where we rely on this internally


nschloe commented Feb 16, 2018

➡️ PR #10615.

@EricCousineau-TRI

As another pain point, this behavior is a tad confusing when combined with pybind11, where you can define overloads.
When pybind11 evaluates its overloads, it first does a "no convert" pass, where it attempts to convert from Python to C++ with no (or very little) implicit conversion, passing over each overload. If this fails, it passes over the same overload set again, this time permitting implicit conversions. This overload behavior makes things great when you want to distinguish between int and float overloads, but it makes things painful when NumPy can implicitly convert its arrays to scalars while lists themselves do not get collapsed to scalars.

I ran into this when trying to write overloads for a function, say func, for argument types np.array[double] (returns "Vector") and int (returns "Int"). (I was actually trying to use this where one call constructed from an array, while another was sugar for creating a fresh array of a given size.) Some example runs:

```python
func(0)               # -> "Int"     Good.
func([0.])            # -> "Vector"  Great.
func(np.array([0.]))  # -> "Int"     What???
```

I originally ran into this when writing a custom dtype, but found this was the behavior for floats too.

I will file a pybind11 issue as well, with a workaround that is specific to pybind11, but it would have been nice if NumPy were a tad stricter about how it treats scalars vs. single-item arrays.
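
For anyone who wants to reproduce the effect without pybind11, here is a toy Python dispatcher mimicking the two-pass resolution described above (illustrative only; func and its passes are stand-ins, not pybind11's actual machinery):

```python
import numpy as np

def func(x):
    # Pass 1: exact matches only.
    if isinstance(x, int) and not isinstance(x, bool):
        return "Int"
    # Pass 2: implicit conversions, tried in declaration order.
    # NumPy lets any size-1 array pass the int() check, so np.array([0.])
    # is captured by the Int overload before the Vector one is tried.
    try:
        int(x)
        return "Int"
    except (TypeError, ValueError):
        return "Vector"

print(func(0))                 # Int
print(func([0.0]))             # Vector -- int([0.0]) raises TypeError
print(func(np.array([0.0])))   # Int    -- the surprising case
```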


pv commented Nov 30, 2019

Scalar conversion of size-1 arrays is, I think, quite a central part of NumPy semantics, and I believe no small amount of code assumes it (as scipy/scipy#11147 also seems to suggest). IMO, this would be more like NumPy 2.0 stuff.


eric-wieser commented Nov 30, 2019

> IMO, this would be more like NumPy 2.0 stuff.

Perhaps there's a compromise where we can still deprecate it in 1.x, but with a far longer deprecation period than normal - perhaps PendingDeprecationWarning would be appropriate here.
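
A minimal Python-level sketch of what that could look like (strict_float is a hypothetical name for illustration; the real change would live in NumPy's C-level conversion path):

```python
import warnings

import numpy as np

def strict_float(a):
    # Sketch of the proposed semantics: 0-d arrays convert silently,
    # rank>0 size-1 arrays warn, everything else fails as before.
    a = np.asarray(a)
    if a.ndim > 0:
        if a.size != 1:
            raise TypeError("only size-1 arrays can be converted to Python scalars")
        warnings.warn(
            "conversion of rank>0 size-1 arrays to scalars is deprecated; "
            "use a.item() or index the array instead",
            PendingDeprecationWarning,
            stacklevel=2,
        )
    return float(a.item())

strict_float(np.array(1.0))    # 1.0 -- 0-d conversion stays silent
strict_float(np.array([1.0]))  # 1.0 -- warns (hidden by default warning filters)
```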


seberg commented Nov 30, 2019

@eric-wieser how are we going to deprecate/futurewarn, though? Return new scalars that warn when in-place operations happen, or when someone tries to hash them? I suppose that is actually possible, but I guess it might be very noisy. OTOH, considering how much I dislike this behaviour, maybe there is a point in trying it out, even if we hide it behind some env variable for now.


eric-wieser commented Nov 30, 2019

Isn't it as simple as replacing the ValueError in the linked PR with a warning?


pv commented Nov 30, 2019

I don't think the length of the deprecation period matters here --- either the API is kept stable or it is not. The arguments for what is gained by this don't seem so clearly laid out above (or at least I didn't get them by reading it twice :). Unlike with strict casting and float indices, it's not as clear that the presence of this coercion in code would usually be a bug (only the sympy and pybind11 corner cases are mentioned above; I'm not sure whether the object-array assignment is the same).

@eric-wieser

The way I see it is:

  • The current behavior hides potential bugs (which I agree we need to expand upon before actually deprecating this, preferably in the release notes)
  • It is always possible to write code that does not rely on this behavior, in all remotely recent versions of numpy
  • Even if libraries like scipy do not have any bugs caused by this coercion, it is likely that they expose a similar class of pitfall to their callers
  • Users can always silence warnings

> either the API is kept stable or it is not

Perhaps then this should be a UserWarning instead - indicating that the API is unlikely to change, but that the user is nonetheless advised not to do what they're doing.


nschloe commented Dec 1, 2019

> Perhaps then this should be a UserWarning instead -

👍

I think there's no need to hard-break people's code, so you're right: a deprecation is perhaps a bit harsh. However, when using float conversion of rank>0 arrays, you're almost always making a mistake in how you treat your arrays. A typical case would be returning [val] from an objective function for optimization when it really should be a scalar. Right now, NumPy irons out that mistake for you, but in my experience your code quality improves when you deal with it (e.g., scipy/scipy#11147).
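
A sketch of that objective-function case (brentq here is just one routine that happens to consume scalar-valued objectives; whether a given SciPy routine converts the return value via float() internally is an implementation detail, so treat this as illustrative):

```python
import numpy as np
from scipy.optimize import brentq

def f(x):
    return np.array([x - 1.0])  # mistake: rank-1, size-1 -- should be a scalar

# Today NumPy silently converts the return value to a scalar; with the
# deprecation, a warning points straight at the real mistake in f.
brentq(f, 0.0, 2.0)  # 1.0
```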

More typical anti-patterns caught by this change:

```python
import numpy
a = numpy.array([1.0e-1], dtype=numpy.float16)
# ...
val = float(a)  # you want a[0]
```

```python
import numpy
p = numpy.array([3.14])
a = numpy.array([1.0, 2 * p, 1 / p])  # whoops: a has dtype object
#
b = numpy.zeros((10, 3))
b[0] = a  # !
```


pv commented Dec 1, 2019

float(a) and a[0] mean different things --- the former is an explicit scalar conversion and type cast, whereas the latter extracts a (potentially >0-dim) sub-array from an array, and the two are not interchangeable. Safe scalar coercion can be done with a.reshape([]) ... but that's quite ugly.
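
Side by side, the explicit spellings look like this (standard NumPy; reshape(()) is equivalent to the reshape([]) above):

```python
import numpy as np

a = np.array([1.0])

float(a.item())       # explicit: pull out the single element, then cast
float(a.reshape(()))  # go through a 0-d view; 0-d conversion is uncontroversial
a[0]                  # indexing: yields np.float64(1.0), a sub-element, not a cast
```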


nschloe commented Dec 1, 2019

> float(a) and a[0] mean different things

Indeed. And until now, in many cases where the user should have used a[0], you'll find float(a)/int(a) instead (even in scipy). Now you get a warning for that.

@eric-wieser

@nschloe, I think your scipy PR might be fixing the symptom and not the cause - for anyone else reading along, I've been trying a different approach in scipy/scipy#11158.


pv commented Dec 1, 2019

In scipy/scipy#11147, the majority of the changes do not appear to be actual bugs --- most of them are extracting a Python scalar of a given type from an array known to have size 1 (with an exception raised if not), i.e., float(x) as shorthand for float(x.item()).


seberg commented Dec 2, 2019

In principle, arr.item() means something very similar. Although I think it always defaults to .item(0) and will not notify you that the array has additional dimensions. The main issue with float() is that it uses reversed broadcasting rules. Also, we already deprecated this for __nonzero__ because of the problems of inconsistent indexing.
I agree that there is a lot of correct usage out there, though. Similar to squeeze: although it has problems, it is probably often used correctly.
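
For concreteness, current .item() behavior (as checked against recent NumPy; note that the no-argument form does insist on size 1):

```python
import numpy as np

np.array([[1.5]]).item()      # 1.5 -- size 1; extra dimensions are ignored
np.array([1.5, 2.5]).item(0)  # 1.5 -- explicit flat index, works for any size
# np.array([1.5, 2.5]).item() raises:
# ValueError: can only convert an array of size 1 to a Python scalar
```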

seberg pushed a commit that referenced this issue Apr 20, 2023
This PR reflects some of the progress achieved in issue #10404 and is used to assess the impact of the changes.

With the changes in this PR, `float(numpy.array([1.0]))` now gives a warning; likewise some other things:
```python
import numpy

a = numpy.random.rand(10, 1)
a[0] = numpy.array([1.0])       # okay
a[0] = numpy.array(1.0)         # okay
a[0] = 1.0                      # okay

b = numpy.random.rand(10)
b[0] = numpy.array([1.0])       # ValueError: setting an array element with a sequence.
b[0, ...] = numpy.array([1.0])  # okay
b[0] = numpy.array(1.0)         # okay
b[0] = 1.0                      # okay
```
This aligns the behavior of numpy arrays with that of lists:
```python
float([3.14]) 
```
```
TypeError: float() argument must be a string or a number, not 'list'
```
```python
import numpy as np

a = np.random.rand(5)
a[0] = [3.14]
```
```
ValueError: setting an array element with a sequence.
```

Fixes #10404.

seberg commented Apr 20, 2023

I have decided to give the deprecation a shot. I think the situation is a bit better now (SciPy, pandas, and matplotlib should be pretty much clean). Also, the changes in SciPy were by far the largest, which gives me hope that downstream libraries are not as badly affected.

That said, I do suspect we should take this slow, and I am happy to revert if we notice that it is painful for downstream/end-users. (Or anyone here has concerns.)

@rgommers

Thanks! I like this approach; hopefully it will not be too bumpy of a ride.

@josef-pkt

This looks pretty horrible for statsmodels; we need to add a million squeezes to 1-d, one-element arrays.
statsmodels/statsmodels#8821 (comment)

Does the same occur when assigning, e.g., a 2-dim row to a 1-dim row slice of a 2-dim array?
(I don't know whether statsmodels has cases with an extra dim in higher-dim assignments.)

(Aside: the current test failures with this are only in 2 unit tests that check that no warning is raised)


kalvdans commented May 9, 2023

I tried the patch on our numpy-heavy codebase at my company by putting
numpy @ git+https://github.com/numpy/numpy@main in requirements.txt. Of the 11065 unit-tested code lines, 12 needed to change. Most were places with an index-remapping array, like linestrengths[elements == Z], where Z is the atomic number we are interested in and elements is a list of atomic numbers of the same length as the linestrengths array. This expression returns a one-element vector that was later used as a scalar.


nschloe commented May 9, 2023

@kalvdans Thanks for sharing! Having to explicitly convert the expression to a scalar makes sense to me. After all, the result of the operation could in principle have any size:

```python
idx = elements == Z
assert np.sum(idx) == 1
idx = np.flatnonzero(idx)[0]  # position of the single match (not idx[0], which is just a bool)

res = linestrengths[idx]
```

Explicit is better than implicit

jarrodmillman added a commit to jarrodmillman/scikit-image that referenced this issue Aug 14, 2023
jarrodmillman added a commit to scikit-image/scikit-image that referenced this issue Aug 17, 2023
* Fix for NumPy 1.25

* Revert #6973

* Revert "Don't test numpy prerelease on azure (#6996)"

This reverts commit e57c23d.

* Skip warning check on older, noisier versions of NumPy

* Remove upperpin

* Use stable sort in _raveled_offsets_and_distances

NumPy enabled optimizations for quicksort on AVX-512 enabled
processors [1]. We sort based on neighbor distances which can be equal.
The default quicksort is not stable and could have different behavior on
AVX-512. Test this assumption.

[1] https://numpy.org/devdocs/release/1.25.0-notes.html#faster-np-argsort-on-avx-512-enabled-processors

* Fix forgotten doctest in _offsets_to_raveled_neighbors

after updating to stable sort internally.

* Handle rank>0 scalar arrays

numpy/numpy#10404

* Use stable sort

* Use stable sort in max_tree (debug)

See if it does anything.

* Use stable sort in _get_high_intensity_peaks (debug)

See if it does anything.

* Upper pin numpy<1.26 for tests

---------

Co-authored-by: Stefan van der Walt <stefanv@berkeley.edu>
Co-authored-by: Lars Grüter <lagru@mailbox.org>