BUG: Zero-dimensional numpy arrays within records decay to scalars #9442

Open
NeilGirdhar opened this issue Jul 20, 2017 · 17 comments
@NeilGirdhar
Contributor

NeilGirdhar commented Jul 20, 2017

First, a shape (4,) numpy array within a record can be assigned:

In [33]: a = np.zeros((10,), dtype=[('k', '<u8'), ('t', '<f4'), ('d', np.bool, (4,))])

In [34]: b = np.ones((), dtype=np.bool)

In [35]: a[0][2][3] = b

But, a shape () cannot:

In [36]: a = np.zeros((10,), dtype=[('k', '<u8'), ('t', '<f4'), ('d', np.bool, ())])

In [37]: a[0][2][()] = b
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-37-075e4ba95b60> in <module>()
----> 1 a[0][2][()] = b

TypeError: 'numpy.bool_' object does not support item assignment

But if it's not in a record, it works just fine:

In [38]: a = np.zeros((), np.bool)

In [39]: a[()] = b
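
For reference, the report above can be condensed into one script. This is a sketch rather than the exact session: it uses np.bool_ in place of np.bool (which is not available in every NumPy release), and the names a4, a0, and failed are introduced here for illustration only.

```python
import numpy as np

b = np.ones((), dtype=np.bool_)

# Field 'd' has shape (4,): assigning to one element works.
a4 = np.zeros((10,), dtype=[('k', '<u8'), ('t', '<f4'), ('d', np.bool_, (4,))])
a4[0][2][3] = b

# Field 'd' is declared with shape (): a0[0][2] is a scalar, so assignment fails.
a0 = np.zeros((10,), dtype=[('k', '<u8'), ('t', '<f4'), ('d', np.bool_, ())])
try:
    a0[0][2][()] = b
    failed = False
except TypeError:
    failed = True

print(bool(a4['d'][0, 3]), failed)
```
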
@NeilGirdhar changed the title from "Zero-dimensional numpy arrays within records support don't support item assignment" to "Zero-dimensional numpy arrays within records don't support item assignment" on Jul 20, 2017
@eric-wieser
Member

eric-wieser commented Jul 20, 2017

There are two parts to this. The error you're getting is the same as this one:

b = np.zeros(3, np.bool)
b[0][()] = np.ones((), dtype=np.bool)

This fails because an np.bool_ scalar and a 0d array of np.bool_ are different things. This part is not a bug; it is very much by design.
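
A small sketch of that distinction, assuming only plain NumPy: the scalar comes from integer indexing, the 0d array is created directly, and only the array accepts item assignment. The variable names are illustrative.

```python
import numpy as np

scalar = np.zeros(3, np.bool_)[0]   # integer indexing yields an np.bool_ scalar
zero_d = np.zeros((), np.bool_)     # a genuine 0d array

zero_d[()] = True                   # arrays are mutable, even with ndim == 0

try:
    scalar[()] = True               # scalars do not support item assignment
    scalar_assignable = True
except TypeError:
    scalar_assignable = False

print(type(scalar).__name__, type(zero_d).__name__, scalar_assignable)
```
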


The other problem is that your dtype is being ignored:

>>> a.dtype
dtype([('k', '<u8'), ('t', '<f4'), ('d', '?')])

In general, it seems to be impossible to specify a 0d subdtype:

>>> np.dtype((np.bool_, (2,))).subdtype
(dtype('bool'), (2,))
>>> np.dtype((np.bool_, ())).subdtype
None

This is arguably a bug.
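
The same check can be run on the field dtypes themselves. This sketch assumes the behaviour described in this thread is still present in the installed NumPy; the names dt0 and dt4 are made up for the example.

```python
import numpy as np

dt0 = np.dtype([('d', np.bool_, ())])     # shape () requested
dt4 = np.dtype([('d', np.bool_, (4,))])   # shape (4,) requested

# The () shape is dropped: the field is a plain bool with no subdtype.
print(dt0['d'], dt0['d'].subdtype)

# The (4,) shape is kept as a subarray dtype.
print(dt4['d'].shape, dt4['d'].subdtype)
```
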

@eric-wieser
Member

Here's the code that deliberately ignores a shape of ()

@eric-wieser
Member

Even removing that doesn't help, though, as a huge number of numpy functions end with something along the lines of:

if arr.ndim == 0:
    arr = arr[()]
return arr
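
To see what that pattern does to a 0d result, here is a minimal sketch. finish is a hypothetical helper written for this example, not a NumPy function.

```python
import numpy as np

def finish(arr):
    # Hypothetical helper mirroring the pattern quoted above.
    if arr.ndim == 0:
        arr = arr[()]   # indexing a 0d array with () yields a scalar
    return arr

out0 = finish(np.zeros(()))   # 0d input decays to a scalar
out1 = finish(np.zeros(3))    # 1d input is returned unchanged

print(type(out0).__name__, type(out1).__name__)
```
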

@eric-wieser
Member

eric-wieser commented Jul 20, 2017

A workaround would be to index as a['d'][0,...][the_index] = b, which would then work for any shape of field

@NeilGirdhar
Contributor Author

Thanks for linking the code. I was going to ask you if I could help, but I'm very busy right now. I may come back to this. I don't understand your workaround in the context of my second definition of a.

@eric-wieser
Member

eric-wieser commented Jul 20, 2017

Here it is for each of your two definitions of a:

a['d'][0,...][3] = b
a['d'][0,...][()] = b

The workaround first avoids indexing an np.void, which cannot preserve dimensionality. At any rate, indexing by field name is much more readable.

An ellipsis (...) in an index means "always return a view, never a scalar".
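
The effect of the ellipsis can be seen on a plain array, independent of records. A minimal sketch with illustrative names:

```python
import numpy as np

arr = np.zeros(3)
s = arr[0]        # scalar copy of the element
v = arr[0, ...]   # 0d array viewing the same memory

v[()] = 7.0       # writes through to arr

print(type(s).__name__, type(v).__name__, arr)
```
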

@NeilGirdhar
Contributor Author

That works, but it doesn't make any sense to me 😄

@eric-wieser
Member

eric-wieser commented Jul 20, 2017

Step by step:

i = ()  # or 3, in your other example
a[0][2][i]   # original
a[0]['d'][i] # index by field name, not field index
a['d'][0][i] # field name index can go anywhere
a['d'][0,...][i] # adding ... makes this return a 0d array, not an np.bool_

a[0,...]['d'][i] works too
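
The steps above can be checked by looking at what each expression returns for the shape-() definition of a. This is a sketch using np.bool_ in place of np.bool; the stages dict is only there for printing.

```python
import numpy as np

b = np.ones((), dtype=np.bool_)
a = np.zeros((10,), dtype=[('k', '<u8'), ('t', '<f4'), ('d', np.bool_, ())])

stages = {
    "a[0][2]": a[0][2],                  # scalar
    "a[0]['d']": a[0]['d'],              # scalar
    "a['d'][0]": a['d'][0],              # scalar
    "a['d'][0, ...]": a['d'][0, ...],    # 0d array (view)
    "a[0, ...]['d']": a[0, ...]['d'],    # 0d array (view)
}
for expr, value in stages.items():
    print(expr, type(value).__name__, isinstance(value, np.ndarray))

# Both view-returning spellings accept assignment.
a['d'][0, ...][()] = b
a[1, ...]['d'][()] = b
```
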

@NeilGirdhar
Contributor Author

Got it, thanks.

@eric-wieser
Member

To summarize, I think the underlying bug here is a total lack of support for distinguishing 0d fields and scalar fields in a subdtype.

I don't think there's an easy fix, because part of the problem is that subdtypes are not first-class citizens in the dtype world, as they decay very quickly into extra dimensions.

@NeilGirdhar
Contributor Author

I don't know enough about how they're implemented, but I guess you can't just remove that condition in the code that you linked?

@eric-wieser
Member

eric-wieser commented Jul 20, 2017

I've tried that, and it doesn't help. The next problem is that a['d'] is indistinguishable for dtypes (bool, ()) and bool, because of how subdtype expansion works: in general, a['field'].shape == a.shape + a.dtype['field'].shape, but a['field'].dtype.shape == ()
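
That identity can be checked directly. A minimal sketch, with a made-up two-field dtype where 'd' is a (4,) subarray and 's' is a plain scalar field:

```python
import numpy as np

a = np.zeros((10,), dtype=[('d', np.bool_, (4,)), ('s', np.bool_)])

for name in a.dtype.names:
    field = a[name]
    # The subarray shape moves into the array's shape...
    expanded = a.shape + a.dtype[name].shape
    # ...and the resulting array's own dtype has no shape left.
    print(name, field.shape, expanded, field.dtype.shape)
```
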

@hpaulj

hpaulj commented Jul 20, 2017

I think that subdtype expansion is an important consideration. Such an expansion is one of the most common ways of using a structured array. Most of the recfunctions work by copying data from one array to another by field name. It would be difficult to define an expansion (and its indexing) that distinguishes between scalar and 0d fields, and at the same time remains consistent with 1d and higher dimensional fields.

@eric-wieser
Member

eric-wieser commented Jul 20, 2017

The type of (compatibility-breaking) change I'm envisaging is:

>>> a_dt = np.dtype([('v', int, (3,))])
>>> a = np.empty(4, a_dt)

>>> x = a['v']  # new behaviour
>>> x.dtype
dtype(('<i4', 3))
>>> x.shape
(4,)

>>> x_old = x.view(int)  # workaround to regain the old behaviour
>>> x_old.dtype
dtype('<i4')
>>> x_old.shape
(4, 3)

Of course, that's the type of change we can never make, unless we start using context managers to enable new semantics

@NeilGirdhar
Contributor Author

NeilGirdhar commented Jul 20, 2017

You could keep a list of all of the breaking changes you would like to make, and then if that list gets long enough, one day, implement that context manager? (Because I agree, this is not super-motivating.)

@eric-wieser
Member

Actually, it turns out that context managers over global state are not a safe way to change semantics: #9444

@NeilGirdhar
Contributor Author

NeilGirdhar commented Jul 20, 2017

That's fascinating. I don't know the answer. Please consider posting this to python-ideas to start a discussion about how this is supposed to work.

@eric-wieser changed the title from "Zero-dimensional numpy arrays within records don't support item assignment" to "BUG: Zero-dimensional numpy arrays within records decay to scalars" on Sep 25, 2017