Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add __array_function__ implementation to MaskedArray #22913

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

greglucas
Copy link
Contributor

In downstream libraries (Matplotlib), we frequently want to do array manipulation on whatever type is passed in. xy = np.stack([x, y]) where x and y could be ndarray or MaskedArray. In certain places we handle this with stack = np.ma.stack if isinstance(x, MaskedArray) else np.stack, but it isn't a very robust solution. Now that numpy as __array_function__, we can handle the implementation within the MaskedArray class if the specific function is available.

Previously, calling np.stack(ma, a) with a masked and normal array would stack the nd-data from the items, but wrap it in a MaskedArray class without applying the mask. This implementation intercepts the np.func calls and checks whether there is a compatible masked version that we can use instead. Now, the mask gets propagated with the stack as expected.

import numpy as np

np.append(np.ma.masked_all(2), np.arange(2))
# [-- -- 0.0 1.0]
np.stack([np.arange(2), np.ma.masked_all(2)])
# [[0.0 1.0]
#  [-- --]]

closes #22338
closes #18675
possibly others...

ma_func = np.ma.__dict__.get(function.__name__, function)
# Known incompatible functions
# copy: np.copy() has subok semantics, which we can't pass through
# unique: using ma.view(ndarray) internally doesn't work
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried using arr.view(np.ndarray) within unique, but there are some expectations that a "masked" entity be returned. I would actually think that we want ma.unique() to ignore the mask altogether and that isn't one of the unique elements.

For now, this seems easiest to just skip.

incompatible_functions = {"copy", "unique"}
# The masked functions know how to work with ndarray and
# MaskedArrays, but can't guarantee anything about other types
handled_types = {np.ndarray, MaskedArray}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we want to allow subclasses or not, this is the safest for now... If we remove this limitation, then Astropy MaskedColumn which is a subclass of MaskedArray doesn't like the results.

@charris
Copy link
Member

charris commented Jan 20, 2023

Is this independent of #22914.

@greglucas
Copy link
Contributor Author

Yes, it is independent of the __array_ufunc__ PR and only applies to generic functions that are implemented within the masked namespace (np.func() mapping to np.ma.func() when it exists). However, with the recent discussions about possibly reworking MaskedArray for numpy 2.0, we may or may not want something like this before a larger transition...

@greglucas
Copy link
Contributor Author

@mattip, I just went and took a look at the Numpy 2.0 document thinking about proposing some of this MaskedArray work (__array_function__ here and __array_ufunc__ in #22914) to the roadmap, but I see you've already taken the initiative to add a talk on it which is great, thank you! I am unfortunately busy during that entire meeting, so I won't be able to join. (happy to join follow-up meetings when I'm available) But, I do think this Numpy 2.0 is a good opportunity to update MaskedArray, see here for a comment on some additional ideas: #22914 (comment)

@mattip
Copy link
Member

mattip commented Apr 2, 2023

I started a hackmd document for the lightning talk here. Help fleshing it out would be more than welcome (it should be open for editing)

Previously, calling np.stack(ma, a) with a masked and normal array
would stack the nd-data from the items, but wrap it in a MaskedArray
class without implementing the mask. This implementation intercepts
the np.func calls and checks whether there is a compatible masked
version that we can use instead. Now, the mask gets propagated with
the stack as expected.
The manual masked implementation was largely copied over from the
numpy array implementation just to handle masked concatenate. Now
that we use array function implementations, we get that automatically
with the np.diff() call, so there is no need to re-implement things
in the masked namespace anymore.
@greglucas
Copy link
Contributor Author

I decided to delve back into some of the masked array work again thinking it would still be nice to get some cleanups in for numpy 2.0. I think this one should be much less complicated than the ufunc PR, so I figured we should possibly focus on this before wading back over into the ufunc territory.

As I rebased this there were some new failures that came up that illustrated a clear example where this would have helped. #22776 basically copy-pasted the main array implementation of np.diff() over to np.ma.diff() and used the masked implementation of concatenate. With this PR that is all handled automatically because of the deferral mechanism that the original implementation would have used np.diff() calls np.concatenate() which finds a masked implementation and calls np.ma.concatenate() for the user. (I left this as a separate commit for illustration)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants