Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quantity-aware frompyfunc #11893

Closed
wants to merge 13 commits into from
Closed

Conversation

nstarman
Copy link
Member

@nstarman nstarman commented Jun 25, 2021

Getting numpy.frompyfunc to work with Quantities
Very much WIP.

In particular, I'm trying to figure out:

  • how to enable unit pass-through. The helpers seem to always require pre-specifying a return unit. For a generic function lambda x: x, that's not correct.
  • how to get unit-ful return objects to turn from an array of Quantity into a Quantity array. The following code works on the output, q = u.Quantity(result.flat); q.shape = result.shape, but this should be applied automatically. While this could be done with a wrapper around the ufunc, that partially defeats the purpose of a ufunc (no ufunc.reduce, etc.)

Request Collaboration: @mhvk

numpy.frompyfunc Examples

>>> x = np.arange(0, 6, 2).reshape(3, 1)
>>> def func(x): return x
>>> ufunc = u.utils.quantity_frompyfunc(func, 1, 1, inunits=u.deg, ounits=u.deg)
>>> ufunc(x * u.deg)
[[0.0], [2.0], [4.0]]∘

But it's always degrees...
The inunits means it knows about input & output units.

>>> try: ufunc(x * u.one); except: print("unit problem")
unit problem

>>> ufunc((x * u.deg).to(u.rad))
[[0.0], [2.0], [4.0]]∘

unless x doesn't have units...

>>> x = np.arange(0, 6, 2).reshape(3, 1)
>>> ufunc(x)
array([[0],
       [2],
       [4]], dtype=object)

This is also not ideal

>>> def func(x): return x << u.deg
>>> ufunc = u.utils.quantity_frompyfunc(func, 1, 1, ounits=u.deg)  # DANGEROUS! always degrees
>>> x = np.arange(0, 6, 2).reshape(3, 1)
>>> ufunc(x)
array([[<Quantity 0. deg>],
       [<Quantity 2. deg>],
       [<Quantity 4. deg>]], dtype=object)

but can be corrected with q = u.Quantity(result.flat); q.shape = result.shape (u.Quantity(result) doesn't work. I think this may be a bug.)

One cool thing I did is enable a little signature introspection...

>>> def func(x: u.deg) -> u.rad: return x
>>> ufunc = u.utils.quantity_frompyfunc(func, 1, 1)  # don't need to specify ounits!
>>> ufunc(x)
[[0.0] [0.03490658503988659] [0.06981317007977318]] rad

numpy.vectorize is basically just a wrapper for frompyfunc, so getting the latter to work is definitely the first step.

@github-actions github-actions bot added the units label Jun 25, 2021
@github-actions
Copy link

👋 Thank you for your draft pull request! Do you know that you can use [ci skip] or [skip ci] in your commit messages to skip running continuous integration tests until you are ready?

@mhvk
Copy link
Contributor

mhvk commented Jun 26, 2021

@nstarman - I like very much where this is going! But let me try to get myself up to speed by thinking through clearly what expected API would be. I think there are really two cases of interest:

  1. np.frompyfunc - this creates a ufunc which calls out to python for a function that presumably is very calculation intensive, broadcasting as necessary. Treating the input function as equivalent to the inner function in a ufunc, the base function would just takes numbers and be unaware of units (or, rather, assume the numbers to be in some unit), but we want to register the ufunc in UFUNC_HELPERS. So, what this needs is a helper routine, which takes (func, *units) and should return a list of converters plus an output unit (where the latter can depend on the input units; e.g., helper_multiplication). To me, it would make most sense to have something like a new register_ufunc function, which would just be a thin wrapper around setting an item in UFUNC_HELPERS (e.g., making it easier to write the case where input units are known). And then your utils.frompyfunc could use that.
  2. np.vectorize - for scalars this just uses frompyfunc, so perhaps that gets covered semi-automatically, but it does not use frompyfunc when wrapping a "generalized ufunc" (such as in convolve = np.vectorize(np.convolve, signature='(n),(m)->(k)')), which is perhaps the more interesting case - at least, functions like np.convolve already can handle Quantity and it would be nice if a vectorized version similarly could handle them - by inspecting the output of the first iteration. (This can likely be solved upstream, I'll see if I can get a PR together.)

@mhvk
Copy link
Contributor

mhvk commented Jun 26, 2021

See numpy/numpy#19356 for the np.vectorize-gufunc case.

@mhvk
Copy link
Contributor

mhvk commented Jul 6, 2021

The numpy fix to vectorize is merged, so one can play around with that - if it works, we can backport the changes locally for older numpy versions (we've done that before).

@nstarman
Copy link
Member Author

nstarman commented Jul 6, 2021

Thanks! sounds great.

@nstarman
Copy link
Member Author

nstarman commented Jul 7, 2021

I would have to manually compile numpy to test this? Or wait for https://anaconda.org/scipy-wheels-nightly/numpy to update

@mhvk
Copy link
Contributor

mhvk commented Jul 7, 2021

Indeed - and "nightly" is more like "weekly". But installation is not particularly difficult, at least if you already have everything set up for installing astropy, what should just work is

pip install git+https://github.com/numpy/numpy.git#egg=numpy

(in a virtual environment)

@nstarman
Copy link
Member Author

nstarman commented Jul 8, 2021

np.frompyfunc - this creates a ufunc which calls out to python for a function that presumably is very calculation intensive, broadcasting as necessary. [...] And then your utils.frompyfunc could use that.

@mhvk
I think I'm approaching this. I've fixed the input units and conversions and implemented your suggestion of a function register_ufunc. I've updated the opening comment to reflect progress made. The outstanding issues (before I worry about implementation details) is how to fix unitless input and unitful output. These seem to originate more deeply in the numpy override magic.

@mhvk
Copy link
Contributor

mhvk commented Jul 8, 2021

@nstarman - I don't think you can really fix the no-units-on-input, units-on-output case given the current __array_ufunc__ design - there simply is no way for the ufunc to know you want units unless you pass in at least one Quantity (which can be the output, but still the user has to pass in that output). It is the same for, say, np.add. So, for those cases users should put in dimensionless Quantity.

On the implementation, currently, you assume that if no unit is given on the input, then it should just take it as being the standard unit. To match what is done for other ufuncs and in quantity_input, I would make the default be that no unit means dimensionless, but perhaps give an option to have it mean already the right unit.

Basically, I think it is good practice not to mix Quantities too much with regular arrays, unless those are assumed to be dimensionless. And here sticking with stricter, at least initially, keeps the implementation simpler.

@nstarman nstarman added this to Vectorization in Cosmology, the Expansion Jul 14, 2021
@nstarman nstarman added this to the v5.0 milestone Jul 20, 2021
@pep8speaks
Copy link

pep8speaks commented Jul 21, 2021

Hello @nstarman 👋! It looks like you've made some changes in your pull request, so I've checked the code again for style.

There are no PEP8 style issues with this pull request - thanks! 🎉

Comment last updated at 2021-10-12 02:04:28 UTC

@nstarman nstarman force-pushed the units_np-vectorize branch 2 times, most recently from 7b2353c to f1b9e51 Compare July 21, 2021 01:01
@nstarman nstarman changed the title WIP quantity vectorize and frompyfunc WIP quantity frompyfunc Jul 21, 2021
@nstarman nstarman moved this from Vectorization to In progress in Cosmology, the Expansion Jul 21, 2021
@nstarman nstarman requested a review from mhvk July 21, 2021 17:35
@nstarman
Copy link
Member Author

nstarman commented Jul 21, 2021

@mhvk
Looking at the CI, the tests fail because TestUfuncHelpers.test_coverage checks the registered ufuncs and my tests make some... Any suggestions how to get the test to pass? Should register_ufunc have a 2nd registry "REGISTERED_UFUNCS" or something that can be added to assert (all_q_ufuncs - all_np_ufuncs - all_erfa_ufuncs - set(REGISTERED_UFUNCS .keys()) == set())

@nstarman
Copy link
Member Author

Some todos:

  1. I think this frompyfunc should be publicly scoped and in the docs
  2. same with register_ufunc for working with 3rd party ufuncs
  3. changelog
  4. what's new ?
  5. Followup: use these funcs
  6. Followup: make a function isunitlike()

@nstarman nstarman force-pushed the units_np-vectorize branch 3 times, most recently from 39ad355 to 8aadf49 Compare July 21, 2021 18:10
@mhvk
Copy link
Contributor

mhvk commented Jul 21, 2021

@nstarman - little time now but my first instinct is to ensure that the tests do not change global state (i.e., have a teardown function, or use monkeypatch to remove the custom function at the end of the test). That said, I agree that the ufunc completeness test should not count such functions - we don't want astropy.test to depend on what users did beforehand. I also agree we should make public how to register functions, but exactly how best to do that will require a bit of thought. Could you raise a separate issue?

@nstarman
Copy link
Member Author

Could you raise a separate issue?

I'll do that after this is merged. I made a card at https://github.com/astropy/astropy/projects/7#card-65426157

@nstarman nstarman force-pushed the units_np-vectorize branch 2 times, most recently from 5e58feb to 35d02cd Compare July 23, 2021 15:52
@nstarman nstarman force-pushed the units_np-vectorize branch 2 times, most recently from 641999d to 97ee653 Compare August 15, 2021 05:01
@nstarman nstarman marked this pull request as ready for review August 15, 2021 06:03
@nstarman
Copy link
Member Author

ping @mhvk. All tests are now passing. Finally figured out the problem in the parallel tests.

@nstarman nstarman changed the title WIP quantity frompyfunc quantity-aware frompyfunc Aug 15, 2021
Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nstarman - some initial comments.

I like the idea in principle, but I wonder what is best to expose. I first thought we should only expose a new u.frompyfunc and not even refer to register_ufunc, but thinking more about it, that actually seems backwards: register_ufunc is missing much more! E.g., we do not cover all the scipy.special functions, and should really give users the option to just register those if they need them! If register_ufunc is properly exposed, do we still need u.frompyfunc? I feel it becomes very strange.

Another larger-picture question is now at frompyfunc, but really belongs to register_ufunc - I think ideally we allow just passing in a helper.

Another bigger point, I'm not sure about the introspection of units, since for quantity_input, the presence of those unit annotations mean something completely different. I feel if you have access to the function and can define it with units, you should just write it for Quantity in the first place - there would be no point to frompyfunc in that case (or, rather, the u.frompyfunc would assume it is a scalar function that takes Quantity).

(Sorry for the abbreviated review - have to cook dinner here!)


def teardown_class(self):
# restore states
ADDED_NARG_UFUNCS = self.ORIGINAL_REGISTERED_NARG_UFUNCS - qh.REGISTERED_NARG_UFUNCS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks the wrong way around; I'd expect qh.REGISTERED_NARG_UFUNCS - self.ORIGINAL_REGISTERED_NARG_UFUNCS

This test is only run on the scalar inputs.
"""
got = func(*inp)
# need to convert from an object array to float array, for comparison
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the need for the comment. Outdated? Or because of assert_allclose rather than assert_allequal?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Yes. This applied to the commented out line below, which ended up being outdated when I switched to assert_allclose. The previous implementation did not work with an object array.

def test_has_units_assumed_correct(self, func, inp, res):
"""
Test unitful input has unitful output and units are assumed to be
correct if not given. This only applies."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applies to what?

Test unitful input has unitful output and units are assumed to be
correct if not given. This only applies."""
# partial unit assignment to trigger unit assumptions.
# single inputs can't trigger unit assumptions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could have an output with units to trigger the helper. But overkill, I think!

correct if not given. This only applies."""
# partial unit assignment to trigger unit assumptions.
# single inputs can't trigger unit assumptions.
inp = [inp[0] * u.km, *inp[1:]] if len(inp) > 1 else inp * u.km
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems weird to create a non-list which you then expand below. Isn't the first part just correct for all cases?

@@ -0,0 +1 @@
Add quantity-aware version of ``numpy.frompyfunc``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not tell how to access it! I think it should be u.frompyfunc


`~numpy.ufunc`s operate on only recognized `~numpy.dtype`s (e.g. float32),
so units must be removed beforehand and replaced afterwards. Therefore units
MUST BE KNOWN a priori, as they will not be propagated by the astropy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This not really true: for many ufuncs the input units can vary (say, np.add or np.multiply) and any ufunc has a helper that returns converters, but that helper has access to the actual input units.

Indeed, given this, my sense is that it should be possible to pass in an explicit helper function, but that for convenience it is also possible to just pass in the units here. Might it be an idea to have a helper= argument that is either a callable, or a tuple with (lists of) input and output units?

inunits, ounits : unit-like or sequence thereof or None (optional)
Sequence of the input and output units, respectively.

.. warning::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a warning is appropriate here. Rather, the docstring should explain at the top that the function itself should not work on quantities. And here, maybe be clear that the inputs are converted to those units and then the value is taken, while for the output the units are attached to the output array (probably easiest if the two are dealt with separately)

Also, let's avoid abbreviations, so just outunits or out_units?

returned `~numpy.ufunc`. Outputs will be assigned these units.
Make sure these are the correct units.

identity : object (optional, keyword-only)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the identity cannot have units, I think it should stay before inunits, ounits (and like frompyfunc, there should be a * after nout to force keyword arguments)

_not_ equivalent to setting the identity to ``None``, which implies the
operation is reorderable.

assume_correct_units : bool (optional, keyword-only)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure about this. I feel that if it is needed, people could just write their own helper. If we do keep it, maybe have it be a more generic default_units? Could be True for inunits, a list of not equal, u.one for dimensionless (default) and False for not allowed.

@nstarman
Copy link
Member Author

nstarman commented Aug 21, 2021

Thanks @mhvk for the detailed review. I'll try to get to this in the next few days. Definitely some points to ponder in the interim.

Signed-off-by: Nathaniel Starkman (@nstarman) <nstarkman@protonmail.com>
Signed-off-by: Nathaniel Starkman (@nstarman) <nstarkman@protonmail.com>
Signed-off-by: Nathaniel Starkman (@nstarman) <nstarkman@protonmail.com>
Addressing PR review.

Signed-off-by: Nathaniel Starkman (@nstarman) <nstarkman@protonmail.com>
Signed-off-by: Nathaniel Starkman (@nstarman) <nstarkman@protonmail.com>
Signed-off-by: Nathaniel Starkman (@nstarman) <nstarkman@protonmail.com>
Signed-off-by: Nathaniel Starkman (@nstarman) <nstarkman@protonmail.com>
Signed-off-by: Nathaniel Starkman (@nstarman) <nstarkman@protonmail.com>
Signed-off-by: Nathaniel Starkman (@nstarman) <nstarkman@protonmail.com>
Signed-off-by: Nathaniel Starkman (@nstarman) <nstarkman@protonmail.com>
Signed-off-by: Nathaniel Starkman (@nstarman) <nstarkman@protonmail.com>
Signed-off-by: Nathaniel Starkman (@nstarman) <nstarkman@protonmail.com>
Signed-off-by: Nathaniel Starkman (@nstarman) <nstarkman@protonmail.com>
@nstarman nstarman removed this from In progress (PR) in Cosmology, the Expansion Jan 7, 2022
@nstarman
Copy link
Member Author

While this might be generically useful, I don't currently have a need for it in Cosmology. If someone else wants to take over, that'd be great, otherwise I say close?
Thanks @mhvk for the input!

@mhvk
Copy link
Contributor

mhvk commented Jan 18, 2022

I'd tend to agree - I think the first priority would be a general way to register a ufunc. Will let this auto-close just in case we/someone else thinks of something soon.

@nstarman nstarman closed this Mar 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants