Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Optimize and cleanup ufunc calls and ufunc CheckOverrides #15271

Merged
merged 12 commits into from Mar 24, 2021

Conversation

seberg
Copy link
Member

@seberg seberg commented Jan 6, 2020

This is a rather large commit which clean up the ufunc override
code for argument parsing. (I may attempt to split it up)
There are two main things, argument parsing, especially for
reductions is now much faster since METH_FASTCALL is used
(when keyword arguments are being used).
By moving the argument parsing and using more generic code this
also simplifies the ufunc override checking especially when
keyword arguments are present.

Both of this decreases the argument parsing overhead quite a lot
for ufunc calls, especially for small reductions. (Without double
checking, I believe the speedup was up to around 30% for small
reductions.)

The downside is some added/annoyance due to the use of marcos to
support both FASTCALL and no FASTCALL semantics.
As a side note: vectorcall is likely not a huge factor for ufuncs
since it is very common not to use any keyword arguments. OTOH,
code that uses out=... a lot on small arrays should see a nice
difference.


Only the last commit is new, the others are changes from gh-15269 and gh-15270
@mhvk the last commit here is about what you had once looked at for the override cleanup stuff.

@seberg seberg force-pushed the splitup-faster-argparsing branch 2 times, most recently from 383139c to 261f7bf Compare January 7, 2020 05:03
@mhvk
Copy link
Contributor

mhvk commented Jan 7, 2020

Great! Would think it is useful to split up. In particular, the benchmarks should go in first, to establish a baseline.

@charris charris changed the title ENH: Optimize and cleanup ufunc calls and ufunc CheckOverrides WIP, ENH: Optimize and cleanup ufunc calls and ufunc CheckOverrides Jan 8, 2020
Base automatically changed from master to main March 4, 2021 02:04
@seberg seberg force-pushed the splitup-faster-argparsing branch 4 times, most recently from ceb04e3 to f85ae19 Compare March 18, 2021 01:46
@seberg
Copy link
Member Author

seberg commented Mar 18, 2021

Just to note, this one should be almost ready, but I should prod it a little bit, and test against astropy (and maybe dask) to make sure they are fine with the __array_ufunc__ changes (the changes are within the specifications, but if someone breaks those, we may have to help them out a bit).

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a cursory look just to see what was up, and overall it seems a really nice change. I mostly looked at the ufunc code, which is considerably simplified. Great! If it is ready for more detailed review, I can try to make some time (though perhaps just focussing on the ufunc stuff; it is a lot, though I can see it is not so easy to separate).

@@ -41,7 +41,7 @@ struct _tagPyUFuncObject;
*
* For backwards compatibility, the regular type resolution function does not
* support auxiliary data with object semantics. The type resolution call
* which returns a masked generic function returns a standard NpyAuxData
* which returns a masked generic function returns 5015a standard NpyAuxData
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops...

numpy/core/src/umath/override.c Show resolved Hide resolved
{
/*
* ufunc.accumulate(a[, axis, dtype, out])
* If the keywords include sign rename to signature. An error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SHould be sig, not sign.


/*
* Make use of fastcall. If Python has tp_vectorcall we can use this
* directly, otherwise we have to unpack it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unpacking happens elsewhere, correct?

*
* PyObject *argument1, *argument3;
* int argument2 = -1;
* if (npy_parse_arguments("method", args, len_args, kwnames),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parenthesis after kwnames should not be there.

* PyObject *argument1, *argument3;
* int argument2 = -1;
* if (npy_parse_arguments("method", args, len_args, kwnames),
* "argument1", NULL, &argument1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered quite a bit about whether this implementation perhaps should be shared a bit with the PREPARE macro, but in the end it does seem that this is quite simple and logical, keeping keyword names, their actions, and their destinations together, etc. So, I guess I ended with a 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this should be in now, does it still show up in the diff? Let me rebase quickly to make it hopefully go away. (will make a note of the above comments if they disappear though.)

I tried for a while to do that, but I don't think it is possible. IIRC there was a gcc specific solution (not C99) to be able to give a return value and declare a variable.

@seberg seberg force-pushed the splitup-faster-argparsing branch 2 times, most recently from 7ca285b to a701a62 Compare March 18, 2021 20:25
@seberg
Copy link
Member Author

seberg commented Mar 18, 2021

@mhvk yeah, it would be great to move this along. Because I need to attack the ufunc code soon, and separating argument parsing from the rest of the code is pretty much a prerequisite to refactor more.

It seems I messed up with the other PRs (picking a wrong commit during rebase), the whole asarray stuff should be part of a separate PR (technically, I should not need to include that here, so let me try to rebase it away so this is less noisy for you).

@seberg seberg marked this pull request as ready for review March 18, 2021 20:39
@seberg
Copy link
Member Author

seberg commented Mar 18, 2021

Rebased with only the ufunc related changes. I will put further fixups into their own commits for clarity (I don't really expect much, but maybe an occasional refcount problem in override-paths or so).

@seberg seberg added 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes and removed 25 - WIP labels Mar 18, 2021
Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seberg - this is very nice indeed!!!! I think the code is much cleaner now. My comments are basically all nitpicks, with perhaps the only one to be thought through being the one about the type of error to be raised when a tuple of outputs has the wrong size.

* only in this step. This function takes the reference objects and
* parses them into the desired values.
* This function cleans up after itself and NULLs references on error,
* the caller has to ensure that out_op[0:nargs] is NULLed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add "on input" at the end (at least, that is how I understand it!)

if (out_axis != NULL) {
*out_axis = NULL;
}

if (out_wheremask != NULL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would seem we might as well leave this to the caller as well, since we ask for it to initialize the output objects to NULL too.

return -1;
}

PyObject *res = ufunc_generic_call_mps(ufunc, args, kwds, op);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name *_mps is not very instructive... Maybe "*_with_operand_pointers"? (not very important, obviously!)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed it to _with_operands, but you are right and I will make a PR after this to just remove the whole thing. Since nobody complained yet, I assume we can go ahead with disabling PyUFunc_GenericFunction for good.

{
if (PyTuple_CheckExact(out_obj)) {
if (PyTuple_GET_SIZE(out_obj) != nout) {
// TODO: Was ValueError (and error order changed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here perhaps ValueError is actually more logical? After all, the type is correct at this point (it being a tuple). Not sure. cc @eric-wieser who introduced this routine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to turn it back into a ValueError. I don't remember if I had more reason to change it (will have to see if the tests need adapting later).

if (npy_parse_arguments("reduceat", args, len_args, kwnames,
"array", NULL, &op,
"indices", NULL, &indices_obj,
"|axis", NULL, &axes_in,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this was here before, but shall we call the variable axis_obj in analogy with all the others? (Feel free to leave it as is, though!)

{
/*
* ufunc.accumulate(a[, axis, dtype, out])
* If the keywords include `sig` rename to `signature`. An error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete "An error ..." as this is done in ufuncobject.c

numpy/core/src/umath/override.c Show resolved Hide resolved
@@ -495,104 +240,58 @@ PyUFunc_CheckOverride(PyUFuncObject *ufunc, char *method,
}

/*
* Normalize ufunc arguments.
* Normalize ufunc arguments, note that args does not hold any positional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This note was very confusing to me: maybe "note that any input and output arrays in args have already been stored in in_args and out_args"

The comment about len_args seems false (at least for reductions).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmpf, this probably made a bit more sense in the old (more complicated) iteration. Adopting your comment.

/*
* Reduce-like methods can pass keyword arguments also by position,
* in which case the additional positional arguments have to be copied
* into the keyword argument dictionary. The __call__ method has to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__call__ and outer methods.

@@ -2637,6 +2273,8 @@ PyUFunc_GeneralizedFunction(PyUFuncObject *ufunc,

/* Possibly remap axes. */
if (axes != NULL || axis != NULL) {
assert((axes == NULL) || (axis == NULL));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stupid and small, but perhaps assert(!(axes != NULL && axis != NULL))

@seberg seberg removed the 25 - WIP label Mar 23, 2021
@seberg seberg changed the title WIP, ENH: Optimize and cleanup ufunc calls and ufunc CheckOverrides ENH: Optimize and cleanup ufunc calls and ufunc CheckOverrides Mar 23, 2021
@seberg
Copy link
Member Author

seberg commented Mar 23, 2021

OK, leak checker flushed out two small remaining issues (there are more issues unrelated, I had not run pytest-leaks in too long...). (Note: I rebased on main the first commits are not actually modified.)

From my side, I consider this finished now. I don't think the stricter argument parsing should be problematic.

@seberg seberg removed the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Mar 23, 2021
@seberg
Copy link
Member Author

seberg commented Mar 23, 2021

Hmmmpf, there is a small "conflict" with gh-18670, in the outer logic. It doesn't matter much which one goes in first, but either will need rebasing to ensure a small leak doesn't reappear.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only some nitpicks left, really... Might be good to get another pair of eyes, just in case.

``__array_ufunc__``. Previously, it was possible to pass
on invalid arguments (such as a non-existing keyword
argument) when dispatch was known to occur.
This was always the intended behaviour, if existing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"This" is confusing - does it refer to what was the case previously, or what is the case now. Overall, my suggestion would be to just delete the sentence, as I think it is very unlikely someone relied on this behaviour.

``__array_ufunc__`` and additional positional arguments
-------------------------------------------------------
Previously, all positionally passed arguments were used
where used for ``__array_ufunc__``. In the case of reduce,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sentence not grammatical! I think you want "where used for" -> "to check for"

Since this depended on the way the arguments were passed
(by position or by keyword), NumPy will now only dispatch
on the input and output array.
For example, it will never dispatch on the ``where`` array.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add "array in a reduction such as np.add.reduce."

doc/release/upcoming_changes/15271.performance.rst Outdated Show resolved Hide resolved
* ufunc() and ufunc.outer() accept 'sig' or 'signature'. We guarantee
* that it is passed as 'signature' by renaming 'sig' if present.
* Note that we have already validated that only one of them was passed
* before checking for checking for overrides.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: double "checking" should be "before checking for overrides"

numpy/core/src/umath/override.c Outdated Show resolved Hide resolved
* and an array of (pointers to) PyArrayObjects which are NULL.
*
* 'op' is an array of at least NPY_MAXARGS PyArrayObject *.
*/
NPY_NO_EXPORT int
PyUFunc_GenericFunction(PyUFuncObject *ufunc,
PyObject *args, PyObject *kwds, PyArrayObject **op)
{
/* NumPy 1.19, 2020-01-24 */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, sounds good!

numpy/core/src/umath/ufunc_object.c Show resolved Hide resolved
seberg and others added 9 commits March 23, 2021 18:10
This is a rather large commit which clean up the ufunc override
code for argument parsing. (I may attempt to split it up)
There are two main things, argument parsing, especially for
reductions is now much faster since `METH_FASTCALL` is used
(when keyword arguments are being used).
By moving the argument parsing and using more generic code this
also simplifies the ufunc override checking especially when
keyword arguments are present.

Both of this decreases the argument parsing overhead quite a lot
for ufunc calls, especially for small reductions. (Without double
checking, I believe the speedup was up to around 30% for small
reductions.)

The downside is some added/annoyance due to the use of marcos to
support both FASTCALL and no FASTCALL semantics.
As a side note: vectorcall is likely not a huge factor for ufuncs
since it is very common not to use any keyword arguments. OTOH,
code that uses ``out=...`` a lot on small arrays should see a nice
difference.
The TypeError TODO is also reflected in the tests I think, the
other TODO seems pretty unnecessary (most users will go through
`tp_vectorcall` anyway, so micro-optimizing that is not all that
relevant).
This fixes the review comments by Marten, mainly small cleanups.
The largest change is that it is now back to a ValueError if the
out tuple has the wrong length.

Co-Authored-By: Marten van Kerkwijk <mhvk@astro.utoronto.ca>
These functions do not require `NPY_NO_EXPORT`. One of them was
(incorrectly) flagged by code coverage. Maybe this helps...
This is based on Marten's review again.
@charris
Copy link
Member

charris commented Mar 23, 2021

If there are two things going on in this PR it might be best to split them.

@seberg
Copy link
Member Author

seberg commented Mar 23, 2021

I had not written release notes for gh-15269 and gh-18642, since they should not have any compatibility issues. But as the PRs do similar things on different part of the code-base I thought it fits together in the release notes.

Other than the release notes thing, this PR is a bit annoyingly large, but that is mostly the nature of the beast, unfortunately. There are some side battles here, but those won't be easy to split out I think.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. To me, this looks all OK!


``__array_ufunc__`` and additional positional arguments
-------------------------------------------------------
Previously, all positionally passed arguments were check for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, missed a small mistake: check -> checked

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Yeah I have to remember to add [skip CI] more often. Seems it didn't pan out, because I pushed before CI was finished I guess.

@mhvk
Copy link
Contributor

mhvk commented Mar 24, 2021

p.s. Minus the last spelling mistake, that is. Note that skipping CI on actions is now possible, by adding [skip CI] to the commit message - see https://github.blog/changelog/2021-02-08-github-actions-skip-pull-request-and-push-workflows-with-skip-ci/

@charris
Copy link
Member

charris commented Mar 24, 2021

Thanks Sebastian.

@seberg seberg deleted the splitup-faster-argparsing branch March 24, 2021 21:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants