-
-
Notifications
You must be signed in to change notification settings - Fork 9.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: improve performance of lexsort for integers. #26434
base: main
Are you sure you want to change the base?
Conversation
I'm confused... the failing test is
at https://github.com/numpy/numpy/blob/main/numpy/_core/tests/test_regression.py#L410 ... but the docs state |
Scalars are a kind of array-like. My reading of the docs is this case is allowed, with |
9fb59dd
to
0ca9b19
Compare
(I see the failing test is also failing in #26436 , so I guess it's unrelated to this PR...) |
numpy/_core/multiarray.py
Outdated
# Check if we are only dealing with integers: | ||
if all([isinstance(a, np.ndarray) and | ||
np.issubdtype(a.dtype, np.integer) | ||
for a in keys]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a tab here? The formatting is strange
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattip Outdated, but for the records: no, I just had aligned the content of the square brackets... something similar happens in the new version at https://github.com/numpy/numpy/pull/26434/files#diff-33968b6e8acadeaac162e52ce132395af75712779b23ad0e3cb1600fb36edcc6R659
@@ -434,11 +434,41 @@ def where(condition, x=None, y=None): | |||
|
|||
|
|||
@array_function_from_c_func_and_dispatcher(_multiarray_umath.lexsort) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should be using the __array_function__
dispatcher for python functions, and rename this function to be lexsort
We looked at this at the triage meeting. Overall we think this is a neat idea but there are some issues. First, can you add a one or two line comment above the first check that dispatches to the fast integer path explaining exactly what the fast path is doing. It would also be nice if you could link to an explanation of how the compression works, particularly if you can find any discussions about doing that for a sorting algorithm. Do we really need the new We also found the coding style to be a little bit opaque. Can you try to break up some of the longer lines that combine many operations to make the algorithm a little clearer? This is sort of on the bubble for the sort of contribution we'd take that improves performance. We have to balance maintainability too. Improvements that make this code easier to understand would tilt the balance towards merging this. |
Oh also it would be nice if you could add a new benchmark for |
Again thanks for your great feedback @ngoldbaum I will soon edit my PR accordingly, but first let me take a step back:
Indeed, we do not really need it in this PR (at least for my purposes, since the default "stable" is precisely what I need to get the huge efficiency gain!). This said, I decided to take a look at whether in fact supporting Turned out it isn't (at least not low enough for my awful C skills). My understanding is that The attempt is here: main...toobaz:numpy:lexsort_kind ... but it fails brutally, already with If there's any chance someone good at C can maybe identify some stupid mistake I made, so that we get |
f104ce3
to
60e5b39
Compare
OK, never mind (at least in this PR) the This new version drops it, makes code way more readable (I hope), and adds tests and benchmarks, here's the result:
As expected, small overhead costs on small arrays, large gains on large arrays. @ngoldbaum I'm pretty sure that the code is still in the wrong file (forcing me to This said, there are two types of errors in the tests.
[EDIT:] an example of the error I don't understand: https://github.com/numpy/numpy/actions/runs/9351623692/job/25738229144?pr=26434 |
This PR implements an alternative and more performant code path for
lexsort
when passed (appropriate - see below) integerkeys
.It does so by compressing the levels into a single array using bit shifts, and then sorting the resulting array.
The result of this is, at a glance (details below):
keys
are integers but they are too large for the new code path (as reported in the docs, this is guaranteed not to happen for instance if all values are below MAX_INT / 2L, L being the number of levels)The application I'm referring to is the use of
lexsort
inside COO sparse matrices in scipy.The current PR was influenced by some feedback received by @ngoldbaum in the mailing list. However, I did not use C, because the Python code I wrote is vectorized anyway, and writing the same code in C would take me too much time (my C skills being full of rust) resulting in less readable code.
Assuming that, despite this, the approach is of interest for numpy (if this PR is not, I guess I will make an attempt directly in scipy?),
axis
explicit, but this is related to the previous point I thinkwrite testsintegrate the tests I wrote into the codebaseFew other things to note:
MultiIndex
engine forpandas
int
if thekeys
values were too big to fit insideuint
... but that brings a strong performance penalty over the old code path, so it makes no sense if the user doesn't have the choiceBelow are the result of the performance tests, where
stable
sort shines, because it has linear cost)kind=stable
(in the PR)Absolute scale:
Log scale:
Estimation of speedup ratio when the new code path is feasible:
... and when it is not (note that it is never taken for N < 1000, so the loss in efficiency is overhead):
The code used for these tests is available here.