
ENH: improve performance of lexsort for integers. #26434

Open: toobaz wants to merge 1 commit into main from int_lexsort_kind

Conversation

@toobaz (Author) commented May 14, 2024

This PR implements an alternative, more performant code path for `lexsort` when it is passed (appropriate, see below) integer keys.
It does so by compressing the levels into a single array using bit shifts and then sorting the resulting array.
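
For illustration, here is a minimal sketch of the packing idea for non-negative integer keys (this is not the PR's code; `_pack_keys` is a made-up name):

```python
import numpy as np

def _pack_keys(keys):
    # Pack several non-negative integer key arrays into one uint64 array
    # so that a single stable sort of the packed values is equivalent to
    # a lexicographic sort of the keys. np.lexsort sorts by the last key
    # first, so the last key must end up in the most significant bits.
    keys = [np.asarray(k, dtype=np.uint64) for k in keys]
    bits = [max(int(k.max()).bit_length(), 1) for k in keys]
    assert sum(bits) <= 64, "keys too large for the packed fast path"
    packed = np.zeros_like(keys[0])
    for k, nb in zip(keys[::-1], bits[::-1]):  # last (primary) key first
        packed = (packed << np.uint64(nb)) | k
    return packed

a = np.array([1, 5, 1, 4, 3])
b = np.array([9, 4, 0, 4, 0])
packed = _pack_keys([a, b])  # b is the primary key, as in np.lexsort
assert np.array_equal(np.argsort(packed, kind="stable"),
                      np.lexsort([a, b]))
```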

The results of this are, at a glance (details below):

  • ~1.75x speedup for large arrays (N>2000)
  • ... but a much larger speedup (e.g. 8x in my application) can be obtained thanks to the possibility of selecting the desired sorting algorithm
  • ~0.75x slowdown for small arrays (N<1000)
  • a very small slowdown for the (rare, I think) case where the keys are integers but too large for the new code path (as reported in the docs, this is guaranteed not to happen, for instance, if all values are below MAX_INT / 2L, L being the number of levels)

The application I'm referring to is the use of lexsort inside COO sparse matrices in scipy.

The current PR was influenced by feedback received from @ngoldbaum on the mailing list. However, I did not use C, because the Python code I wrote is vectorized anyway, and writing the same code in C would take me too much time (my C skills being full of rust) and result in less readable code.

Assuming that, despite this, the approach is of interest to numpy (if this PR is not, I guess I will make an attempt directly in scipy?),

  • I must obviously solve the problem of where to place the Python code (I don't think the current file is right, since it contains lower-level code, and cleanly introducing the imports I need would result in a circular import; this is why there currently is a horrible import inside the function): guidance on this will be highly appreciated
  • I must also understand why, at the end of the function, I need to make `axis` explicit, but I think this is related to the previous point
  • I must integrate the tests I wrote into the codebase

A few other things to note:

  • the approach is similar to the one I took in the MultiIndex engine for pandas
  • a previous version of this PR (before feedback from @ngoldbaum) would allow the user to switch the new code path on or off, but indeed the checks required to establish whether it is feasible add only negligible overhead
  • that same version would also switch to Python int if the key values were too big to fit inside a uint... but that brings a strong performance penalty compared to the old code path, so it makes no sense if the user doesn't have the choice
  • I think it would be easy to patch the C code to support different sorting algorithms... at least easy for someone a bit more fluent in C, definitely not for me. If such a patch is introduced, this PR will not guarantee the 8x speedup on nearly-sorted arrays, just the ~1.75x speedup.

Below are the results of the performance tests, where:

  • "PR" is this branch, "base" is main (previous commit)
  • "mix" refers to random arrays, "sep" to arrays where two sorted (2-level) arrays are concatenated (so on which stable sort shines, because it has linear cost)
  • "sta" refers to tests with kind=stable (in the PR)
  • "big" refers to arrays where values are so large that the new code path cannot be taken (and a minimum of overhead is wasted in order to determine this is the case)

Absolute scale: (plot "abs")

Log scale: (plot "log")

Estimation of the speedup ratio when the new code path is feasible: (plot "speedup")

... and when it is not (note that it is never taken for N < 1000, so the loss in efficiency is pure overhead): (plot "speedup_big")

The code used for these tests is available here.

@toobaz (Author) commented May 14, 2024

I'm confused... the failing test is

         v = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
         assert_equal(np.lexsort(v), 0)

at https://github.com/numpy/numpy/blob/main/numpy/_core/tests/test_regression.py#L410

... but the docs state "Each element of `keys` along the zeroth axis must be an array-like object of the same shape." So why are we checking the result - rather than expecting an error - in a case where `keys` is 1D (that is, the elements of `keys` are scalars)?!

@ngoldbaum (Member) commented:
Scalars are a kind of array-like. My reading of the docs is that this case is allowed, with N=0. The test being triggered is 18 years old, so it looks like this behavior has been stable for essentially the entire life of the library.
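
For concreteness, the behaviour being preserved (this runs on current NumPy and matches the regression test quoted above):

```python
import numpy as np

# Each element of the 1-D input is read as a scalar array-like key, so
# lexsort accepts it and simply returns 0, as the regression test asserts.
v = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(np.lexsort(v))  # 0
```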

@toobaz force-pushed the int_lexsort_kind branch 3 times, most recently from 9fb59dd to 0ca9b19 on May 14, 2024, 23:40
@toobaz (Author) commented May 15, 2024

(I see the failing test is also failing in #26436, so I guess it's unrelated to this PR...)

# Check if we are only dealing with integers:
if all([isinstance(a, np.ndarray) and
        np.issubdtype(a.dtype, np.integer)
        for a in keys]):
@mattip (Member) commented:

Is there a tab here? The formatting is strange

@toobaz (Author) replied:

@mattip Outdated, but for the record: no, I had just aligned the content of the square brackets... something similar happens in the new version at https://github.com/numpy/numpy/pull/26434/files#diff-33968b6e8acadeaac162e52ce132395af75712779b23ad0e3cb1600fb36edcc6R659

@@ -434,11 +434,41 @@ def where(condition, x=None, y=None):


@array_function_from_c_func_and_dispatcher(_multiarray_umath.lexsort)
A Member commented on the diff above:

You should be using the `__array_function__` dispatcher for Python functions, and rename this function to `lexsort`.
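
For context, the usual pattern for pure-Python functions looks roughly like this (a sketch only, not the PR's code; the overrides import path differs between NumPy versions):

```python
import functools
from numpy._core import overrides  # 'numpy.core' on older NumPy versions

array_function_dispatch = functools.partial(
    overrides.array_function_dispatch, module='numpy')


def _lexsort_dispatcher(keys, axis=None):
    # The dispatcher only yields the array-like arguments so that the
    # __array_function__ protocol can inspect them; it holds no logic.
    if isinstance(keys, tuple):
        return keys
    return (keys,)


@array_function_dispatch(_lexsort_dispatcher)
def lexsort(keys, axis=-1):
    # The pure-Python implementation (integer fast path plus a fallback
    # to the C lexsort) would live here.
    ...
```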

@ngoldbaum (Member) commented:

We looked at this at the triage meeting. Overall we think this is a neat idea but there are some issues.

First, can you add a one- or two-line comment above the first check that dispatches to the fast integer path, explaining exactly what the fast path is doing? It would also be nice if you could link to an explanation of how the compression works, particularly if you can find any discussions about doing that for a sorting algorithm.

Do we really need the new kind argument? It's not great IMO to add a keyword argument which only applies based on particular properties of the input data.

We also found the coding style to be a little bit opaque. Can you try to break up some of the longer lines that combine many operations to make the algorithm a little clearer?

This is sort of on the bubble for the kind of performance-improving contribution we'd take; we also have to balance maintainability. Improvements that make this code easier to understand would tilt the balance towards merging this.

@ngoldbaum (Member) commented:

Oh, also: it would be nice if you could add a new benchmark for lexsort to the existing benchmark suite and show that it improves performance for the case you're targeting, and also that the performance hit is negligible for the un-optimized case. Ping here if you have trouble with the benchmark suite; happy to answer questions.
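
A rough sketch of what such an asv benchmark class could look like (the parameter names mirror the results posted later in this thread; the setup code here is illustrative, not the PR's actual benchmark):

```python
import numpy as np
from .common import Benchmark  # helper base class used by numpy's asv suite

class LexSort(Benchmark):
    params = [['int64', 'float64'],
              [2, 4],
              [100, 10000, 100000, 1000000],
              ['mixed', 'separated', 'big']]
    param_names = ['dtype', 'n_keys', 'size', 'layout']

    def setup(self, dtype, n_keys, size, layout):
        rng = np.random.default_rng(0)
        if layout == 'big':
            # values too large for the packed fast path
            keys = rng.integers(2**62, 2**63 - 1, size=(n_keys, size))
        elif layout == 'separated':
            # two sorted blocks concatenated (favourable for stable sort)
            half = np.sort(rng.integers(0, 1000, size=(n_keys, size // 2)),
                           axis=1)
            keys = np.concatenate([half, half], axis=1)
        else:  # 'mixed': random values
            keys = rng.integers(0, 1000, size=(n_keys, size))
        self.keys = keys.astype(dtype)

    def time_lexsort(self, dtype, n_keys, size, layout):
        np.lexsort(self.keys)
```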

@toobaz (Author) commented May 21, 2024

Thanks again for your great feedback, @ngoldbaum.

I will soon edit my PR accordingly, but first let me take a step back:

> Do we really need the new kind argument? It's not great IMO to add a keyword argument which only applies based on particular properties of the input data.

Indeed, we do not really need it in this PR (at least for my purposes, since the default "stable" is precisely what I need to get the huge efficiency gain!).

That said, I decided to take a look at whether supporting `kind` in the current code path might in fact be low-hanging fruit.

It turned out it isn't (at least not low-hanging enough for my awful C skills). My understanding is that lexsort basically calls a regular mergesort on a rearranged version of the input keys, so I tried to simply replace mergesort with the other kinds there... and I failed.

The attempt is here: main...toobaz:numpy:lexsort_kind

... but it fails brutally, already with `np.lexsort([[2, 7], [0, 0]], kind='h')`, which returns `[1, 0]` (while `np.lexsort([[2, 7], [0, 0]])`, `np.argsort([2, 7], kind='h')` and `np.lexsort([[2, 7], [0, 0]], kind='s')` all correctly return `[0, 1]`).

If there's any chance someone good at C can identify some stupid mistake I made, so that we get `kind` supported in general, great. Otherwise, if the problem is deeper, or nobody has time to check my (other) code, I will just remove support for `kind` from this PR.
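
For reference, here is the reproducer above spelled out as a script (the `kind=` arguments to `lexsort` only exist on the experimental branch linked above, so those calls are left commented out):

```python
import numpy as np

keys = [[2, 7], [0, 0]]

# On main these agree:
print(np.lexsort(keys))                     # [0 1]
print(np.argsort([2, 7], kind='heapsort'))  # [0 1]

# On the experimental branch (main...toobaz:numpy:lexsort_kind), lexsort
# accepts a kind argument; kind='s' reportedly still returns [0 1], but
# kind='h' returns [1 0], which is the bug described above.
# print(np.lexsort(keys, kind='s'))         # [0 1] (correct)
# print(np.lexsort(keys, kind='h'))         # [1 0] (wrong)
```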

@toobaz force-pushed the int_lexsort_kind branch 5 times, most recently from f104ce3 to 60e5b39 on June 3, 2024, 12:59
@toobaz (Author) commented Jun 3, 2024

OK, never mind the `kind` argument (at least in this PR).

This new version drops it, makes the code way more readable (I hope), and adds tests and benchmarks; here are the results:

| Change   | Before [76807f0e] <main>   | After [94dea95c] <int_lexsort_kind>   |   Ratio | Benchmark (Parameter)                                                      |
|----------|----------------------------|---------------------------------------|---------|----------------------------------------------------------------------------|
| +        | 1.70±0.02μs                | 2.07±0.02μs                           |    1.22 | bench_function_base.LexSort.time_lexsort('int64', 2, 100, 'mixed')         |
| +        | 1.81±0.03μs                | 2.20±0.02μs                           |    1.22 | bench_function_base.LexSort.time_lexsort('int64', 2, 100, 'separated')     |
| +        | 1.74±0.02μs                | 2.10±0.02μs                           |    1.21 | bench_function_base.LexSort.time_lexsort('int64', 2, 100, 'big')           |
| +        | 3.06±0.01μs                | 3.51±0.02μs                           |    1.15 | bench_function_base.LexSort.time_lexsort('int64', 4, 100, 'separated')     |
| +        | 2.38±0.08μs                | 2.71±0.01μs                           |    1.14 | bench_function_base.LexSort.time_lexsort('float64', 2, 100, 'mixed')       |
| +        | 2.50±0.06μs                | 2.85±0.01μs                           |    1.14 | bench_function_base.LexSort.time_lexsort('float64', 2, 100, 'separated')   |
| +        | 2.93±0.03μs                | 3.34±0.02μs                           |    1.14 | bench_function_base.LexSort.time_lexsort('int64', 4, 100, 'mixed')         |
| +        | 2.46±0.08μs                | 2.79±0.01μs                           |    1.13 | bench_function_base.LexSort.time_lexsort('float64', 2, 100, 'big')         |
| +        | 2.99±0.03μs                | 3.38±0.02μs                           |    1.13 | bench_function_base.LexSort.time_lexsort('int64', 4, 100, 'big')           |
| +        | 4.11±0.03μs                | 4.61±0.04μs                           |    1.12 | bench_function_base.LexSort.time_lexsort('float64', 4, 100, 'mixed')       |
| +        | 4.20±0.1μs                 | 4.67±0.03μs                           |    1.11 | bench_function_base.LexSort.time_lexsort('float64', 4, 100, 'big')         |
| -        | 12.6±0.03ms                | 7.12±0.06ms                           |    0.56 | bench_function_base.LexSort.time_lexsort('int64', 2, 100000, 'mixed')      |
| -        | 905±2μs                    | 490±5μs                               |    0.54 | bench_function_base.LexSort.time_lexsort('int64', 2, 10000, 'mixed')       |
| -        | 173±1ms                    | 92.8±0.8ms                            |    0.54 | bench_function_base.LexSort.time_lexsort('int64', 2, 1000000, 'mixed')     |
| -        | 171±2ms                    | 18.0±0.2ms                            |    0.11 | bench_function_base.LexSort.time_lexsort('int64', 2, 1000000, 'separated') |
| -        | 907±3μs                    | 85.7±1μs                              |    0.09 | bench_function_base.LexSort.time_lexsort('int64', 2, 10000, 'separated')   |
| -        | 12.7±0.1ms                 | 962±30μs                              |    0.08 | bench_function_base.LexSort.time_lexsort('int64', 2, 100000, 'separated')  |

As expected, small overhead costs on small arrays, large gains on large arrays.

@ngoldbaum I'm pretty sure the code is still in the wrong file (forcing me to `import numpy as np` within the function) and that, probably relatedly, I'm misusing the "dispatcher"; but I don't understand your suggestion to rename `_general_lexsort` to `lexsort`: the latter is the name of the API method, which is now a Python function without a corresponding C function. Should I rename the C method instead?

That said, there are two types of errors in the tests:

  • (some?) 32-bit platforms give wrong results; I will investigate this as soon as I can find a test machine (I guess a Raspberry Pi will do)
  • the `warnings.simplefilter("ignore", RuntimeWarning)` at line 610 breaks some assertion... and I need help on this one, because I really don't understand the error. It seems to refer to a warning being raised, but in my case a warning is being ignored!

[EDIT:] an example of the error I don't understand: https://github.com/numpy/numpy/actions/runs/9351623692/job/25738229144?pr=26434
