
ENH: add scalar special cases for boolean logical loops #8924

Closed · wants to merge 1 commit

Conversation

juliantaylor (Contributor):

Scalar logical loops just boil down to memcpy or memset. While not very
useful in general, some cases like masked arrays can profit when
creating zero stride boolean arrays for readonly views.
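
To illustrate the idea, a minimal sketch (assuming one-byte NumPy booleans; the function name and signature are made up and are not the PR's actual loop code): a boolean OR against a zero-stride scalar operand collapses to memset or memcpy.

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical sketch, not the PR's code: when the second operand of
     * a boolean OR loop has zero stride (a scalar), the whole inner loop
     * collapses to memset or memcpy depending on the scalar's value. */
    static void
    bool_or_with_scalar(unsigned char *out, const unsigned char *in,
                        unsigned char scalar, size_t n)
    {
        if (scalar) {
            memset(out, 1, n);   /* x | True == True for every element */
        }
        else {
            memcpy(out, in, n);  /* x | False == x, a plain copy */
        }
    }

logical_and is symmetric: a True scalar reduces to memcpy, a False scalar to memset(out, 0, n).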

juliantaylor (Contributor, author):

See gh-8910 for a masked-array change that makes use of these loops.

eric-wieser (Member):

How do timings compare for cases that do and don't take advantage of this?

juliantaylor (Contributor, author):

It should easily be a factor of 8-16; an SSE unit can perform 16 boolean ops at once.
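
For context on where that factor comes from, a rough SSE2 sketch (illustrative only, not NumPy's actual SIMD loop; the name and signature are made up): one 128-bit OR handles 16 boolean bytes per instruction, versus one byte per iteration for a plain scalar loop.

    #include <emmintrin.h>  /* SSE2 intrinsics */
    #include <stddef.h>

    /* Illustrative sketch: OR 16 boolean bytes per SSE2 instruction,
     * with a scalar tail for the remainder. */
    static void
    bool_or_sse2(unsigned char *out, const unsigned char *a,
                 const unsigned char *b, size_t n)
    {
        size_t i = 0;
        for (; i + 16 <= n; i += 16) {
            __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
            __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
            _mm_storeu_si128((__m128i *)(out + i), _mm_or_si128(va, vb));
        }
        for (; i < n; i++) {
            out[i] = a[i] | b[i];  /* scalar tail */
        }
    }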

eric-wieser (Member) commented Apr 11, 2017:

I'm more concerned about the impact of the branching for small loop sizes when we aren't in the special case this is optimizing for.

What determines the number of elements in the inner loop? Does the ufunc always pick a contiguous dimension, or the longest one, or what?

juliantaylor (Contributor, author) commented Apr 11, 2017:

The ufunc machinery overhead is an order of magnitude higher than the cost of the inner loops. The code inside them only starts mattering when the arrays become larger than a few thousand elements.

The inner loop spans the full array if the operation can be expressed via the strides without buffering.
With buffering (casting, broadcasting, reductions), the inner loop processes chunks of 8192 contiguous elements (configurable via np.setbufsize).

eric-wieser (Member) commented Apr 12, 2017:

This helps, but it's still slower than working with the full arrays (on my MSVC build, anyway).

Setup:

In [1]: b = np.ma.make_mask_none((100, 100), writeable=False)
In [2]: f = np.ma.make_mask_none((100, 100), writeable=True)

Before:

In [3]: %timeit b | b
100000 loops, best of 3: 12.7 µs per loop

In [4]: %timeit b | f
100000 loops, best of 3: 12.5 µs per loop

In [5]: %timeit f | f
1000000 loops, best of 3: 1.86 µs per loop

After:

In [3]: %timeit b | b
100000 loops, best of 3: 8.83 µs per loop

In [4]: %timeit b | f
100000 loops, best of 3: 2.59 µs per loop

In [5]: %timeit f | f
1000000 loops, best of 3: 1.82 µs per loop

juliantaylor force-pushed the scalar-bool branch 2 times, most recently from dd8f47e to 53f9d22 on April 12, 2017 at 10:24.
juliantaylor (Contributor, author):

I didn't add code for scalar | scalar as that probably doesn't happen in practice; the masked-array code usually checks both operands for nomask before doing a masked operation. But I have added it now and cleaned up the code a bit.

That it is still slower than the full-array case is actually due to the iterator: in the full-array case a fastpath skips the nditer setup. Apparently this fastpath does not trigger when a zero-d array is involved, so the expensive iterator gets set up. It should still be good enough; the mask operation is only a fraction of the cost of the full masked-array operation.

Review comment on the new inner-loop code (excerpt):

     */
    if (steps[0] == 0) {
        if (steps[1] == 0) {
            BOOL_SCALAR_OP(1, 1, ip2, (*args[0] @OP@ *args[1]));
Reviewer (Member):

I think we might want const bool val = *args[0] @OP@ *args[1], so that it doesn't get calculated repeatedly inside the loop?

juliantaylor (Contributor, author) replied:

Shouldn't be a relevant code path, but why not.
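
For illustration, a sketch of the suggested hoisting written as a plain C function, outside NumPy's @OP@ loop templating (the function name and signature here are made up):

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical sketch of the reviewer's suggestion: when both input
     * strides are zero, compute the scalar result once and fill the
     * output with it, instead of re-evaluating the op per element. */
    static void
    bool_or_scalar_scalar(unsigned char *out, const unsigned char *in1,
                          const unsigned char *in2, size_t n)
    {
        const unsigned char val = *in1 | *in2;  /* hoisted out of the loop */
        memset(out, val, n);
    }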

charris (Member) commented Jan 25, 2021:

Does the SIMD work supersede this? @Qiyu8 @seiko2plus Thoughts?

Qiyu8 (Member) commented Jan 26, 2021:

IMO, a memcpy/memset method is provided here to replace the SIMD/uloop method for special cases such as readonly masked arrays, but I don't see enough evidence to prove the benefits. If it shows better performance than SIMD/uloop, then we should accept it. OTOH, this PR reminds me that #16960 is extending the current SSE2 functions to universal intrinsics; we can run a benchmark after #16960 is merged.

Base automatically changed from master to main on March 4, 2021 at 02:03.
charris (Member) commented Apr 21, 2021:

@juliantaylor @eric-wieser I'm inclined to close this; thoughts?

charris (Member) commented Feb 20, 2023:

I'm going to close this. Thanks Julian.

charris closed this on Feb 20, 2023.