ENH: improve the speed of array conversions using AVX2 if available #21123

zephyr111 · 2022-02-26T16:39:05Z

Hello,

This PR is meant to improve the speed of array conversions using AVX2 if available based on the discussion in #21001 (the original PR has been split in two parts).

seberg · 2022-02-28T21:06:25Z

Marking for triage, but generally interested in input: Some unsafe casts (e.g. out-of-bound float to int) are undefined behaviour in C, but in practice seems mostly well defined.
However, some of these "in practice its well defined" breaks due to SIMD explicitly, the issue is that code such as:

np.array(10000000000.).astype("i2")  # note: float array cast to integer

does not always match:

np.array(10000000000.).astype("i8").astype("i2")

Rather, the value is out of bound, and the minimum i2 (16bit integer) is returned for all out of bound values in these instructions.

If we are unsure about changing this, an alternative could be to only do this for safe casts. Or we decide that this should be OK, and add a release note.

seberg · 2022-02-28T21:07:59Z

numpy/core/src/multiarray/lowlevel_strided_loops.c.src

                    }
                    else {
-                        return &_aligned_cast_@name1@_to_@name2@;
+                        return &_aligned_cast_@name1@_to_@name2@@isa@;


Meant to do a review, but then posted instead:

IIRC this branch is never used (NPY_USE_UNALIGNED_ACCESS is always 0 here) and I don't think vectorization is OK if it was used. So I would either not do this, or just delete the whole # if block: It doesn't really add a whole lot.

I am unsure of the meaning of NPY_USE_UNALIGNED_ACCESS, but if we can remove it here since it is always set to 0, why not removing it from the whole file and possibly the whole code ? It would make the code a bit more clean/readable.

Based on the comment of this macro, vectorization should be disabled at a build level. AFAIK, telling the compiler that AVX can be used does not change anything. Using O3 causes GCC to auto-vectorize the code as opposed to O2 so far but the new versions of GCC (starting from GCC 12) should now enable the auto-vectorization even in O2 which is the default optimization level for Numpy so far. Thus, this change should not cause more harm than currently (but I think the code path enabled by NPY_USE_UNALIGNED_ACCESS is certainly arlready harmful). What do you think about that?

I have to look closer, but I think we should just delete this code (if you agree with that).

seberg · 2022-03-09T18:08:30Z

We discussed this a bit today in the meeting. And the consensus was that we should probably give this a shot (modulo fixing the tests of course).

However, we probably need to create and prioritize an issue to give floating point warnings reliably, because it seems like +this would give a warning (on most systems). That is independent of this, but this PR gives another good reason.

It also would be good to add a very brief "compatibility" release note for it, just to be on the safe side.

seberg · 2022-05-03T22:02:36Z

Xref gh-21437 which would add the floating point error checking that might be good to have here.

seberg · 2022-06-14T18:59:04Z

@zephyr111 the PR to give floating point warnings/errors that occur during casts is now merged, so this is unblocked.
Would you have time to rebase this and attack the remaining open points? That would be awesome!

zephyr111 · 2022-07-16T15:08:35Z

I am confused about the current state of the runtime checks and what Numpy should do (ie. what we want). Indeed, some bad conversions are now detected which is great but not all of them and I do not understand the logic now how the detection work for some values.

For example, the following code detect most issues but not all on my machine with the main branch of Numpy:

val = np.array([123_456_789_101_112], dtype=np.int64).astype(np.float32)
print('np.int8');	val.astype(np.int8)    # detected overflow
print('np.uint8');	val.astype(np.uint8)   # detected overflow
print('np.int16');	val.astype(np.int16)   # detected overflow
print('np.uint16');	val.astype(np.uint16)  # detected overflow
print('np.int32');	val.astype(np.int32)   # detected overflow
print('np.uint32');	val.astype(np.uint32)  # overflow not detected ! <----------------
print('np.int64');	val.astype(np.int64)   # no overflow
print('np.uint64');	val.astype(np.uint64)  # no overflow

val = np.array([np.nan])
print('np.int8');	val.astype(np.int8)    # detected issue
print('np.uint8');	val.astype(np.uint8)   # detected issue
print('np.int16');	val.astype(np.int16)   # detected issue
print('np.uint16');	val.astype(np.uint16)  # detected issue
print('np.int32');	val.astype(np.int32)   # detected issue
print('np.uint32');	val.astype(np.uint32)  # detected issue
print('np.int64');	val.astype(np.int64)   # detected issue
print('np.uint64');	val.astype(np.uint64)  # detected issue

It looks like a similar issue occurs on the test machine, but with 16-bit integers (I cannot reproduce this on my machine). In fact I cannot get the same outcome on my machine on the failing tests (certainly due to the undefined behavior).

The thing is the conversion function appears not to have been modified so I wonder how the detection is even possible without adding checks in the conversion loop. I do not understand how #21437 succeed to do that.

If we want to detect all cases, AFAIK we need to add a check in the conversion loop but this will certainly be expensive and break the automatic vectorization. Manual vectorization with check is possible but this is a significant work (note that the check is not so straightforward, see this).

Finally, the tests complain about the warning RuntimeWarning: invalid value encountered in cast being raised, but this looks like an intended behavior (since the #21437) and not something we want to hide, isn't it?

seberg · 2022-07-17T15:06:35Z

My guess was that we will see either floating point error or "correct" integer overflow/modulo results and no real way to get better than that (unless we add manual checks).

The point being, that if the cast is done as say float64 -> int64 -> uint32 by the compiled code, the float64 -> int64 may not give a warning and then the in64 -> uint32 will never set a floating point warning.

seberg · 2023-02-21T08:32:28Z

@seiko2plus just if you are looking for a smaller easy project. Wrapping this up (or something similar), could be very beneficial probably.

github-actions bot added the 01 - Enhancement label Feb 26, 2022

zephyr111 force-pushed the faster-casts branch from e79f312 to 872438a Compare February 26, 2022 16:51

zephyr111 mentioned this pull request Feb 26, 2022

ENH: help compilers to auto-vectorize reduction operators #21001

Merged

seberg added the triage review Issue/PR to be discussed at the next triage meeting label Feb 28, 2022

seberg reviewed Feb 28, 2022

View reviewed changes

zephyr111 mentioned this pull request Mar 7, 2022

BUG: Conversion of numpy.nan to int gives inconsistent results #21166

Closed

seberg added triaged Issue/PR that was discussed in a triage meeting and removed triage review Issue/PR to be discussed at the next triage meeting labels Mar 9, 2022

charris closed this Apr 6, 2022

charris reopened this Apr 6, 2022

rgommers added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Jul 12, 2022

ENH: improve the speed of array conversions using AVX2 if available

01a3db0

zephyr111 force-pushed the faster-casts branch from 872438a to 01a3db0 Compare July 15, 2022 20:47

DOC: fix a syntax error in the added changelog

2e52b6d

FirefoxMetzger mentioned this pull request Dec 21, 2022

DOC: (Undefined) behavior of float to integer casts only really explained in release notes #22852

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: improve the speed of array conversions using AVX2 if available #21123

ENH: improve the speed of array conversions using AVX2 if available #21123

zephyr111 commented Feb 26, 2022

seberg commented Feb 28, 2022

seberg Feb 28, 2022

zephyr111 Jul 15, 2022

zephyr111 Jul 15, 2022

seberg Jul 15, 2022

seberg commented Mar 9, 2022 •

edited

seberg commented May 3, 2022

seberg commented Jun 14, 2022

zephyr111 commented Jul 16, 2022 •

edited

seberg commented Jul 17, 2022

seberg commented Feb 21, 2023

ENH: improve the speed of array conversions using AVX2 if available #21123

Are you sure you want to change the base?

ENH: improve the speed of array conversions using AVX2 if available #21123

Conversation

zephyr111 commented Feb 26, 2022

seberg commented Feb 28, 2022

seberg Feb 28, 2022

Choose a reason for hiding this comment

zephyr111 Jul 15, 2022

Choose a reason for hiding this comment

zephyr111 Jul 15, 2022

Choose a reason for hiding this comment

seberg Jul 15, 2022

Choose a reason for hiding this comment

seberg commented Mar 9, 2022 • edited

seberg commented May 3, 2022

seberg commented Jun 14, 2022

zephyr111 commented Jul 16, 2022 • edited

seberg commented Jul 17, 2022

seberg commented Feb 21, 2023

seberg commented Mar 9, 2022 •

edited

zephyr111 commented Jul 16, 2022 •

edited