Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: improve the speed of array conversions using AVX2 if available #21123

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

zephyr111
Copy link
Contributor

Hello,

This PR is meant to improve the speed of array conversions using AVX2 if available based on the discussion in #21001 (the original PR has been split in two parts).

@seberg
Copy link
Member

seberg commented Feb 28, 2022

Marking for triage, but generally interested in input: Some unsafe casts (e.g. out-of-bound float to int) are undefined behaviour in C, but in practice seems mostly well defined.
However, some of these "in practice its well defined" breaks due to SIMD explicitly, the issue is that code such as:

np.array(10000000000.).astype("i2")  # note: float array cast to integer

does not always match:

np.array(10000000000.).astype("i8").astype("i2")

Rather, the value is out of bound, and the minimum i2 (16bit integer) is returned for all out of bound values in these instructions.

If we are unsure about changing this, an alternative could be to only do this for safe casts. Or we decide that this should be OK, and add a release note.

}
else {
return &_aligned_cast_@name1@_to_@name2@;
return &_aligned_cast_@name1@_to_@name2@@isa@;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meant to do a review, but then posted instead:

IIRC this branch is never used (NPY_USE_UNALIGNED_ACCESS is always 0 here) and I don't think vectorization is OK if it was used. So I would either not do this, or just delete the whole # if block: It doesn't really add a whole lot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure of the meaning of NPY_USE_UNALIGNED_ACCESS, but if we can remove it here since it is always set to 0, why not removing it from the whole file and possibly the whole code ? It would make the code a bit more clean/readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the comment of this macro, vectorization should be disabled at a build level. AFAIK, telling the compiler that AVX can be used does not change anything. Using O3 causes GCC to auto-vectorize the code as opposed to O2 so far but the new versions of GCC (starting from GCC 12) should now enable the auto-vectorization even in O2 which is the default optimization level for Numpy so far. Thus, this change should not cause more harm than currently (but I think the code path enabled by NPY_USE_UNALIGNED_ACCESS is certainly arlready harmful). What do you think about that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to look closer, but I think we should just delete this code (if you agree with that).

@seberg seberg added triaged Issue/PR that was discussed in a triage meeting and removed triage review Issue/PR to be discussed at the next triage meeting labels Mar 9, 2022
@seberg
Copy link
Member

seberg commented Mar 9, 2022

We discussed this a bit today in the meeting. And the consensus was that we should probably give this a shot (modulo fixing the tests of course).

However, we probably need to create and prioritize an issue to give floating point warnings reliably, because it seems like +this would give a warning (on most systems). That is independent of this, but this PR gives another good reason.

It also would be good to add a very brief "compatibility" release note for it, just to be on the safe side.

@charris charris closed this Apr 6, 2022
@charris charris reopened this Apr 6, 2022
@seberg
Copy link
Member

seberg commented May 3, 2022

Xref gh-21437 which would add the floating point error checking that might be good to have here.

@seberg
Copy link
Member

seberg commented Jun 14, 2022

@zephyr111 the PR to give floating point warnings/errors that occur during casts is now merged, so this is unblocked.
Would you have time to rebase this and attack the remaining open points? That would be awesome!

@rgommers rgommers added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Jul 12, 2022
@zephyr111
Copy link
Contributor Author

zephyr111 commented Jul 16, 2022

I am confused about the current state of the runtime checks and what Numpy should do (ie. what we want). Indeed, some bad conversions are now detected which is great but not all of them and I do not understand the logic now how the detection work for some values.

For example, the following code detect most issues but not all on my machine with the main branch of Numpy:

val = np.array([123_456_789_101_112], dtype=np.int64).astype(np.float32)
print('np.int8');	val.astype(np.int8)    # detected overflow
print('np.uint8');	val.astype(np.uint8)   # detected overflow
print('np.int16');	val.astype(np.int16)   # detected overflow
print('np.uint16');	val.astype(np.uint16)  # detected overflow
print('np.int32');	val.astype(np.int32)   # detected overflow
print('np.uint32');	val.astype(np.uint32)  # overflow not detected ! <----------------
print('np.int64');	val.astype(np.int64)   # no overflow
print('np.uint64');	val.astype(np.uint64)  # no overflow

val = np.array([np.nan])
print('np.int8');	val.astype(np.int8)    # detected issue
print('np.uint8');	val.astype(np.uint8)   # detected issue
print('np.int16');	val.astype(np.int16)   # detected issue
print('np.uint16');	val.astype(np.uint16)  # detected issue
print('np.int32');	val.astype(np.int32)   # detected issue
print('np.uint32');	val.astype(np.uint32)  # detected issue
print('np.int64');	val.astype(np.int64)   # detected issue
print('np.uint64');	val.astype(np.uint64)  # detected issue

It looks like a similar issue occurs on the test machine, but with 16-bit integers (I cannot reproduce this on my machine). In fact I cannot get the same outcome on my machine on the failing tests (certainly due to the undefined behavior).

The thing is the conversion function appears not to have been modified so I wonder how the detection is even possible without adding checks in the conversion loop. I do not understand how #21437 succeed to do that.

If we want to detect all cases, AFAIK we need to add a check in the conversion loop but this will certainly be expensive and break the automatic vectorization. Manual vectorization with check is possible but this is a significant work (note that the check is not so straightforward, see this).

Finally, the tests complain about the warning RuntimeWarning: invalid value encountered in cast being raised, but this looks like an intended behavior (since the #21437) and not something we want to hide, isn't it?

@seberg
Copy link
Member

seberg commented Jul 17, 2022

My guess was that we will see either floating point error or "correct" integer overflow/modulo results and no real way to get better than that (unless we add manual checks).

The point being, that if the cast is done as say float64 -> int64 -> uint32 by the compiled code, the float64 -> int64 may not give a warning and then the in64 -> uint32 will never set a floating point warning.

@seberg
Copy link
Member

seberg commented Feb 21, 2023

@seiko2plus just if you are looking for a smaller easy project. Wrapping this up (or something similar), could be very beneficial probably.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
01 - Enhancement component: SIMD Issues in SIMD (fast instruction sets) code or machinery triaged Issue/PR that was discussed in a triage meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants