Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG, SIMD: Fix memory overlap in ufunc comparison loops #22851

Merged
merged 2 commits into from Dec 22, 2022

Conversation

seiko2plus
Copy link
Member

closes #22841, relates #21483

@seiko2plus seiko2plus added 00 - Bug 09 - Backport-Candidate PRs tagged should be backported component: SIMD Issues in SIMD (fast instruction sets) code or machinery labels Dec 21, 2022
@seiko2plus seiko2plus marked this pull request as ready for review December 21, 2022 10:12
@seiko2plus seiko2plus changed the title BUG, SIMD: Fix memory overlap on comparison accumulate BUG, SIMD: Fix memory overlap in comparison ufunc loops Dec 21, 2022
@seiko2plus seiko2plus changed the title BUG, SIMD: Fix memory overlap in comparison ufunc loops BUG, SIMD: Fix memory overlap in ufunc comparison loops Dec 21, 2022
@seberg
Copy link
Member

seberg commented Dec 21, 2022

Isn't the problem that the IS_BLOCKABLE_BINARY_BOOL check is missing the (abs_ptrdiff(args[2], args[1]) >= (vsize) and similar checks that IS_BLOCKABLE_BINARY_SCALAR1 and IS_BLOCKABLE_BINARY_SCALAR2 have?

I guess this is fine as well, the more precise check probably doesn't matter in practice, but should we remove the macros if we don't expand the IS_BLOCKABLE_BINARY_BOOL to include the vsize based check?

@seberg
Copy link
Member

seberg commented Dec 21, 2022

This is confusing with the additional _BOOL versions! How about removing the _BOOL macros and using the normal macros instead? Maybe I am missing something, but it seems to me that should work?

@seberg
Copy link
Member

seberg commented Dec 21, 2022

Arg, sorry that was off, since the bool version deals with the output type being different. But, I do think something like this would also work:

diff --git a/numpy/core/src/umath/fast_loop_macros.h b/numpy/core/src/umath/fast_loop_macros.h
index 5274957f7..973e47850 100644
--- a/numpy/core/src/umath/fast_loop_macros.h
+++ b/numpy/core/src/umath/fast_loop_macros.h
@@ -381,15 +381,25 @@ abs_ptrdiff(char *a, char *b)
 #define IS_BLOCKABLE_BINARY_BOOL(esize, vsize) \
     (steps[0] == (esize) && steps[0] == steps[1] && steps[2] == (1) && \
      npy_is_aligned(args[1], (esize)) && \
-     npy_is_aligned(args[0], (esize)))
+     npy_is_aligned(args[0], (esize)) && \
+     (abs_ptrdiff(args[2], args[0]) >= (vsize) || \
+      abs_ptrdiff(args[2], args[0]) == 0) && \
+     (abs_ptrdiff(args[2], args[1]) >= (vsize) || \
+      abs_ptrdiff(args[2], args[1]) >= 0))
 
 #define IS_BLOCKABLE_BINARY_SCALAR1_BOOL(esize, vsize) \
     (steps[0] == 0 && steps[1] == (esize) && steps[2] == (1) && \
-     npy_is_aligned(args[1], (esize)))
+     npy_is_aligned(args[1], (esize)) && \
+     ((abs_ptrdiff(args[2], args[1]) >= (vsize)) || \
+      (abs_ptrdiff(args[2], args[1]) == 0)) && \
+     abs_ptrdiff(args[2], args[0]) >= (esize))
 
 #define IS_BLOCKABLE_BINARY_SCALAR2_BOOL(esize, vsize) \
     (steps[0] == (esize) && steps[1] == 0 && steps[2] == (1) && \
-     npy_is_aligned(args[0], (esize)))
+     npy_is_aligned(args[0], (esize)) && \
+     ((abs_ptrdiff(args[2], args[0]) >= (vsize)) || \
+      (abs_ptrdiff(args[2], args[0]) == 0)) && \
+     abs_ptrdiff(args[2], args[1]) >= (esize))
 
 /* align var to alignment */
 #define LOOP_BLOCK_ALIGN_VAR(var, type, alignment)\

In any case either seems fine to me if you have a preference for this version. But lets delete the faulty macro's if we go this route?

@seiko2plus
Copy link
Member Author

seiko2plus commented Dec 22, 2022

I don't like to hide memory overlap tests behind macros, just one check at the beginning of the function and then using contiguous macros or going raw sounds safe and clean to me, we already follow this way for most of our dispatchable sources, therefore I deleted these macros since they are no longer used anywhere.

@seberg
Copy link
Member

seberg commented Dec 22, 2022

Yeah, I can see a point in not hiding them in a macro that is used for other things especially. Lets put this in then, it is indeed a pattern used elsewhere.

Thanks for the quick fix and sleuthing in what was wrong!

@seberg seberg merged commit ae81e65 into numpy:main Dec 22, 2022
uf for uf in UFUNCS if uf.nin == 2
]
UFUNCS_BINARY_ACC = [
uf for uf in UFUNCS_BINARY if hasattr(uf, "accumulate") and uf.nout == 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw. the hasattr shouldn't do anything right now. Checking here that ufunc.signature is None would make more sense to me.
But its mainly a nitpick and the rest is important for backporting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarification, in that case, the following skip can be avoided:

if ufunc.signature:
pytest.skip('For generic signatures only')

@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Dec 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
00 - Bug component: SIMD Issues in SIMD (fast instruction sets) code or machinery
Projects
None yet
Development

Successfully merging this pull request may close these issues.

np.logical_xor.accumulate fails on 1.24 on mac
3 participants