New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG, SIMD: Fix memory overlap in ufunc comparison loops #22851
Conversation
7a887b6
to
2862a1c
Compare
Isn't the problem that the I guess this is fine as well, the more precise check probably doesn't matter in practice, but should we remove the macros if we don't expand the |
This is confusing with the additional |
Arg, sorry that was off, since the bool version deals with the output type being different. But, I do think something like this would also work: diff --git a/numpy/core/src/umath/fast_loop_macros.h b/numpy/core/src/umath/fast_loop_macros.h
index 5274957f7..973e47850 100644
--- a/numpy/core/src/umath/fast_loop_macros.h
+++ b/numpy/core/src/umath/fast_loop_macros.h
@@ -381,15 +381,25 @@ abs_ptrdiff(char *a, char *b)
#define IS_BLOCKABLE_BINARY_BOOL(esize, vsize) \
(steps[0] == (esize) && steps[0] == steps[1] && steps[2] == (1) && \
npy_is_aligned(args[1], (esize)) && \
- npy_is_aligned(args[0], (esize)))
+ npy_is_aligned(args[0], (esize)) && \
+ (abs_ptrdiff(args[2], args[0]) >= (vsize) || \
+ abs_ptrdiff(args[2], args[0]) == 0) && \
+ (abs_ptrdiff(args[2], args[1]) >= (vsize) || \
+ abs_ptrdiff(args[2], args[1]) >= 0))
#define IS_BLOCKABLE_BINARY_SCALAR1_BOOL(esize, vsize) \
(steps[0] == 0 && steps[1] == (esize) && steps[2] == (1) && \
- npy_is_aligned(args[1], (esize)))
+ npy_is_aligned(args[1], (esize)) && \
+ ((abs_ptrdiff(args[2], args[1]) >= (vsize)) || \
+ (abs_ptrdiff(args[2], args[1]) == 0)) && \
+ abs_ptrdiff(args[2], args[0]) >= (esize))
#define IS_BLOCKABLE_BINARY_SCALAR2_BOOL(esize, vsize) \
(steps[0] == (esize) && steps[1] == 0 && steps[2] == (1) && \
- npy_is_aligned(args[0], (esize)))
+ npy_is_aligned(args[0], (esize)) && \
+ ((abs_ptrdiff(args[2], args[0]) >= (vsize)) || \
+ (abs_ptrdiff(args[2], args[0]) == 0)) && \
+ abs_ptrdiff(args[2], args[1]) >= (esize))
/* align var to alignment */
#define LOOP_BLOCK_ALIGN_VAR(var, type, alignment)\ In any case either seems fine to me if you have a preference for this version. But lets delete the faulty macro's if we go this route? |
I don't like to hide memory overlap tests behind macros, just one check at the beginning of the function and then using contiguous macros or going raw sounds safe and clean to me, we already follow this way for most of our dispatchable sources, therefore I deleted these macros since they are no longer used anywhere. |
Yeah, I can see a point in not hiding them in a macro that is used for other things especially. Lets put this in then, it is indeed a pattern used elsewhere. Thanks for the quick fix and sleuthing in what was wrong! |
uf for uf in UFUNCS if uf.nin == 2 | ||
] | ||
UFUNCS_BINARY_ACC = [ | ||
uf for uf in UFUNCS_BINARY if hasattr(uf, "accumulate") and uf.nout == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw. the hasattr
shouldn't do anything right now. Checking here that ufunc.signature is None
would make more sense to me.
But its mainly a nitpick and the rest is important for backporting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarification, in that case, the following skip can be avoided:
numpy/numpy/core/tests/test_umath.py
Lines 4356 to 4357 in 1a0ea79
if ufunc.signature: | |
pytest.skip('For generic signatures only') |
closes #22841, relates #21483