DEP: Deprecate conversion of out-of-bound Python integers #22385
Conversation
Any conversion from a Python integer (or subclass) that is stored into a NumPy dtype but does not fit should raise an error in the future. Note that casts between NumPy types (or assignments of them) are explicitly not affected by this. There are certain use cases for allowing such casts, even if they are technically undefined behavior in C. They just work out well in practice in many cases, since e.g. -1 is all 1's in binary representation (two's-complement representation).
This is a temporary solution until such a time where we can have loop specific identities which can handle this more gracefully.
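To illustrate the distinction being drawn here, a minimal sketch (assuming a recent NumPy): conversion of an out-of-bound Python int is what gets deprecated, while casts between NumPy types keep the C-style two's-complement wraparound.

```python
import numpy as np

# Casts between NumPy types are explicitly NOT affected by the deprecation:
# -1 as int8 is all 1's in two's complement, which reinterprets as 255.
assert np.array(-1, dtype=np.int8).astype(np.uint8) == 255

# And the reverse cast reinterprets the same bit pattern back to -1.
assert np.array(255, dtype=np.uint8).astype(np.int8) == -1
```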
This wraps the fill value into an array; the default fill value for all integers is 999999, which doesn't fit many integer dtypes. Note that this might still subtly change the behavior in other code paths where we cannot avoid this. Plus, the DeprecationWarning may show up (and in fact be an "in the future will use the default fill value" warning).
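For context, a small sketch of why the fill value is a problem (assuming a recent NumPy): the masked-array default fill value for integers does not fit into small integer dtypes, and going through the array-casting path wraps instead of erroring.

```python
import numpy as np

# The default masked-array fill value for integer dtypes is 999999.
assert np.ma.default_fill_value(np.dtype(np.int64)) == 999999

# That value does not fit in int8; the NumPy-to-NumPy cast path
# (which wrapping the fill value into an array routes through)
# wraps C-style instead of raising: 999999 % 256 == 63.
fv = np.array(999999).astype(np.int8)
assert fv == 63
```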
Force-pushed from df872b4 to 7a951f9 (Compare)
# TODO: This is probably a mess, but should best preserve behavior?
vals = tuple(
    np.array(_recursive_fill_value(dtype[name], f))
    for name in dtype.names)
Did you reach the conclusion that this is correct based on a test failure? I don't see an added test, but the lines are not marked by coverage so I assume they are tested.
Yes, there were a few test failures (mainly in the recarray code that handles a lot of masked arrays with structs).
They go away with this change, which mirrors the non-struct path (which always wraps into array()).
...
It took me a while to figure out that this seems like the easy way out. I don't particularly love it, but it seems to work. It is a bit unclear whether downstream might notice these changes.
+1
if (ufuncimpl == NULL) {
    return NULL;
}
There was a request from Numba to expose the ufunc loop used for a dtype. They did not explicitly ask for reductions, but I imagine that will be the next request. Will it be sufficient to expose reducelike_promote_and_resolve, or will we need more logic?
Good point, I have to think about the API. I would probably consider a single function/entrypoint, but with reduction=True as a kwarg?
There are some weirder things in how we do reduction promotion and type resolution right now, but I imagine that is fine.
Such new API might also just always live in a NEP 50 future.
I don't think it matters much for this PR. To me the decision here seems mainly whether we prefer this churn, over the churn of having a special (internal for now) API for Python integers. (And maybe the timing of both)
I would prefer this churn, and not have a special API for Python integers.
Hmm, that is kind of worrisome. But doesn't #21875 also have a number of SciPy failures?
Yes, but they are a fully distinct set of failures. Merging this has no influence on those failures (besides maybe printing slightly differently or giving two warnings for the same thing). The failures here are things like …
"NumPy will stop allowing conversion of out-of-bound "
"Python integers to integer arrays. The conversion "
"of %.100R to %S will fail in the future.",
obj, descr) < 0) {
Maybe add the astype() workaround, i.e. np.array(-1, np.uint8) will raise here, but np.array(-1).astype(np.uint8) will do the right thing? I am not sure how to express that this deep in the code path...
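The suggested workaround pair, sketched out (on NumPy 2.x the deprecated conversion has already become an OverflowError, so the example guards for both eras):

```python
import numpy as np

# Direct conversion of an out-of-bound Python int is deprecated
# (and raises OverflowError on NumPy 2.x):
try:
    np.array(-1, dtype=np.uint8)
except OverflowError:
    pass  # expected on newer NumPy

# The explicit cast keeps working, via C-style wraparound:
assert np.array(-1).astype(np.uint8) == 255
```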
Good idea, also added a release note. If this becomes too painful for downstream, we could consider allowing np.uint8(-1), making it explicitly work for negative values. (I would still restrict this to the range of the corresponding signed integer, so -128 for uint8.)
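A hypothetical sketch of that fallback rule (coerce_uint8 is an illustrative helper, not NumPy API): negative Python ints are accepted only down to the signed counterpart's minimum and wrap two's-complement style; everything else out of range is rejected.

```python
import numpy as np

def coerce_uint8(value):
    # Hypothetical helper sketching the proposal above: allow negatives
    # down to int8's minimum (-128), wrapping them modulo 256, while
    # still rejecting values that fit in neither range.
    uinfo = np.iinfo(np.uint8)
    sinfo = np.iinfo(np.int8)
    if sinfo.min <= value < 0:
        # e.g. -1 -> 255, -128 -> 128 (two's-complement wraparound)
        return np.uint8(value % (uinfo.max + 1))
    if uinfo.min <= value <= uinfo.max:
        return np.uint8(value)
    raise OverflowError(f"{value} out of the allowed range for uint8")
```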
we might need that as a fallback
LGTM. Only a small nit in the release note. Anyone else want to take a look?
Co-authored-by: Matti Picus <matti.picus@gmail.com>
I will update it to remove the special parsing path. They will always use the normal parsing paths instead (might need some test adjustments). It won't be much simpler, but we won't have the "special" path at least.
After the change in numpy/numpy#22385, numpy raises a deprecation warning with calls such as np.int8(5000) and np.uint32(-1). This change avoids such calls in the tests.
Fixing the SciPy tests wasn't too bad: scipy/scipy#17209
A few important notes:
I tested SciPy; it has a fair number of failures, but they seem manageable.
Overall, I am not sure about this, but we need a decision on whether we do this or stick with the original gh-21875 proposal. The other PR would be a bit simpler, but I am no longer quite convinced that coupling the two is worth the simplification there. My worry is that the MA code may be strange, but downstream might also have similarly hard-to-analyze issues with the change.