Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEP: Deprecate conversion of out-of-bound Python integers #22385

Merged
merged 9 commits into from Oct 11, 2022

Conversation

seberg
Copy link
Member

@seberg seberg commented Oct 5, 2022

Any conversion from a Python integer (or subclass) that is stored
into a NumPy dtype but does not fit should raise an error in the future.
Note, that casts between NumPy types (or assignments of them) are
explicitly not affected by this.


A few important notes:

I tested SciPy it has a fair bit of failures that seem managable.

Overall, I am not sure about this, but need the decision whether we do this, or stick with the original gh-21875 proposal. The other PR would be a bit simpler, but I am now not quite convinced that coupling the two is worth the simplification there. My worry is that the MA code may be strange, but downstream might also have similar hard to analyze issues with the change.

Any conversion from a Python integer (or subclass) that is stored
into a NumPy dtype but does not fit should raise an error in the future.
Note, that casts between NumPy types (or assignments of them) are
explicitly not affected by this.

There are certain use-cases for allowing such casts, even if
technically undefined behavior in C.  They just work out well
in practice in many cases since e.g. -1 is all 1's in binary
represenation (twos complement repr).
This is a temporary solution until such a time where we can have
loop specific identities which can handle this more gracefully.
This wraps the fill value into an array, the default fill value for all
ointegers is 99999 which doesn't work for many integer dtypes.

Note that this might still subtle change the behavior in other
code paths where we cannot avoid this.  Plus, the deprecationwarning
may show up (and in fact be a "in the future will use the default
fill value" warning).
@seberg seberg force-pushed the deprecate-out-of-bound-pyint-conversion branch from df872b4 to 7a951f9 Compare October 5, 2022 20:33
# TODO: This is probably a mess, but should best preserve behavior?
vals = tuple(
np.array(_recursive_fill_value(dtype[name], f))
for name in dtype.names)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you reach the conclusion that this is correct based on a test failure? I don't see an added test, but the lines are not marked by coverage so I assume they are tested.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there were a few test failures (mainly in the recarray code that handles a lot of masked arrays with structs).
They go away with this change, which mirrors the non-struct path (that always wraps into array()...

It took me a while to figure that this seems like the easy way out, I don't particularly love it, but it seems to work. It is a bit unclear if downstream might notice these changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

if (ufuncimpl == NULL) {
return NULL;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a request from Numba to expose the ufunc loop used for a dtype. They did not explicitly ask for reductions, but I imagine that will be the next request. Will it be sufficient to expose reducelike_promote_and_resolve or will we need more logic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I have to think abou the API. Could would probably consider a single function/entrypoint. But with reduction=True as kwarg?
There are some weirder things in how we do reduction promotion and type resolution right now, but I imagine that is fine.

Such new API might also just always live in a NEP 50 future.

I don't think it matters much for this PR. To me the decision here seems mainly whether we prefer this churn, over the churn of having a special (internal for now) API for Python integers. (And maybe the timing of both)

@mattip
Copy link
Member

mattip commented Oct 6, 2022

To me the decision here seems mainly whether we prefer this churn, over the churn of having a special (internal for now) API for Python integers

I would prefer this churn, and not have a special API for Python integers.

I tested SciPy it has a fair bit of failures that seem managable.

Hmm, that is kind of worrisome. But doesn't #21875 also have a number of SciPy failures?

@seberg
Copy link
Member Author

seberg commented Oct 6, 2022

But doesn't #21875 also have a number of SciPy failures?

Yes, but they are a fully distinct set of failures. Merging this has no influence on those failures (besides maybe printing slightly different or giving two warnings for the same thing).

The failures here are things like np.array([-1], dtype=np.uint8) or np.uint8(-1) mainly, which NEP 50 has no opinion about.
The failures in the other PR are things like np.uint8(3) + (-1) which is similar but distinct (because in a pre NEP 50 world we convert to np.int16(-1) and not np.uint8(-1).

"NumPy will stop allowing conversion of out-of-bound "
"Python integers to integer arrays. The conversion "
"of %.100R to %S will fail in the future.",
obj, descr) < 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add the astype() workaround, i.e. np.array(-1, np.uint8) will raise here, but np.array(-1).astype(np.uint8) will do the right thing? I am not sure how to express that this deep in the code path...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, also added a release note. If this becomes too painful for downstream, we could consider allowing np.uint8(-1) making it explicitly work for negative values. (I would still restrict this to the range of the corresponding signed integer, so -128 for uint8)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might need that as a fallback

@seberg seberg added the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Oct 6, 2022
@seberg seberg removed the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Oct 6, 2022
Copy link
Member

@mattip mattip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Only a small nit in the release note. Anyone else want to take a look?

doc/release/upcoming_changes/22393.deprecation.rst Outdated Show resolved Hide resolved
Co-authored-by: Matti Picus <matti.picus@gmail.com>
@mattip mattip merged commit 226f9c5 into numpy:main Oct 11, 2022
@seberg seberg deleted the deprecate-out-of-bound-pyint-conversion branch October 11, 2022 16:35
@mattip
Copy link
Member

mattip commented Oct 11, 2022

Thanks @seberg. What happens now to #21875?

@seberg
Copy link
Member Author

seberg commented Oct 11, 2022

I will update it to remove the special parsing path. They will use the normal parsing paths instead always (might need some test adjustments). It won't be much simpler, but we won't have the "special" path at least.

WarrenWeckesser added a commit to WarrenWeckesser/scipy that referenced this pull request Oct 12, 2022
After the change in numpy/numpy#22385, numpy
raises a deprecation warning with calls such as np.int8(5000) and
np.uint32(-1).  This change avoids such calls in the tests.
@WarrenWeckesser
Copy link
Member

Fixing the SciPy tests wasn't too bad: scipy/scipy#17209

ev-br pushed a commit to ev-br/scipy that referenced this pull request Oct 14, 2022
After the change in numpy/numpy#22385, numpy
raises a deprecation warning with calls such as np.int8(5000) and
np.uint32(-1).  This change avoids such calls in the tests.
tylerjereddy pushed a commit to tylerjereddy/scipy that referenced this pull request Oct 17, 2022
After the change in numpy/numpy#22385, numpy
raises a deprecation warning with calls such as np.int8(5000) and
np.uint32(-1).  This change avoids such calls in the tests.
iakoster added a commit to iakoster/pyiak_instr that referenced this pull request Feb 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants