Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: (Undefined) behavior of float to integer casts only really explained in release notes #22852

Open
FirefoxMetzger opened this issue Dec 21, 2022 · 5 comments

Comments

@FirefoxMetzger
Copy link

Describe the issue:

When running the example test case below the assertion works fine on Windows, but fails on Linux with the message below.

This is a cross-post from: https://stackoverflow.com/questions/74872407/why-does-numpy-handle-overflows-inconsistently

Related PRs: #21437 , #21123

Reproduce the code example:

import numpy as np

print(f"Numpy Version: {np.__version__}")

# version A
versionA = np.eye(2)
versionA[0, 0] = 2**32 + 1
versionA = versionA.astype(np.uint32)

# version B
versionB = np.eye(2, dtype=np.uint32)
versionB[0, 0] = np.asarray(2**32 + 1)

np.testing.assert_array_equal(versionA, versionB)

Error message:

Numpy Version: 1.24.0
foo.py:8: RuntimeWarning: invalid value encountered in cast
  versionA = versionA.astype(np.uint32)
Traceback (most recent call last):
  File "foo.py", line 14, in <module>
    np.testing.assert_array_equal(versionA, versionB)
  File "/home/firefoxmetzger/.local/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 983, in assert_array_equal
    assert_array_compare(operator.__eq__, x, y, err_msg=err_msg,
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/home/firefoxmetzger/.local/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 862, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not equal

Mismatched elements: 1 / 4 (25%)
Max absolute difference: 1
Max relative difference: 1.
 x: array([[0, 0],
       [0, 1]], dtype=uint32)
 y: array([[1, 0],
       [0, 1]], dtype=uint32)

NumPy/Python version information:

1.24.0 3.8.10 (default, Jun 2 2021, 10:49:15)
[GCC 9.4.0]

Context for the issue:

I've tried to reduce the example further, but any further modifications change the behavior to make it work as expected. For example, changing from np.eye(2) to np.empty((1, 1)) or np.eye(1) changes the behavior on linux and the test passes.

@seberg
Copy link
Member

seberg commented Dec 21, 2022

C defines this as undefined behavior. NumPy has never guaranteed any specific overflow behavior here. As C is undefined, CPUs are pretty random about what they do (plus compilers). For example for uint32 non-vectorized the compiler may spit out instructions casting to int64 (rather than uint32 directly) which then is "well defined".

The linked PRs, one adds the warning which tells you are getting undefined behavior (but in one case it is implicitly well defined after all, so no warning). The other PR would just make the situation even worse probably, by using more vectorization leading to hitting more UB.

I am not really convinced that we should worry about doing better than C compilers here. Is there a particular use-case?

@FirefoxMetzger
Copy link
Author

My scenario is that I am writing a test suite for some linear algebra routines with the signature fn(ArrayLike, *, dtype=None).

This test suite uses hypothesis to generate input arrays (dtype=float) and a dtype for the output. I use this to test the function and then compare the output against a (easy, but slow) reference implementation. Currently this fails because it encounters the inconsistent behavior reported here. I can work around this by only generating arrays with a compatible datatype though (which users likely won't but we can add documentation to mitigate).

A nice addition to numpy regarding this would be a place in the docs where I could discover that this platform-specific behavior exists. The warning on linux in v1.24 already goes a long way, but maybe it could also show up on Windows (the behavior is still undefined, even if the actual behavior turns out to be consistent). Perhaps we could also provide some more detail on this in the docs and add a link to further information into the warning?

@FirefoxMetzger
Copy link
Author

I also just found a smaller example that reproduces this behavior:

>>> import numpy as np
>>> foo = 2 ** 32 + 1
>>> bar = np.asarray(foo, dtype=float)
>>> np.asarray((foo, foo), dtype="i1")
<stdin>:1: DeprecationWarning: NumPy will stop allowing conversion of out-of-bound Python integers to integer arrays.  The conversion of 4294967297 to int8 will fail in the future.
For the old behavior, usually:
    np.array(value).astype(dtype)`
will give the desired result (the cast overflows).
array([1, 1], dtype=int8)
>>> np.asarray((bar, bar), dtype="i1")
<stdin>:1: RuntimeWarning: invalid value encountered in cast
array([0, 0], dtype=int8)

Interestingly, this appears to be sensitive to the value of foo, i.e., a smaller overflow is handled consistently on the machine I am using right now (ChromeOS with an Intel Core m3):

>>> foo = 256 + 1
>>> bar = np.asarray(foo, dtype=float)
>>> np.asarray((foo, foo), dtype="i1")
<stdin>:1: DeprecationWarning: NumPy will stop allowing conversion of out-of-bound Python integers to integer arrays.  The conversion of 257 to int8 will fail in the future.
For the old behavior, usually:
    np.array(value).astype(dtype)`
will give the desired result (the cast overflows).
array([1, 1], dtype=int8)
>>> np.asarray((bar, bar), dtype="i1")
array([1, 1], dtype=int8)

@seberg seberg changed the title BUG: Inconsistend overflow behavior when casting float64 to uint32 on Linux DOC: (Undefined) behavior of float to integer casts only really explained in release notes Dec 22, 2022
@seberg
Copy link
Member

seberg commented Dec 22, 2022

Right, the docs could probably mention this somewhere (not quite sure where). Runtime-wise, I doubt there is an easy win available, the new warning is probably about as good as it gets. (Not that I would mind making it more well defined if it isn't a big performance issue, and that may well be possible.)

@FirefoxMetzger
Copy link
Author

Hm, the first place I thought off was inside the docs for astype. Seems like the main place where you might bump into this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants