BUG: Conversion of numpy.nan to int gives inconsistent results #21166

Closed
Slagt opened this issue Mar 7, 2022 · 8 comments

@Slagt

Slagt commented Mar 7, 2022

Describe the issue:

When converting a numpy array containing numpy.nan to type int, the numpy.nan values are replaced by either -9223372036854775808 or 0, depending on the computer.

numpy.nan is replaced by -9223372036854775808 on a Mac Pro (2019).
numpy.nan is replaced by 0 on a MacBook Air (M1, 2020).
Computers have the same version of macOS (12.2.1), Python (3.9.10) and numpy (1.22.2).

Expected output

Both computers should behave the same. Either throw an error, or return the same value.

Reproduce the code example:

import numpy
print(numpy.array(numpy.nan).astype(int))

Error message:

No response

NumPy/Python version information:

1.22.2 3.9.10 (main, Jan 15 2022, 11:40:53)
[Clang 13.0.0 (clang-1300.0.29.3)]

@Slagt Slagt added the 00 - Bug label Mar 7, 2022
@zephyr111
Contributor

Hello,

Thank you for reporting this issue.

I have the same problem (-9223372036854775808 without error) on Linux using Numpy 1.20.3 and CPython 3.9.9 (GCC 11.2.0). I can also reproduce the same behavior with Numpy at commit 08248aa (February 25) compiled with GCC 11.2.0-13.

The main Numpy casting function defined here does not check for special floating-point values such as NaN, -Inf, and +Inf (nor for out-of-bounds values). This is because the plain double-to-int cast performed in that function is undefined behavior in C for such values. For more information about this behavior in the C standard, please read this.

This problem is closely related to the undefined behavior found in #21123.

While fixing this is relatively straightforward, the actual question is: what is the expected behavior in this case in Numpy?
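To illustrate the kind of check the cast skips, here is a minimal Python-level sketch. The helper name `checked_astype_int` is hypothetical (it is not part of NumPy), and raising `ValueError` is just one possible answer to the question above, not a decision made in this thread.

```python
import numpy as np


def checked_astype_int(arr, dtype=np.int64):
    """Hypothetical helper: cast floats to an integer dtype, but refuse
    values for which a plain C double-to-int cast is undefined."""
    arr = np.asarray(arr, dtype=np.float64)
    info = np.iinfo(dtype)

    # NaN and +/-Inf have no integer representation at all.
    if not np.all(np.isfinite(arr)):
        raise ValueError("cannot cast NaN or infinity to an integer type")

    # The cast truncates toward zero, so compare the truncated values.
    # float(info.max) + 1.0 is the first float that no longer fits
    # (for int64 it is exactly 2**63).
    truncated = np.trunc(arr)
    too_small = truncated < float(info.min)
    too_large = truncated >= float(info.max) + 1.0
    if np.any(too_small | too_large):
        raise ValueError(f"value out of bounds for {np.dtype(dtype).name}")

    return arr.astype(dtype)


print(checked_astype_int([1.9, -2.9]))  # truncates toward zero
try:
    checked_astype_int([np.nan])
except ValueError as exc:
    print("refused:", exc)
```

The extra passes over the data are not free, which is part of why the expected behavior needs to be decided first.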

@Slagt
Author

Slagt commented Mar 7, 2022

Personally, I would expect an error to be thrown. It does not really make sense to convert an entity to an integer if the entity is "not a number".

@seberg
Member

seberg commented Mar 7, 2022

Indeed, NumPy is bad about producing floating point warnings for these kinds of casts; it basically never even tried. Note that CPUs/compilers should not be so lazy: this should be a NumPy issue, although I would not be surprised if some compilers misbehave.

I.e. it should be fairly straightforward to ensure that a warning is reliably given here.

About the general undefined behaviour: It is observed fairly regularly. It would seem great to define a "correct" result, like 0 or the minimum value. But probably only if it has no major impact on speed. Unfortunately, that seems unlikely.

I am optimistically marking this as a "project", but only for the part about checking floating point warnings for casts. (There should be duplicate issues open, so we may end up consolidating and closing this one though.)

EDIT: If anyone wants to dive into this: we need to copy some of the logic that ufuncs use (in umath/ufunc_object.c) for floating point handling to the cast functions.
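For illustration only, a user-level sketch of what a defined result (0 or the minimum value) could look like. The helper `cast_with_fill` and its `fill` parameter are hypothetical and not NumPy API; the extra masking passes also hint at the speed cost mentioned above.

```python
import numpy as np


def cast_with_fill(arr, dtype=np.int64, fill=0):
    """Hypothetical helper: cast floats to an integer dtype, replacing
    NaN, +/-Inf and out-of-bounds values by a fixed `fill` value."""
    arr = np.asarray(arr, dtype=np.float64)
    info = np.iinfo(dtype)

    # A value is castable if it is finite and, once truncated toward
    # zero, lies inside the target range.
    truncated = np.trunc(arr)
    ok = (
        np.isfinite(arr)
        & (truncated >= float(info.min))
        & (truncated < float(info.max) + 1.0)
    )

    # Replace problematic entries by 0.0 *before* casting, so the cast
    # itself never sees a value with undefined behavior.
    safe = np.where(ok, arr, 0.0)
    out = safe.astype(dtype)

    # Put the chosen fill value wherever the input was not castable.
    return np.where(ok, out, np.asarray(fill, dtype=dtype))


data = [1.7, np.nan, np.inf, -np.inf]
print(cast_with_fill(data))                                # fill with 0
print(cast_with_fill(data, fill=np.iinfo(np.int64).min))   # fill with the minimum
```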

@seberg seberg added the "Project" label (possible project, may require specific skills and long commitment) Mar 7, 2022
@Prakhar-mehta20

Prakhar-mehta20 commented Mar 8, 2022

I am getting the same error.
Running it on Kali Linux, Numpy version 1.21.5.

@landonrodgers

I'm interested in working on this issue; I'll start looking into it right now.

@seberg seberg removed the "Project" label (possible project, may require specific skills and long commitment) Apr 1, 2022
@seberg
Member

seberg commented Apr 1, 2022

Removing the "Project" label. I started working on this (I really need at least the part where np.array([1., 2.], dtype=np.float32) + 1e300 should give an overflow warning in the future, where the result would be float32 and not the current float64).
I also came to realize that this is threaded through quite a few more helpers and places (e.g. indexing), so it would probably have been a very hard project. But mainly, I need to make progress on this soon.

@adeak
Contributor

adeak commented Apr 19, 2022

The results also seem to depend on the target type, see this question on Stack Overflow, which is probably also related:

import numpy as np

npnan = np.float64(np.nan)
print(npnan.astype('int8'))   # Output: 0
print(npnan.astype('int16'))  # Output: 0
print(npnan.astype('int32'))  # Output:          -2147483648 (i.e. -2 ** 31)
print(npnan.astype('int64'))  # Output: -9223372036854775808 (i.e. -2 ** 63)

(These are from numpy 1.21.6 or 1.22.3 on debian.)

Edit: see #21364

@seberg
Member

seberg commented Jun 14, 2022

Going to close this. This will now give:

RuntimeWarning: invalid value encountered in cast

This honors the np.errstate(invalid=mode) setting. One could argue for the more extreme measure of always raising an error, but I think that is a pretty big step, so I would prefer opening a new issue on it (which is very welcome!).

xref gh-21437
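For anyone who wants to see how the errstate setting interacts with the cast, a small sketch follows. The helper `cast_with_mode` is hypothetical and only reports what happened; what it reports depends on whether the installed NumPy includes the change described above (older versions stay silent for every mode).

```python
import warnings

import numpy as np


def cast_with_mode(arr, mode):
    """Hypothetical helper: cast to int64 under np.errstate(invalid=mode)
    and report whether the cast raised, warned, or stayed silent."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record warnings instead of printing them
        try:
            with np.errstate(invalid=mode):
                result = arr.astype(np.int64)
        except FloatingPointError as exc:
            # invalid="raise" turns the floating point flag into an exception
            return "raised", str(exc)
    return ("warned" if caught else "silent"), result


values = np.array([1.5, np.nan])
for mode in ("ignore", "warn", "raise"):
    outcome, detail = cast_with_mode(values, mode)
    print(mode, "->", outcome)
```

Using invalid="raise" in a with block is a way to opt in to the stricter behavior locally without changing the global default.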

6 participants