BUG: data type conversion of numpy array containing NaN #21364

Closed
pietrodantuono opened this issue Apr 19, 2022 · 2 comments

pietrodantuono commented Apr 19, 2022

I have already posted this as a Stack Overflow question, but this seems like the best place to post it.

I am running into behaviour that I cannot understand, and it looks like a bug to me.

Problem description

I was hoping that trying to convert the data type of an array to integer would raise the "classic" ValueError: cannot convert float NaN to integer when the array contains NaN.
This unfortunately does not happen.

At first glance, the error seems not to be raised because the NaN values contained in a numpy array are converted to numpy.float64 instead of "remaining" float, as per the numpy.ndarray documentation.

In fact, consider this example:

import numpy as np
arr = np.array([np.nan, 1])

print(type(arr[0]))      # Output: <class 'numpy.float64'>


# Converting numpy.float64 NaN to other data types

## numpy integers
print( np.int8(arr[0]))  # Output: 0 
print(np.int16(arr[0]))  # Output: 0
print(np.int32(arr[0]))  # Output:          -2147483648 (i.e. -2 ** 31)
print(np.int64(arr[0]))  # Output: -9223372036854775808 (i.e. -2 ** 63)

## numpy unsigned integers
print( np.uint8(arr[0]))  # Output: 0
print(np.uint16(arr[0]))  # Output: 0
print(np.uint32(arr[0]))  # Output: 0
print(np.uint64(arr[0]))  # Output: 9223372036854775808

## numpy floats
print(np.float16(arr[0]))  # Output: NaN
print(np.float32(arr[0]))  # Output: NaN
print(np.float64(arr[0]))  # Output: NaN


# Converting numpy.float NaN to other data types

## np.float and np.int are aliases to float and int respectively (both np.float and np.int are deprecated)
type(np.nan) == type(np.float(np.nan))  # Output: True
print(np.int(np.nan))                   # Output: ValueError: cannot convert float NaN to integer

## numpy integers
print( np.int8(np.nan))  # Output: ValueError: cannot convert float NaN to integer
print(np.int16(np.nan))  # Output: ValueError: cannot convert float NaN to integer
print(np.int32(np.nan))  # Output: ValueError: cannot convert float NaN to integer
print(np.int64(np.nan))  # Output: ValueError: cannot convert float NaN to integer

## numpy unsigned integers
print( np.uint8(np.nan))  # Output: ValueError: cannot convert float NaN to integer
print(np.uint16(np.nan))  # Output: ValueError: cannot convert float NaN to integer
print(np.uint32(np.nan))  # Output: ValueError: cannot convert float NaN to integer
print(np.uint64(np.nan))  # Output: ValueError: cannot convert float NaN to integer

## numpy floats
print(np.float16(np.nan))  # Output: NaN
print(np.float32(np.nan))  # Output: NaN
print(np.float64(np.nan))  # Output: NaN

Questions

  • Why are the elements of arr converted to numpy.float64 instead of float in the first place?
  • How exactly do the Python built-in float and numpy.float64 differ?
  • Why do np.int32(arr) and np.int64(arr) contain the "smallest possible int" while np.int8(arr) and np.int16(arr) contain zeros? (Similar question for the unsigned integer types.)
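
As a minimal sketch of the first two questions: the array stores raw 64-bit floats, and indexing wraps one of them in a numpy.float64 scalar, which is a subclass of the Python float. Converting that scalar back to a plain Python float with .item() sends the conversion through Python's own int(), which rejects NaN. The variable names below are illustrative only and are not part of the original report.

```python
import numpy as np

arr = np.array([np.nan, 1])

# The array holds raw 64-bit floats; indexing wraps one in a NumPy scalar.
element = arr[0]
is_numpy_scalar = type(element) is np.float64

# np.float64 subclasses Python's float, so isinstance checks still pass.
is_float_subclass = issubclass(np.float64, float)

# .item() converts the NumPy scalar to a plain Python float.
plain = element.item()
plain_type = type(plain)

# Python's int() refuses to convert a NaN float.
try:
    int(plain)
    python_error = None
except ValueError as exc:
    python_error = str(exc)

print(is_numpy_scalar, is_float_subclass, plain_type, python_error)
```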

NumPy/Python version information:

I am experiencing this behaviour on multiple platforms and Python/NumPy versions; see the three examples below.

import platform
import sys
import numpy
print('Platform:', platform.platform())
print('Python version:', sys.version)
print('numpy.__version__:', numpy.__version__)

WSL

Platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
Python version: 3.9.7 (default, Mar  3 2022, 13:49:04) 
[GCC 9.3.0]
numpy.__version__: 1.21.5

Windows

Platform: Windows-10-10.0.22593-SP0
Python version: 3.9.6 (tags/v3.9.6:db3ff76, Jun 28 2021, 15:26:21) [MSC v.1929 64 bit (AMD64)]
numpy.__version__: 1.21.0

Online W3Schools

Platform: Linux-4.19.0-18-amd64-x86_64-with-glibc2.29
Python version: 3.8.2 (default, Mar 13 2020, 10:14:16) 
[GCC 9.3.0]
numpy.__version__: 1.18.2

seberg commented Apr 20, 2022

You can even get different results based on the length of the vector. Casting floats to integers is undefined behaviour in C if the integer value would be out of bounds. I am working on surfacing the floating-point exceptions (FPEs) reported by the CPU/C for these casts, so most of them should at least give an "invalid value" warning. (That is for the NaN/Inf cases; for large out-of-bounds values things get even trickier, it seems.)

I do not think anyone has ever tried to actually raise the warning. It could be done, but I would not be surprised if it slows down the casts quite a bit.
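
Until the cast itself reports a problem, one way to get the error asked for in the report is to validate the data before casting. The sketch below is an assumption-laden illustration, not NumPy's own behaviour: checked_int_cast is a hypothetical helper name, and it pays for the guarantee with an extra pass over the data.

```python
import numpy as np

def checked_int_cast(values, dtype):
    """Hypothetical helper: cast floats to an integer dtype, raising
    ValueError instead of producing undefined results."""
    arr = np.asarray(values, dtype=np.float64)
    if not np.all(np.isfinite(arr)):
        raise ValueError("cannot convert non-finite values (NaN/Inf) to integer")

    info = np.iinfo(dtype)
    truncated = np.trunc(arr)  # float-to-int casts truncate toward zero
    # info.min and info.max + 1 are powers of two (or zero), so they are
    # exactly representable as float64 and the comparison is precise.
    lower = float(info.min)
    upper_exclusive = float(info.max + 1)
    if np.any(truncated < lower) or np.any(truncated >= upper_exclusive):
        raise ValueError(f"value out of bounds for {np.dtype(dtype).name}")
    return truncated.astype(dtype)

ok = checked_int_cast([1.9, -2.9], np.int8)

errors = []
for bad in ([np.nan, 1.0], [300.0]):
    try:
        checked_int_cast(bad, np.int8)
    except ValueError as exc:
        errors.append(str(exc))
print(ok, errors)
```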

seberg commented Jun 14, 2022

All of these now give a warning on the main branch. Maybe we should escalate that, but that requires some more thought IMO, so I think we should start a new discussion about it.

The behavior can be modified using the np.errstate() mechanism.
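
As a rough sketch of what that can look like: setting invalid="raise" inside np.errstate() asks NumPy to raise a FloatingPointError wherever it would otherwise emit the "invalid value" warning. This assumes a NumPy build that includes the cast warnings mentioned above; older versions do not flag the cast at all, so the example handles both outcomes. strict_astype is a hypothetical helper name.

```python
import numpy as np

def strict_astype(values, dtype):
    """Hypothetical helper: cast while asking NumPy to raise on invalid casts."""
    with np.errstate(invalid="raise"):
        return np.asarray(values).astype(dtype)

# A clean array casts as usual.
clean = strict_astype([1.0, 2.0], np.int32)

# An array containing NaN raises only on NumPy versions that flag the cast.
try:
    strict_astype([np.nan, 1.0], np.int32)
    outcome = "no error (this NumPy version does not flag the cast)"
except FloatingPointError as exc:
    outcome = f"FloatingPointError: {exc}"
print(outcome)
```

Using invalid="ignore" instead would silence the warning for casts where the undefined result is known to be discarded later.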

Thus I will close the issue for now, but please don't hesitate to open a new one about guaranteeing an error being raised.

xref gh-21437 (which added the warnings)
