BUG: data type conversion of numpy array containing NaN #21364

pietrodantuono · 2022-04-19T18:36:49Z

I have already posted this as a Stack Overflow question, but this seems like the best place to post it.

I am running into a behaviour that I cannot understand, and it looks like a bug to me.

Problem description

I was hoping that trying to convert the data type of an array to integer would raise the "classic" ValueError: cannot convert float NaN to integer when the array contains NaN.
Unfortunately, this does not happen.

At first glance, this seems not to happen because the NaN values contained in a NumPy array are converted to numpy.float64 instead of "remaining" float, as per the numpy.ndarray documentation.

In fact, consider this example:

import numpy as np
arr = np.array([np.nan, 1])

print(type(arr[0]))      # Output: <class 'numpy.float64'>


# Converting numpy.float64 NaN to other data types

## numpy integers
print( np.int8(arr[0]))  # Output: 0 
print(np.int16(arr[0]))  # Output: 0
print(np.int32(arr[0]))  # Output:          -2147483648 (i.e. -2 ** 31)
print(np.int64(arr[0]))  # Output: -9223372036854775808 (i.e. -2 ** 63)

## numpy unsigned integers
print( np.uint8(arr[0]))  # Output: 0
print(np.uint16(arr[0]))  # Output: 0
print(np.uint32(arr[0]))  # Output: 0
print(np.uint64(arr[0]))  # Output: 9223372036854775808

## numpy floats
print(np.float16(arr[0]))  # Output: nan
print(np.float32(arr[0]))  # Output: nan
print(np.float64(arr[0]))  # Output: nan


# Converting Python float NaN (np.nan) to other data types

## np.float and np.int are aliases of float and int respectively (both np.float and np.int are deprecated)
type(np.nan) == type(np.float(np.nan))  # Output: True
print(np.int(np.nan))                   # Output: ValueError: cannot convert float NaN to integer

## numpy integers
print( np.int8(np.nan))  # Output: ValueError: cannot convert float NaN to integer
print(np.int16(np.nan))  # Output: ValueError: cannot convert float NaN to integer
print(np.int32(np.nan))  # Output: ValueError: cannot convert float NaN to integer
print(np.int64(np.nan))  # Output: ValueError: cannot convert float NaN to integer

## numpy unsigned integers
print( np.uint8(np.nan))  # Output: ValueError: cannot convert float NaN to integer
print(np.uint16(np.nan))  # Output: ValueError: cannot convert float NaN to integer
print(np.uint32(np.nan))  # Output: ValueError: cannot convert float NaN to integer
print(np.uint64(np.nan))  # Output: ValueError: cannot convert float NaN to integer

## numpy floats
print(np.float16(np.nan))  # Output: nan
print(np.float32(np.nan))  # Output: nan
print(np.float64(np.nan))  # Output: nan

Questions

Why are the elements of arr converted to numpy.float64 instead of float in the first place?
How exactly do the Python built-in float and numpy.float64 differ?
Why do np.int32(arr) and np.int64(arr) contain the "smallest possible int" while np.int8(arr) and np.int16(arr) contain zeros? (Similar question for the unsigned integer types.)

NumPy/Python version information:

I am experiencing this behaviour on multiple platform/python/numpy versions, see the three examples below.

import platform
import sys
import numpy
print('Platform:', platform.platform())
print('Python version:', sys.version)
print('numpy.__version__:', numpy.__version__)

WSL

Platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
Python version: 3.9.7 (default, Mar  3 2022, 13:49:04) 
[GCC 9.3.0]
numpy.__version__: 1.21.5

Windows

Platform: Windows-10-10.0.22593-SP0
Python version: 3.9.6 (tags/v3.9.6:db3ff76, Jun 28 2021, 15:26:21) [MSC v.1929 64 bit (AMD64)]
numpy.__version__: 1.21.0

Online W3Schools

Platform: Linux-4.19.0-18-amd64-x86_64-with-glibc2.29
Python version: 3.8.2 (default, Mar 13 2020, 10:14:16) 
[GCC 9.3.0]
numpy.__version__: 1.18.2


seberg · 2022-04-20T18:29:49Z

You can even get different results based on the length of the vector. Casting floats to integers is undefined in C if the integer value would be out of bounds. I am working on surfacing the floating-point exceptions (FPEs) reported by the CPU/C for these casts, so most of them should at least give an "invalid value" warning. (That covers the NaN/Inf cases; for large out-of-bounds values things seem to get even more tricky.)

I do not think anyone has ever tried to actually raise the warning. It could be done, but I would not be surprised if it slowed down the casts quite a bit.

seberg · 2022-06-14T17:29:13Z

All of these now give a warning on the main branch. Maybe we should escalate that, but that requires some more thought IMO, so I think we should start a new discussion about it.

The behavior can be modified using the np.errstate() mechanism.
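
A minimal sketch of how the np.errstate() mechanism mentioned above could be used to turn the "invalid value" warning into an exception. It assumes a NumPy release that includes the cast warnings (1.24 and later, as far as I am aware); on older releases the cast stays silent and nothing is raised. The helper name cast_raises is hypothetical and only serves the illustration.

```python
import numpy as np


def cast_raises(values, dtype=np.int64):
    """Return True if casting `values` to `dtype` raises FloatingPointError.

    Hypothetical helper, not part of NumPy. Relies on the "invalid value"
    floating-point error that newer NumPy versions report for NaN/Inf casts.
    """
    try:
        # Escalate "invalid value" floating-point errors to exceptions
        # for the duration of the cast only.
        with np.errstate(invalid="raise"):
            np.asarray(values, dtype=np.float64).astype(dtype)
    except FloatingPointError:
        return True
    return False


finite_raises = cast_raises([1.0, 2.0])   # finite values cast cleanly
nan_raises = cast_raises([np.nan, 1.0])   # NaN is expected to trigger the error
print(finite_raises, nan_raises)
```

Using invalid="ignore" in the same place should silence the warning instead, and invalid="warn" keeps the default behaviour.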

Thus I will close the issue for now, but please don't hesitate to open a new one about guaranteeing an error being raised.
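
Until an error is guaranteed, one possible workaround is to check for non-finite values explicitly before casting. The following is only a sketch under that assumption: checked_astype_int is a hypothetical helper, not a NumPy function, and it does not cover finite values that are out of range for the target integer type.

```python
import numpy as np


def checked_astype_int(values, dtype=np.int64):
    """Cast to an integer dtype, raising ValueError on NaN or Inf.

    Hypothetical helper, not part of NumPy. Finite but out-of-range
    values are NOT detected here.
    """
    arr = np.asarray(values, dtype=np.float64)
    if not np.isfinite(arr).all():
        raise ValueError("cannot convert non-finite values (NaN or Inf) to integer")
    return arr.astype(dtype)


converted = checked_astype_int([1.0, 2.0])

try:
    checked_astype_int([np.nan, 1.0])
    nan_rejected = False
except ValueError:
    nan_rejected = True

print(converted, nan_rejected)
```

The extra np.isfinite pass costs one more traversal of the array, which is the kind of slowdown to weigh before doing such a check on every cast.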

xref gh-21437 (which added the warnings)

pietrodantuono added the 00 - Bug label Apr 19, 2022

adeak mentioned this issue Apr 19, 2022

BUG: Conversion of numpy.nan to int gives inconsistent results #21166

Closed

seberg closed this as completed Jun 14, 2022

soulitzer mentioned this issue Mar 6, 2024

input_data.to produce inconsistent results pytorch/pytorch#121226

Open
