Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise errors on float->int conversions with NaN, inf #14412

Open
rth opened this issue Sep 3, 2019 · 7 comments
Open

Raise errors on float->int conversions with NaN, inf #14412

rth opened this issue Sep 3, 2019 · 7 comments

Comments

@rth
Copy link
Contributor

rth commented Sep 3, 2019

With ndarray.astype(..., casting="unsafe") float arrays with NaN/inf will be converted to np.iinfo(dtype).min/max,

>>> np.array([1, np.nan], dtype=np.float32).astype(np.int32)
array([          1, -2147483648], dtype=int32)
>>> np.array([1, np.inf], dtype=np.float32).astype(np.int32)
array([          1, -2147483648], dtype=int32)

(there are also some inconsistencies there cf #6109). This is very bad in practical applications as output data will be wrong by orders of magnitude. Other casting options simply disallow casting float to int.

At the same time, casting from dtype=np.object works as expected,

>>> np.array([1, np.inf], dtype=np.object).astype(np.int32)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: cannot convert float infinity to integer

It could be useful to have some additional casting option allowing to convert int to float, but would error on NaN or inf. This could be done manually,

if array.dtype.kind == 'f' and np.dtype(dtype).kind == 'i':
   # check for overflows and NaN
   _assert_all_finite(array)
result = array.astype(dtype)

but having it in numpy would be useful.

I also wonder if there is really a case when not raising on such conversions is meaningful (even with casting='unsafe'). For instance pandas does this conversions as expected (using numpy dtypes),

>>> pd.Series([1, np.nan], dtype=np.float64).astype(np.int)
[...]
ValueError: Cannot convert non-finite values (NA or inf) to integer

Numpy/Python version information:

>>> import sys, numpy; print(numpy.__version__, sys.version)
1.16.4 3.7.3 (default, Mar 27 2019, 22:11:17) [GCC 7.3.0]
@rth rth changed the title astype casting option to error on float->dtype conversions with NaN, inf Raise errors on float->int conversions with NaN, inf Sep 3, 2019
@seberg
Copy link
Member

seberg commented Sep 3, 2019

Yeah, I have been wondering about that myself. But on the other hand, blowing up casting options is a bit annoying. For integers specifically, it could be part of a error flags or so (i.e. if we had integer overflow errors, that could clearly handle this part).

@aiqc
Copy link

aiqc commented Jun 16, 2021

I agree about raising an error. Here is a simpler example to follow:

import numpy as np

arr2D = [
    [9,9,9],
    [9,np.nan,9],
    [9,9,9]
]

arr2D = np.array(arr2D)
arr2D = arr2D.astype('int32')

>>> arr2D
array([[          9,           9,           9],
       [          9, -2147483648,           9],
       [          9,           9,           9]], dtype=int32)

⚠️⚠️⚠️ -2147483648 ⚠️⚠️⚠️

I think this is C's int minimum?

@seberg
Copy link
Member

seberg commented Jun 16, 2021

@aiqc that returns undefined behaviour (as defind by the C standard). It is a bit scary/annoying, but right now you must not rely on the minimum value to be returned. Typical values are either min-int or 0. Most importantly, IIRC, tthis can fluctuate on the same computer (because the CPU has different ways to do the same thing except for this type of undefined behaviour, and we switch depending on whats faster – that is, we may or may not use SIMD instructions).

xref gh-19248. If we would not silence floating point warnings would at least warn you about this happening:

In [1]: import numpy as np
   ...: 
   ...: arr2D = [
   ...:     [9,9,9],
   ...:     [9,9e100,9],
   ...:     [9,9,9]
   ...: ]

In [2]: np.positive(arr2D, casting="unsafe", signature="l")
<ipython-input-2-7fc1fa2347c4>:1: RuntimeWarning: invalid value encountered in positive
  np.positive(arr2D, casting="unsafe", signature="l")
Out[2]: 
array([[                   9,                    9,                    9],
       [                   9, -9223372036854775808,                    9],
       [                   9,                    9,                    9]])

@layne-sadler
Copy link

@seberg if it can’t be reliably parsed then that’s even more reason to raise error on dtype conversion?

@seberg
Copy link
Member

seberg commented Jun 16, 2021

Yes, I don't mind raising an error, but I think a warning would be a good improvement. There are two things we may need to consider:

  • There may be quite a bit of code assuming it gets the smallest possible int value (and gets away with that, even if it isn't always true)
  • It would be good to know if there are any speed implications to sanitizing the values. Even if we just ensure the result is indeed the smallest possible int.

@seberg
Copy link
Member

seberg commented Jun 14, 2022

We are now giving a warning in this cases after gh-21437. Lets keep this one issue open about possibly raising an error. Or even sanitizing the output.
(Although, I currently doubt it will top the priority list in the forseeable future.)

@aiqc
Copy link

aiqc commented Jun 14, 2022

>>> np.nan == -2147483648
False

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants