DEP: Next step in scalar type alias deprecations/futurewarnings #22607

seberg · 2022-11-17T13:47:14Z

Finalizes the scalar type alias deprecations making them an error.
However, at the same time adds a FutureWarning that new aliases
will be introduced in the future.
(They would eventually be preferred over the str_, etc. version.)

It may make sense, that this FutureWarning is already propelled soon
since it interacts with things such as changing the representation of
strings to np.str_("") if the preferred alias becomes np.str.

It also introduces a new deprecation to remove the 0 sized bit-aliases
and the bitsize bool8 alias. (Unfortunately, these are here still allowed
as part of the np.sctypeDict).

xref gh-22021

Finalizes the scalar type alias deprecations making them an error. However, at the same time adds a `FutureWarning` that new aliases will be introduced in the future. (They would eventually be preferred over the `str_`, etc. version.) It may make sense, that this FutureWarning is already propelled soon since it interacts with things such as changing the representation of strings to `np.str_("")` if the preferred alias becomes `np.str`. It also introduces a new deprecation to remove the 0 sized bit-aliases and the bitsize `bool8` alias. (Unfortunately, these are here still allowed as part of the `np.sctypeDict`).

seberg · 2022-11-17T13:57:50Z

@rgommers in the spirit of having a chance to get this into the next release (since you marked the issue for that), maybe you can have a look/review?

I did feel like adding that object0 deprecation, but could pull it out, since it is somewhat different. Also, I added long to the list of new aliases. This has a bit of a point, because I feel it may help navigating the transitioning the default integer eventually. (int is a terrible alias for long if we want the default integer to not be long)

seberg · 2022-11-17T14:43:10Z

@rossbar not sure it matters, but circleCI is showing a lot of these warnings:

reading sources... [ 82%] reference/generated/numpy.show_config .. reference/generated/numpy.ufunc.identity
/home/circleci/repo/venv/lib/python3.8/site-packages/sphinx/ext/autosummary/__init__.py:678: FutureWarning: In the future `np.bytes` will be defined as the corresponding NumPy scalar.  (This may have returned Python scalars in past versions.
  return getattr(mod, name_parts[-1]), mod, modname
/home/circleci/repo/venv/lib/python3.8/site-packages/sphinx/ext/autosummary/__init__.py:701: FutureWarning: In the future `np.bytes` will be defined as the corresponding NumPy scalar.  (This may have returned Python scalars in past versions.
  obj = getattr(obj, obj_name)
/home/circleci/repo/venv/lib/python3.8/site-packages/sphinx/ext/autosummary/__init__.py:678: FutureWarning: In the future `np.str` will be defined as the corresponding NumPy scalar.  (This may have returned Python scalars in past versions.
  return getattr(mod, name_parts[-1]), mod, modname
/home/circleci/repo/venv/lib/python3.8/site-packages/sphinx/ext/autosummary/__init__.py:701: FutureWarning: In the future `np.str` will be defined as the corresponding NumPy scalar.  (This may have returned Python scalars in past versions.
  obj = getattr(obj, obj_name)
reading sources... [ 85%] reference/generated/numpy.ufunc.nargs

I am a bit confused by it, I think they may be coming from the Parameter types, like:

Parameters
----------
    arg : str or None

which seems strange? CI doesn't fail, but should I worry about it?

rossbar · 2022-11-17T18:54:01Z

Yeah that is a lot of warnings. I wouldn't think it's related to the parameter descriptions as that's only parsed by numpydoc and these warnings are originating from autosummary. From the warning it looks like this is coming from however autosummary does object discovery, but I'll have to take a closer look.

rgommers · 2022-11-20T06:24:22Z

Thanks @seberg!

I did feel like adding that object0 deprecation, but could pull it out, since it is somewhat different.

That looks like a good idea to me.

Also, I added long to the list of new aliases. This has a bit of a point, because I feel it may help navigating the transitioning the default integer eventually. (int is a terrible alias for long if we want the default integer to not be long)

This seems less desirable. np.long is already a deprecated alias now, to np.compat.long. And long itself is a terrible thing in C, good way to get portability issues. I'm not sure I understand why you want it - don't we already have intp for this purpose?

seberg · 2022-11-20T08:34:59Z

To be clear, long is part of those that would now fail, but also give a FutureWarning that ~~their meaning changes~~ it will be defined in a future version.

seberg · 2022-11-20T09:03:56Z

Right now, I thought mainly that it may be useful to have the option, but lets see:

np.int should probably not exist for a while, it would have to be long which is bad? np.int_ is currently actually not deprecated and the canonical spelling of "long"
If np.long is available, then we could get rid of the confusing np.int_ (i.e. why I thought the warning is good).

The big problem I am aware of is Cython, since it has the following (and equivalent unsigned):

ctypedef npy_long       int_t
ctypedef npy_longlong   long_t
ctypedef npy_longlong   longlong_t

And somehow, such use needs to transition. Maybe long_t could just be removed (it is a weird name for something else and np.long is gone).
If we repurpose is quickly np.int_t could be removed? But even then, the numpy "default integer" will have to be spelled as cnp.intp_t potentially in the future, or we change the bit-width of Cython code using np.int_t. OTOH, only on windows, maybe we can get away with it.

rgommers · 2022-11-20T20:40:09Z

Hmm, such integer puzzles always hurt my head. Removing long_t in Cython sounds nice.

If np.long is available, then we could get rid of the confusing np.int_ (i.e. why I thought the warning is good).

long is bad too though. I'm still confused by this. We have np.long, we deprecated it, that seems like a good idea to me. We want to get rid of it, and not bring it back.

Some relevant content from the 1.20.0 release notes:

Deprecated name    Identical to  NumPy scalar type names
=================  ============  ==================================================================
``numpy.long``     ``int``       `numpy.int_` (C ``long``), `numpy.longlong` (largest integer type)

np.int64 or np.int32 to specify the precision exactly. This ensures that results cannot depend on the computer or operating system.
np.int_ or int (the default), but be aware that it depends on the computer and operating system.
The C types: np.cint (int), np.int_ (long), np.longlong.
np.intp which is 32bit on 32bit machines 64bit on 64bit machines. This can be the best type to use for indexing.

We are still missing a good guide to this I think, I usually have to dig up https://github.com/scikit-learn/scikit-learn/wiki/C-integer-types%3A-the-missing-manual. My understanding of our current situation is:

We want to guide users to int32, int64 etc. for almost everything, and intp for indexing.
We don't want any aliases for Python types
Everything else is secondary, and should be de-emphasized. We have exact mappings to C data types which we must keep, and tons of cruft that we need to deprecate.

So after writing this: your point is that np.long is a better name for the C long type than np.int_? That's probably true. So yes, I agree that ideally that'd be the name.

seberg · 2022-11-20T21:11:57Z

Yes, that is my point. The old long had to go, because it was the Python int (inheritance from Python 2 long, which must be why Cython is still on long long 😱).

In any case, I think np.int_ sounds like "default integer" (if it isn't C int), which is terrible if we change the default integer to be intp (to get 64bit ints on windows 64bit).

We don't have to follow through, but adding a future-warning opens us up a tiny bit more to that, I think.

rgommers · 2022-11-20T21:13:28Z

Okay, makes sense then, +1 from me. Let me review the rest of this PR then.

rgommers

Overall LGTM, just some minor comments. Removing finfo.machar is the main one; in case of time pressure before release that can also be done in a follow-up.

rgommers · 2022-11-20T21:26:55Z

numpy/__init__.py

-    )
-
-    # NumPy 1.22, 2021-10-20
-    __deprecated_attrs__["MachAr"] = (


np.finfo().machar was deprecated at the same time, can you remove that as well?

Of course, was thinking to do a followup.

numpy/__init__.py

rgommers · 2022-11-20T21:34:50Z

numpy/core/_type_aliases.py

-
-        # add mapping for both the bit name and the numarray name
-        sctypeDict[myname] = info.type
+        # Add to the main namespace if desired:


Not sure what this comment means. allTypes is not in the main namespace, and I think we'd like to keep it that way. Remove?

allTypes is what ends up in the main namespace:

# Now add the types we've determined to this module for key in allTypes: globals()[key] = allTypes[key] __all__.append(key)

Ah okay, missed that bit.

rgommers · 2022-11-21T09:05:45Z

LGTM now, in it goes. Thanks Sebastian

This change replaces references to a number of deprecated NumPy type aliases (np.bool, np.int, np.float, np.complex, np.object, np.str) with their recommended replacement (bool, int, float, complex, object, str). Those types were deprecated in 1.20 and are removed in 1.24, cf numpy/numpy#22607.

seberg · 2023-02-07T11:15:25Z

I am not sure I am convinced we shouldn't eventually get rid of the bool8 name. But I am happy with saying that we should delay the deprecation until np.bool itself established (we could maybe hide it from __dir__ for now though).

IMO int8, is very different: just like in C/C++ the bits denote the type (what you can store/behavior) and not just how it is stored.

rgommers · 2023-02-07T11:23:57Z

I am not sure I am convinced we shouldn't eventually get rid of the bool8 name. But I am happy with saying that we should delay the deprecation until np.bool itself established (we could maybe hide it from __dir__ for now though).

That seems fine to me too.

IMO int8, is very different: just like in C/C++ the bits denote the type (what you can store/behavior) and not just how it is stored.

Agreed. The 8 in bool8 is just noise and doesn't have semantic meaning. Plus, if we'd ever want to change the builtin boolean dtype to a 1-bit dtype, we should be free to do so (no plans, and various possible hiccups in practice, but in theory it makes sense). And I've personally never seen bool8 used anywhere. We should definitely hide it, and get rid of it later on.

…ypes ### Problem description Numpy has started changing the alias to some of its data-types. This means that users with the latest version of numpy they will face either warnings or errors according to the type that they are using. This affects all the users using numoy > 1.20.0 One of the types was fixed back in September with this [pull](#37817) request [numpy 1.24.0](numpy/numpy#22607): The scalar type aliases ending in a 0 bit size: np.object0, np.str0, np.bytes0, np.void0, np.int0, np.uint0 as well as np.bool8 are now deprecated and will eventually be removed. [numpy 1.20.0](numpy/numpy#14882): Using the aliases of builtin types like np.int is deprecated ### What changes were proposed in this pull request? From numpy 1.20.0 we receive a deprecattion warning on np.object(https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations) and from numpy 1.24.0 we received an attribute error: ``` attr = 'object' def __getattr__(attr): # Warn for expired attributes, and return a dummy function # that always raises an exception. import warnings try: msg = __expired_functions__[attr] except KeyError: pass else: warnings.warn(msg, DeprecationWarning, stacklevel=2) def _expired(*args, **kwds): raise RuntimeError(msg) return _expired # Emit warnings for deprecated attributes try: val, msg = __deprecated_attrs__[attr] except KeyError: pass else: warnings.warn(msg, DeprecationWarning, stacklevel=2) return val if attr in __future_scalars__: # And future warnings for those that will change, but also give # the AttributeError warnings.warn( f"In the future `np.{attr}` will be defined as the " "corresponding NumPy scalar.", FutureWarning, stacklevel=2) if attr in __former_attrs__: > raise AttributeError(__former_attrs__[attr]) E AttributeError: module 'numpy' has no attribute 'object'. E `np.object` was a deprecated alias for the builtin `object`. To avoid this error in existing code, use `object` by itself. Doing this will not modify any behavior and is safe. E The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at: E https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations ``` From numpy version 1.24.0 we receive a deprecation warning on np.object0 and every np.datatype0 and np.bool8 >>> np.object0(123) <stdin>:1: DeprecationWarning: `np.object0` is a deprecated alias for ``np.object0` is a deprecated alias for `np.object_`. `object` can be used instead. (Deprecated NumPy 1.24)`. (Deprecated NumPy 1.24) ### Why are the changes needed? The changes are needed so pyspark can be compatible with the latest numpy and avoid - attribute errors on data types being deprecated from version 1.20.0: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations - warnings on deprecated data types from version 1.24.0: https://numpy.org/devdocs/release/1.24.0-notes.html#deprecations ### Does this PR introduce _any_ user-facing change? The change will suppress the warning coming from numpy 1.24.0 and the error coming from numpy 1.22.0 ### How was this patch tested? I assume that the existing tests should catch this. (see all section Extra questions) I found this to be a problem in my work's project where we use for our unit tests the toPandas() function to convert to np.object. Attaching the run result of our test: ``` _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ /usr/local/lib/python3.9/dist-packages/<my-pkg>/unit/spark_test.py:64: in run_testcase self.handler.compare_df(result, expected, config=self.compare_config) /usr/local/lib/python3.9/dist-packages/<my-pkg>/spark_test_handler.py:38: in compare_df actual_pd = actual.toPandas().sort_values(by=sort_columns, ignore_index=True) /usr/local/lib/python3.9/dist-packages/pyspark/sql/pandas/conversion.py:232: in toPandas corrected_dtypes[index] = np.object # type: ignore[attr-defined] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ attr = 'object' def __getattr__(attr): # Warn for expired attributes, and return a dummy function # that always raises an exception. import warnings try: msg = __expired_functions__[attr] except KeyError: pass else: warnings.warn(msg, DeprecationWarning, stacklevel=2) def _expired(*args, **kwds): raise RuntimeError(msg) return _expired # Emit warnings for deprecated attributes try: val, msg = __deprecated_attrs__[attr] except KeyError: pass else: warnings.warn(msg, DeprecationWarning, stacklevel=2) return val if attr in __future_scalars__: # And future warnings for those that will change, but also give # the AttributeError warnings.warn( f"In the future `np.{attr}` will be defined as the " "corresponding NumPy scalar.", FutureWarning, stacklevel=2) if attr in __former_attrs__: > raise AttributeError(__former_attrs__[attr]) E AttributeError: module 'numpy' has no attribute 'object'. E `np.object` was a deprecated alias for the builtin `object`. To avoid this error in existing code, use `object` by itself. Doing this will not modify any behavior and is safe. E The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at: E https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations /usr/local/lib/python3.9/dist-packages/numpy/__init__.py:305: AttributeError ``` Although i cannot provide the code doing in python the following should show the problem: ``` >>> import numpy as np >>> np.object0(123) <stdin>:1: DeprecationWarning: `np.object0` is a deprecated alias for ``np.object0` is a deprecated alias for `np.object_`. `object` can be used instead. (Deprecated NumPy 1.24)`. (Deprecated NumPy 1.24) 123 >>> np.object(123) <stdin>:1: FutureWarning: In the future `np.object` will be defined as the corresponding NumPy scalar. Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/local/lib/python3.9/dist-packages/numpy/__init__.py", line 305, in __getattr__ raise AttributeError(__former_attrs__[attr]) AttributeError: module 'numpy' has no attribute 'object'. `np.object` was a deprecated alias for the builtin `object`. To avoid this error in existing code, use `object` by itself. Doing this will not modify any behavior and is safe. The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations ``` I do not have a use-case in my tests for np.object0 but I fixed like the suggestion from numpy ### Supported Versions: I propose this fix to be included in all pyspark 3.3 and onwards ### JIRA I know a JIRA ticket should be created I sent an email and I am waiting for the answer to document the case also there. ### Extra questions: By grepping for np.bool and np.object I see that the tests include them. Shall we change them also? Data types with _ I think they are not affected. ``` git grep np.object python/pyspark/ml/functions.py: return data.dtype == np.object_ and isinstance(data.iloc[0], (np.ndarray, list)) python/pyspark/ml/functions.py: return any(data.dtypes == np.object_) and any( python/pyspark/sql/tests/test_dataframe.py: self.assertEqual(types[1], np.object) python/pyspark/sql/tests/test_dataframe.py: self.assertEqual(types[4], np.object) # datetime.date python/pyspark/sql/tests/test_dataframe.py: self.assertEqual(types[1], np.object) python/pyspark/sql/tests/test_dataframe.py: self.assertEqual(types[6], np.object) python/pyspark/sql/tests/test_dataframe.py: self.assertEqual(types[7], np.object) git grep np.bool python/docs/source/user_guide/pandas_on_spark/types.rst:np.bool BooleanType python/pyspark/pandas/indexing.py: isinstance(key, np.bool_) for key in cols_sel python/pyspark/pandas/tests/test_typedef.py: np.bool: (np.bool, BooleanType()), python/pyspark/pandas/tests/test_typedef.py: bool: (np.bool, BooleanType()), python/pyspark/pandas/typedef/typehints.py: elif tpe in (bool, np.bool_, "bool", "?"): python/pyspark/sql/connect/expressions.py: assert isinstance(value, (bool, np.bool_)) python/pyspark/sql/connect/expressions.py: elif isinstance(value, np.bool_): python/pyspark/sql/tests/test_dataframe.py: self.assertEqual(types[2], np.bool) python/pyspark/sql/tests/test_functions.py: (np.bool_, [("true", "boolean")]), ``` If yes concerning bool was merged already should we fix it too? Closes #40220 from aimtsou/numpy-patch. Authored-by: Aimilios Tsouvelekakis <aimtsou@users.noreply.github.com> Signed-off-by: Sean Owen <srowen@gmail.com>

…ypes ### Problem description Numpy has started changing the alias to some of its data-types. This means that users with the latest version of numpy they will face either warnings or errors according to the type that they are using. This affects all the users using numoy > 1.20.0 One of the types was fixed back in September with this [pull](#37817) request [numpy 1.24.0](numpy/numpy#22607): The scalar type aliases ending in a 0 bit size: np.object0, np.str0, np.bytes0, np.void0, np.int0, np.uint0 as well as np.bool8 are now deprecated and will eventually be removed. [numpy 1.20.0](numpy/numpy#14882): Using the aliases of builtin types like np.int is deprecated ### What changes were proposed in this pull request? From numpy 1.20.0 we receive a deprecattion warning on np.object(https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations) and from numpy 1.24.0 we received an attribute error: ``` attr = 'object' def __getattr__(attr): # Warn for expired attributes, and return a dummy function # that always raises an exception. import warnings try: msg = __expired_functions__[attr] except KeyError: pass else: warnings.warn(msg, DeprecationWarning, stacklevel=2) def _expired(*args, **kwds): raise RuntimeError(msg) return _expired # Emit warnings for deprecated attributes try: val, msg = __deprecated_attrs__[attr] except KeyError: pass else: warnings.warn(msg, DeprecationWarning, stacklevel=2) return val if attr in __future_scalars__: # And future warnings for those that will change, but also give # the AttributeError warnings.warn( f"In the future `np.{attr}` will be defined as the " "corresponding NumPy scalar.", FutureWarning, stacklevel=2) if attr in __former_attrs__: > raise AttributeError(__former_attrs__[attr]) E AttributeError: module 'numpy' has no attribute 'object'. E `np.object` was a deprecated alias for the builtin `object`. To avoid this error in existing code, use `object` by itself. Doing this will not modify any behavior and is safe. E The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at: E https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations ``` From numpy version 1.24.0 we receive a deprecation warning on np.object0 and every np.datatype0 and np.bool8 >>> np.object0(123) <stdin>:1: DeprecationWarning: `np.object0` is a deprecated alias for ``np.object0` is a deprecated alias for `np.object_`. `object` can be used instead. (Deprecated NumPy 1.24)`. (Deprecated NumPy 1.24) ### Why are the changes needed? The changes are needed so pyspark can be compatible with the latest numpy and avoid - attribute errors on data types being deprecated from version 1.20.0: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations - warnings on deprecated data types from version 1.24.0: https://numpy.org/devdocs/release/1.24.0-notes.html#deprecations ### Does this PR introduce _any_ user-facing change? The change will suppress the warning coming from numpy 1.24.0 and the error coming from numpy 1.22.0 ### How was this patch tested? I assume that the existing tests should catch this. (see all section Extra questions) I found this to be a problem in my work's project where we use for our unit tests the toPandas() function to convert to np.object. Attaching the run result of our test: ``` _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ /usr/local/lib/python3.9/dist-packages/<my-pkg>/unit/spark_test.py:64: in run_testcase self.handler.compare_df(result, expected, config=self.compare_config) /usr/local/lib/python3.9/dist-packages/<my-pkg>/spark_test_handler.py:38: in compare_df actual_pd = actual.toPandas().sort_values(by=sort_columns, ignore_index=True) /usr/local/lib/python3.9/dist-packages/pyspark/sql/pandas/conversion.py:232: in toPandas corrected_dtypes[index] = np.object # type: ignore[attr-defined] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ attr = 'object' def __getattr__(attr): # Warn for expired attributes, and return a dummy function # that always raises an exception. import warnings try: msg = __expired_functions__[attr] except KeyError: pass else: warnings.warn(msg, DeprecationWarning, stacklevel=2) def _expired(*args, **kwds): raise RuntimeError(msg) return _expired # Emit warnings for deprecated attributes try: val, msg = __deprecated_attrs__[attr] except KeyError: pass else: warnings.warn(msg, DeprecationWarning, stacklevel=2) return val if attr in __future_scalars__: # And future warnings for those that will change, but also give # the AttributeError warnings.warn( f"In the future `np.{attr}` will be defined as the " "corresponding NumPy scalar.", FutureWarning, stacklevel=2) if attr in __former_attrs__: > raise AttributeError(__former_attrs__[attr]) E AttributeError: module 'numpy' has no attribute 'object'. E `np.object` was a deprecated alias for the builtin `object`. To avoid this error in existing code, use `object` by itself. Doing this will not modify any behavior and is safe. E The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at: E https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations /usr/local/lib/python3.9/dist-packages/numpy/__init__.py:305: AttributeError ``` Although i cannot provide the code doing in python the following should show the problem: ``` >>> import numpy as np >>> np.object0(123) <stdin>:1: DeprecationWarning: `np.object0` is a deprecated alias for ``np.object0` is a deprecated alias for `np.object_`. `object` can be used instead. (Deprecated NumPy 1.24)`. (Deprecated NumPy 1.24) 123 >>> np.object(123) <stdin>:1: FutureWarning: In the future `np.object` will be defined as the corresponding NumPy scalar. Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/local/lib/python3.9/dist-packages/numpy/__init__.py", line 305, in __getattr__ raise AttributeError(__former_attrs__[attr]) AttributeError: module 'numpy' has no attribute 'object'. `np.object` was a deprecated alias for the builtin `object`. To avoid this error in existing code, use `object` by itself. Doing this will not modify any behavior and is safe. The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations ``` I do not have a use-case in my tests for np.object0 but I fixed like the suggestion from numpy ### Supported Versions: I propose this fix to be included in all pyspark 3.3 and onwards ### JIRA I know a JIRA ticket should be created I sent an email and I am waiting for the answer to document the case also there. ### Extra questions: By grepping for np.bool and np.object I see that the tests include them. Shall we change them also? Data types with _ I think they are not affected. ``` git grep np.object python/pyspark/ml/functions.py: return data.dtype == np.object_ and isinstance(data.iloc[0], (np.ndarray, list)) python/pyspark/ml/functions.py: return any(data.dtypes == np.object_) and any( python/pyspark/sql/tests/test_dataframe.py: self.assertEqual(types[1], np.object) python/pyspark/sql/tests/test_dataframe.py: self.assertEqual(types[4], np.object) # datetime.date python/pyspark/sql/tests/test_dataframe.py: self.assertEqual(types[1], np.object) python/pyspark/sql/tests/test_dataframe.py: self.assertEqual(types[6], np.object) python/pyspark/sql/tests/test_dataframe.py: self.assertEqual(types[7], np.object) git grep np.bool python/docs/source/user_guide/pandas_on_spark/types.rst:np.bool BooleanType python/pyspark/pandas/indexing.py: isinstance(key, np.bool_) for key in cols_sel python/pyspark/pandas/tests/test_typedef.py: np.bool: (np.bool, BooleanType()), python/pyspark/pandas/tests/test_typedef.py: bool: (np.bool, BooleanType()), python/pyspark/pandas/typedef/typehints.py: elif tpe in (bool, np.bool_, "bool", "?"): python/pyspark/sql/connect/expressions.py: assert isinstance(value, (bool, np.bool_)) python/pyspark/sql/connect/expressions.py: elif isinstance(value, np.bool_): python/pyspark/sql/tests/test_dataframe.py: self.assertEqual(types[2], np.bool) python/pyspark/sql/tests/test_functions.py: (np.bool_, [("true", "boolean")]), ``` If yes concerning bool was merged already should we fix it too? Closes #40220 from aimtsou/numpy-patch. Authored-by: Aimilios Tsouvelekakis <aimtsou@users.noreply.github.com> Signed-off-by: Sean Owen <srowen@gmail.com> (cherry picked from commit b3c26b8) Signed-off-by: Sean Owen <srowen@gmail.com>

This change replaces references to a number of deprecated NumPy type aliases (np.bool, np.int, np.float, np.complex, np.object, np.str) with their recommended replacement (bool, int, float, complex, object, str). Those types were deprecated in 1.20 and are removed in 1.24, cf numpy/numpy#22607.

…ypes ### Problem description Numpy has started changing the alias to some of its data-types. This means that users with the latest version of numpy they will face either warnings or errors according to the type that they are using. This affects all the users using numoy > 1.20.0 One of the types was fixed back in September with this [pull](apache#37817) request [numpy 1.24.0](numpy/numpy#22607): The scalar type aliases ending in a 0 bit size: np.object0, np.str0, np.bytes0, np.void0, np.int0, np.uint0 as well as np.bool8 are now deprecated and will eventually be removed. [numpy 1.20.0](numpy/numpy#14882): Using the aliases of builtin types like np.int is deprecated ### What changes were proposed in this pull request? From numpy 1.20.0 we receive a deprecattion warning on np.object(https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations) and from numpy 1.24.0 we received an attribute error: ``` attr = 'object' def __getattr__(attr): # Warn for expired attributes, and return a dummy function # that always raises an exception. import warnings try: msg = __expired_functions__[attr] except KeyError: pass else: warnings.warn(msg, DeprecationWarning, stacklevel=2) def _expired(*args, **kwds): raise RuntimeError(msg) return _expired # Emit warnings for deprecated attributes try: val, msg = __deprecated_attrs__[attr] except KeyError: pass else: warnings.warn(msg, DeprecationWarning, stacklevel=2) return val if attr in __future_scalars__: # And future warnings for those that will change, but also give # the AttributeError warnings.warn( f"In the future `np.{attr}` will be defined as the " "corresponding NumPy scalar.", FutureWarning, stacklevel=2) if attr in __former_attrs__: > raise AttributeError(__former_attrs__[attr]) E AttributeError: module 'numpy' has no attribute 'object'. E `np.object` was a deprecated alias for the builtin `object`. To avoid this error in existing code, use `object` by itself. Doing this will not modify any behavior and is safe. E The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at: E https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations ``` From numpy version 1.24.0 we receive a deprecation warning on np.object0 and every np.datatype0 and np.bool8 >>> np.object0(123) <stdin>:1: DeprecationWarning: `np.object0` is a deprecated alias for ``np.object0` is a deprecated alias for `np.object_`. `object` can be used instead. (Deprecated NumPy 1.24)`. (Deprecated NumPy 1.24) ### Why are the changes needed? The changes are needed so pyspark can be compatible with the latest numpy and avoid - attribute errors on data types being deprecated from version 1.20.0: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations - warnings on deprecated data types from version 1.24.0: https://numpy.org/devdocs/release/1.24.0-notes.html#deprecations ### Does this PR introduce _any_ user-facing change? The change will suppress the warning coming from numpy 1.24.0 and the error coming from numpy 1.22.0 ### How was this patch tested? I assume that the existing tests should catch this. (see all section Extra questions) I found this to be a problem in my work's project where we use for our unit tests the toPandas() function to convert to np.object. Attaching the run result of our test: ``` _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ /usr/local/lib/python3.9/dist-packages/<my-pkg>/unit/spark_test.py:64: in run_testcase self.handler.compare_df(result, expected, config=self.compare_config) /usr/local/lib/python3.9/dist-packages/<my-pkg>/spark_test_handler.py:38: in compare_df actual_pd = actual.toPandas().sort_values(by=sort_columns, ignore_index=True) /usr/local/lib/python3.9/dist-packages/pyspark/sql/pandas/conversion.py:232: in toPandas corrected_dtypes[index] = np.object # type: ignore[attr-defined] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ attr = 'object' def __getattr__(attr): # Warn for expired attributes, and return a dummy function # that always raises an exception. import warnings try: msg = __expired_functions__[attr] except KeyError: pass else: warnings.warn(msg, DeprecationWarning, stacklevel=2) def _expired(*args, **kwds): raise RuntimeError(msg) return _expired # Emit warnings for deprecated attributes try: val, msg = __deprecated_attrs__[attr] except KeyError: pass else: warnings.warn(msg, DeprecationWarning, stacklevel=2) return val if attr in __future_scalars__: # And future warnings for those that will change, but also give # the AttributeError warnings.warn( f"In the future `np.{attr}` will be defined as the " "corresponding NumPy scalar.", FutureWarning, stacklevel=2) if attr in __former_attrs__: > raise AttributeError(__former_attrs__[attr]) E AttributeError: module 'numpy' has no attribute 'object'. E `np.object` was a deprecated alias for the builtin `object`. To avoid this error in existing code, use `object` by itself. Doing this will not modify any behavior and is safe. E The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at: E https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations /usr/local/lib/python3.9/dist-packages/numpy/__init__.py:305: AttributeError ``` Although i cannot provide the code doing in python the following should show the problem: ``` >>> import numpy as np >>> np.object0(123) <stdin>:1: DeprecationWarning: `np.object0` is a deprecated alias for ``np.object0` is a deprecated alias for `np.object_`. `object` can be used instead. (Deprecated NumPy 1.24)`. (Deprecated NumPy 1.24) 123 >>> np.object(123) <stdin>:1: FutureWarning: In the future `np.object` will be defined as the corresponding NumPy scalar. Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/local/lib/python3.9/dist-packages/numpy/__init__.py", line 305, in __getattr__ raise AttributeError(__former_attrs__[attr]) AttributeError: module 'numpy' has no attribute 'object'. `np.object` was a deprecated alias for the builtin `object`. To avoid this error in existing code, use `object` by itself. Doing this will not modify any behavior and is safe. The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations ``` I do not have a use-case in my tests for np.object0 but I fixed like the suggestion from numpy ### Supported Versions: I propose this fix to be included in all pyspark 3.3 and onwards ### JIRA I know a JIRA ticket should be created I sent an email and I am waiting for the answer to document the case also there. ### Extra questions: By grepping for np.bool and np.object I see that the tests include them. Shall we change them also? Data types with _ I think they are not affected. ``` git grep np.object python/pyspark/ml/functions.py: return data.dtype == np.object_ and isinstance(data.iloc[0], (np.ndarray, list)) python/pyspark/ml/functions.py: return any(data.dtypes == np.object_) and any( python/pyspark/sql/tests/test_dataframe.py: self.assertEqual(types[1], np.object) python/pyspark/sql/tests/test_dataframe.py: self.assertEqual(types[4], np.object) # datetime.date python/pyspark/sql/tests/test_dataframe.py: self.assertEqual(types[1], np.object) python/pyspark/sql/tests/test_dataframe.py: self.assertEqual(types[6], np.object) python/pyspark/sql/tests/test_dataframe.py: self.assertEqual(types[7], np.object) git grep np.bool python/docs/source/user_guide/pandas_on_spark/types.rst:np.bool BooleanType python/pyspark/pandas/indexing.py: isinstance(key, np.bool_) for key in cols_sel python/pyspark/pandas/tests/test_typedef.py: np.bool: (np.bool, BooleanType()), python/pyspark/pandas/tests/test_typedef.py: bool: (np.bool, BooleanType()), python/pyspark/pandas/typedef/typehints.py: elif tpe in (bool, np.bool_, "bool", "?"): python/pyspark/sql/connect/expressions.py: assert isinstance(value, (bool, np.bool_)) python/pyspark/sql/connect/expressions.py: elif isinstance(value, np.bool_): python/pyspark/sql/tests/test_dataframe.py: self.assertEqual(types[2], np.bool) python/pyspark/sql/tests/test_functions.py: (np.bool_, [("true", "boolean")]), ``` If yes concerning bool was merged already should we fix it too? Closes apache#40220 from aimtsou/numpy-patch. Authored-by: Aimilios Tsouvelekakis <aimtsou@users.noreply.github.com> Signed-off-by: Sean Owen <srowen@gmail.com> (cherry picked from commit b3c26b8) Signed-off-by: Sean Owen <srowen@gmail.com>

….19 or >=1.24 Because of the changed behavior of np.bool and similar aliases for builtin data types, we need to restrict the numpy version to the stated range for sonnet. For more information, refer here: numpy/numpy#14882 numpy/numpy#22607

…ed within <=1.19 or >=1.24 Because of the changed behavior of np.bool and similar aliases for builtin data types, we need to restrict the numpy version to the stated range for sonnet. For more information, refer here: numpy/numpy#14882 numpy/numpy#22607 fix: definition of OutputGenerated in RModel_Base

…s) and restricting numpy version avoid trying to load sonnet and graph_nets if not installed Co-Authored-By: moneta <lorenzo.moneta@cern.ch> [tmva][sofie-gnn] numpy version for sofie-gnn test should be restricted within <=1.19 or >=1.24 Because of the changed behavior of np.bool and similar aliases for builtin data types, we need to restrict the numpy version to the stated range for sonnet. For more information, refer here: numpy/numpy#14882 numpy/numpy#22607 fix: definition of OutputGenerated in RModel_Base [tmva][sofie-gnn] Suppress warnings for cases other than .dat file in method WriteInitializedTensorsToFile in RModel

@vepadulano

…s) and restricting numpy version avoid trying to load sonnet and graph_nets if not installed Co-Authored-By: moneta <lorenzo.moneta@cern.ch> [tmva][sofie-gnn] numpy version for sofie-gnn test should be restricted within <=1.19 or >=1.24 Because of the changed behavior of np.bool and similar aliases for builtin data types, we need to restrict the numpy version to the stated range for sonnet. For more information, refer here: numpy/numpy#14882 numpy/numpy#22607 fix: definition of OutputGenerated in RModel_Base [tmva][sofie-gnn] Suppress warnings for cases other than .dat file in method WriteInitializedTensorsToFile in RModel [tmva][sofie-gnn] Fix node update in GNN and size of global features in GraphIndependent [tmva][sofie-gnn] Fix node update in RModel_GNN generated code [tmva][sofie-gnn] Fix for correct size of global features in GraphIndependent fix also the way the computation of output features in RModel_GNN Fix dimension of global feature tensor during node update If the number of nodes is larger than the edges the tensor storing the global feature needs to be resize to the correct number of nodes * number of feature [tmva][sofie-gnn] Fix importing _gnn if python version is less than 3.8 Improve also gnn test and address some of the Vincenzo's comments Changes addressing comments by @vepadulano Co-authored-by: moneta <lorenzo.moneta@cern.ch>

@vepadulano

…s) and restricting numpy version avoid trying to load sonnet and graph_nets if not installed [tmva][sofie-gnn] numpy version for sofie-gnn test should be restricted within <=1.19 or >=1.24 Because of the changed behavior of np.bool and similar aliases for builtin data types, we need to restrict the numpy version to the stated range for sonnet. For more information, refer here: numpy/numpy#14882 numpy/numpy#22607 fix: definition of OutputGenerated in RModel_Base [tmva][sofie-gnn] Suppress warnings for cases other than .dat file in method WriteInitializedTensorsToFile in RModel [tmva][sofie-gnn] Fix node update in GNN and size of global features in GraphIndependent [tmva][sofie-gnn] Fix node update in RModel_GNN generated code [tmva][sofie-gnn] Fix for correct size of global features in GraphIndependent fix also the way the computation of output features in RModel_GNN Fix dimension of global feature tensor during node update If the number of nodes is larger than the edges the tensor storing the global feature needs to be resize to the correct number of nodes * number of feature [tmva][sofie-gnn] Fix importing _gnn if python version is less than 3.8 Improve also gnn test and address some of the Vincenzo's comments Changes addressing comments by @vepadulano Co-authored-by: moneta <lorenzo.moneta@cern.ch>

@vepadulano

…s) and restricting numpy version avoid trying to load sonnet and graph_nets if not installed [tmva][sofie-gnn] numpy version for sofie-gnn test should be restricted within <=1.19 or >=1.24 Because of the changed behavior of np.bool and similar aliases for builtin data types, we need to restrict the numpy version to the stated range for sonnet. For more information, refer here: numpy/numpy#14882 numpy/numpy#22607 fix: definition of OutputGenerated in RModel_Base [tmva][sofie-gnn] Suppress warnings for cases other than .dat file in method WriteInitializedTensorsToFile in RModel [tmva][sofie-gnn] Fix node update in GNN and size of global features in GraphIndependent [tmva][sofie-gnn] Fix node update in RModel_GNN generated code [tmva][sofie-gnn] Fix for correct size of global features in GraphIndependent fix also the way the computation of output features in RModel_GNN Fix dimension of global feature tensor during node update If the number of nodes is larger than the edges the tensor storing the global feature needs to be resize to the correct number of nodes * number of feature [tmva][sofie-gnn] Fix importing _gnn if python version is less than 3.8 Improve also gnn test and address some of the Vincenzo's comments Changes addressing comments by @vepadulano Co-authored-by: moneta <lorenzo.moneta@cern.ch>

This is a temporary workaround and is related to: numpy/numpy#22607

@vepadulano

…s) and restricting numpy version avoid trying to load sonnet and graph_nets if not installed [tmva][sofie-gnn] numpy version for sofie-gnn test should be restricted within <=1.19 or >=1.24 Because of the changed behavior of np.bool and similar aliases for builtin data types, we need to restrict the numpy version to the stated range for sonnet. For more information, refer here: numpy/numpy#14882 numpy/numpy#22607 fix: definition of OutputGenerated in RModel_Base [tmva][sofie-gnn] Suppress warnings for cases other than .dat file in method WriteInitializedTensorsToFile in RModel [tmva][sofie-gnn] Fix node update in GNN and size of global features in GraphIndependent [tmva][sofie-gnn] Fix node update in RModel_GNN generated code [tmva][sofie-gnn] Fix for correct size of global features in GraphIndependent fix also the way the computation of output features in RModel_GNN Fix dimension of global feature tensor during node update If the number of nodes is larger than the edges the tensor storing the global feature needs to be resize to the correct number of nodes * number of feature [tmva][sofie-gnn] Fix importing _gnn if python version is less than 3.8 Improve also gnn test and address some of the Vincenzo's comments Changes addressing comments by @vepadulano Co-authored-by: moneta <lorenzo.moneta@cern.ch>

seberg added 2 commits November 17, 2022 14:21

TYP: Remove newly deprecated scalar type aliases

49bb24e

github-actions bot added the 07 - Deprecation label Nov 17, 2022

DOC: Document scalar type alias deprecation changes and futurewarning

8b13c8c

BUG: Fixup warning giving and remove MachAr from docs

424cb3f

rgommers approved these changes Nov 20, 2022

View reviewed changes

DOC: Adjust comments on deprecated/future scalar alias based on review

d416581

rgommers merged commit 742545f into numpy:main Nov 21, 2022

rgommers added this to the 1.24.0 release milestone Nov 21, 2022

bmorris3 mentioned this pull request Nov 25, 2022

Remove np.bool8 support (deprecated) scikit-image/scikit-image#6633

Closed

dhomeier mentioned this pull request Nov 30, 2022

TST: Fix devdeps job astropy/astropy#14075

Merged

10 tasks

vrabaud mentioned this pull request Dec 15, 2022

Remove references to deprecated NumPy type aliases. opencv/opencv#22965

Merged

4 tasks

This was referenced Dec 19, 2022

AttributeError: module 'numpy' has no attribute 'bool' in scheduled CI tests spinalcordtoolbox/spinalcordtoolbox#3980

Closed

Pystrum is incompatible with numpy==1.24 due to AttributeError from now-removed np.bool adalca/pystrum#9

Closed

This was referenced Dec 19, 2022

Remove depricated np.complex alias in ImageLib jhkennedy/isce2#1

Closed

Remove depricated np.complex alias in ImageLib isce-framework/isce2#639

Merged

toshihikoyanase mentioned this pull request Dec 20, 2022

Pin numpy version to 1.23.x for mxnet examples optuna/optuna-examples#154

Merged

PGijsbers mentioned this pull request Feb 11, 2023

Code base incompatible with numpy>=1.24 openml/openml-python#1198

Closed

aimtsou mentioned this pull request Feb 28, 2023

[SPARK-42647][PYTHON] Change alias for numpy deprecated and removed types apache/spark#40220

Closed

dkloving mentioned this pull request Mar 15, 2023

remove deprecated numpy bool dtype import achael/eht-imaging#163

Merged

Nafeij mentioned this pull request Apr 22, 2023

numpy.int has been removed since 1.24 qTipTip/Pylette#6

Closed

thomasWeise mentioned this pull request May 10, 2023

Exception with Numpy 1.24.0 onwards due to Expired Deprecation of np.int pdfo/pdfo#55

Closed

anosillus mentioned this pull request May 25, 2023

numpy deprecation warnings when importing nptyping ramonhagenaars/nptyping#102

Open

JoranAngevaare mentioned this pull request Jun 12, 2023

Add longitude_bnds and latitude_bnds to cmip_renaming_dict jbusecke/xMIP#300

Open

4 tasks

fr1ll mentioned this pull request Jun 14, 2023

Change np.float to float in rasterfairy.py Quasimondo/RasterFairy#21

Merged

tcompa mentioned this pull request Jul 24, 2023

Pin numpy to V1 fractal-analytics-platform/fractal-tasks-core#475

Closed

lithomas1 mentioned this pull request Jul 25, 2023

BUG: Series.astype("object_") and Series.astype("object0") unsupported. pandas-dev/pandas#54251

Closed

3 tasks

egidioln mentioned this pull request Aug 10, 2023

update to numpy 1.24 faridyagubbayli/binvox#3

Closed

mtsokol mentioned this pull request Sep 26, 2023

API: Remove zero names from dtype aliases #24807

Merged

Simon-Lopez added a commit to BRGM/ComPASS that referenced this pull request Oct 2, 2023

Avoid deprecations introduced in numpy 1.24

fbe123f

This is a temporary workaround and is related to: numpy/numpy#22607

HBioquant mentioned this pull request Mar 22, 2024

problems with numpy HBioquant/DiffBindFR#4

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DEP: Next step in scalar type alias deprecations/futurewarnings #22607

DEP: Next step in scalar type alias deprecations/futurewarnings #22607

seberg commented Nov 17, 2022 •

edited

seberg commented Nov 17, 2022

seberg commented Nov 17, 2022

rossbar commented Nov 17, 2022

rgommers commented Nov 20, 2022

seberg commented Nov 20, 2022 •

edited

seberg commented Nov 20, 2022

rgommers commented Nov 20, 2022

seberg commented Nov 20, 2022

rgommers commented Nov 20, 2022

rgommers left a comment

rgommers Nov 20, 2022

seberg Nov 21, 2022

rgommers Nov 20, 2022

seberg Nov 21, 2022

rgommers Nov 21, 2022

rgommers commented Nov 21, 2022

seberg commented Feb 7, 2023

rgommers commented Feb 7, 2023

DEP: Next step in scalar type alias deprecations/futurewarnings #22607

DEP: Next step in scalar type alias deprecations/futurewarnings #22607

Conversation

seberg commented Nov 17, 2022 • edited

seberg commented Nov 17, 2022

seberg commented Nov 17, 2022

rossbar commented Nov 17, 2022

rgommers commented Nov 20, 2022

seberg commented Nov 20, 2022 • edited

seberg commented Nov 20, 2022

rgommers commented Nov 20, 2022

seberg commented Nov 20, 2022

rgommers commented Nov 20, 2022

rgommers left a comment

Choose a reason for hiding this comment

rgommers Nov 20, 2022

Choose a reason for hiding this comment

seberg Nov 21, 2022

Choose a reason for hiding this comment

rgommers Nov 20, 2022

Choose a reason for hiding this comment

seberg Nov 21, 2022

Choose a reason for hiding this comment

rgommers Nov 21, 2022

Choose a reason for hiding this comment

rgommers commented Nov 21, 2022

seberg commented Feb 7, 2023

rgommers commented Feb 7, 2023

seberg commented Nov 17, 2022 •

edited

seberg commented Nov 20, 2022 •

edited