Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

np.dtype('float64').__class__ not picklable in numpy master #16692

Closed
lesteve opened this issue Jun 26, 2020 · 18 comments · Fixed by #18332
Closed

np.dtype('float64').__class__ not picklable in numpy master #16692

lesteve opened this issue Jun 26, 2020 · 18 comments · Fixed by #18332

Comments

@lesteve
Copy link
Contributor

lesteve commented Jun 26, 2020

We are seeing some errors in scikit-learn CI using numpy dev recently see scikit-learn/scikit-learn#17707 (comment) for some context.

After debugging the problem it comes from the fact that np.dtype('float64').__class__ is not picklable in numpy development version. I haven't done a git bisect but I can do one if that would help.

For completeness the reason we care about this lies within joblib:
https://github.com/joblib/joblib/blob/ac0f1528ffec9cd53cabbafaca887ac880697862/joblib/hashing.py#L222-L235

Reproducing code example:

import pickle

import numpy as np


print('numpy version:', np.__version__)
dt = np.dtype('float64')
print('class: ', dt.__class__)
try:
    pickle.dumps(dt.__class__)
    print('Can pickle dt.__class__')
except Exception as exc:
    print('Exception while pickling dt.__class__')
    print(exc)

Output numpy development version:

numpy version: 1.20.0.dev0+be8ab91
class:  <class 'numpy.dtype[float64]'>
Exception while pickling dt.__class__
Can't pickle <class 'numpy.dtype[float64]'>: attribute lookup dtype[float64] on numpy failed

Ouptut numpy 1.19.0:

numpy version: 1.19.0
class:  <class 'numpy.dtype'>
Can pickle dt.__class__
@lesteve
Copy link
Contributor Author

lesteve commented Jun 26, 2020

Not as good as a bisect but that may help a little bit.

I looked at the dev weels available at https://pypi.anaconda.org/scipy-wheels-nightly/simple/numpy/

So the behaviour change seems to appear a little time ago, between commit daffccf (May 24 last wheel without the issue) and cbde478 (May 31 first wheel with the issue).

--------------------
1.20.0.dev0+20200524034824.daffccf
numpy version: 1.20.0.dev0+daffccf
class:  <class 'numpy.dtype'>
Can pickle dt.__class__
--------------------
1.20.0.dev0+20200531034712.cbde478
numpy version: 1.20.0.dev0+cbde478
class:  <class 'numpy.dtype[float64]'>
Exception while pickling dt.__class__
Can't pickle <class 'numpy.dtype[float64]'>: attribute lookup dtype[float64] on numpy failed

@seberg
Copy link
Member

seberg commented Jun 26, 2020

Oh, that makes sense, hmmm, I did not think about pickles, have to figure out how to fix that... I guess I may have to implement actual pickling manually. Basically the pickling for this this has to use type(np.dtype(np.float64)) probably for legacy DTypes...

@seberg
Copy link
Member

seberg commented Jun 26, 2020

@lesteve curious, can you explain what the quirks you are working around are and how they interact. On master dt.__class__ is specific to the dtype, previously it was just np.dtype which is a singleton instance and pickles trivially?

The quickest fix would be to change the name of the class to some underscored version for now I guess. Although I like the printing of numpy.dtype[float64]...

@seberg
Copy link
Member

seberg commented Jun 26, 2020

Hmmm, there is one other problem (or semi-problem), in that whatever I add now, has to remain supported.

Plus, it is impossible to make this work in a forward compatible way. If you pickle on master, and send it to an older NumPy, that cannot work, since the old NumPy doesn't have a notion for this. And I do not think there is any way to achieve even a nice error message.

But, I guess pickling type(dtype) should be a super niche thing to do... Correct unpickling seems a bit tricky, I suppose I could add a special method for it, in the long run, there might be some hooks to make it work for user dtypes.

@seberg
Copy link
Member

seberg commented Jun 26, 2020

@lesteve how pressing is this for you guys? The only easy solution to this right now would be to give all of these a valid fully qualified name (which technically has to remain valid forever).

I want to add __reduce__, but python refuses me to do that. That would at least require me to only add DTypeMeta (or some function) with a fully qualified name.
It is not that it cannot be done without the a fully qualified name, but it appears to complicate some other things in ways I wanted to avoid for a while (it seems there is an assumption in pickle that a static types always have a fully qualified name). Which to be fair, is not unreasonable, but currently broken, because I did not want to make a final decision on the namespace (while also avoiding making everything HeapTypes right now, which would just be a lot of hassle due to the C-API not making it easy).

@anirudh2290
Copy link
Member

@seberg can we switch to the underscores now and switch to heaptypes before the next release or is that too aggressive/disruptive ?

@seberg
Copy link
Member

seberg commented Jun 26, 2020

Yeah, I think it is not a big problem to switch to HeapTypes... There are some gory details to figure out (creating HeapType classes from a metaclass is not something the Python C-API makes straight forward). So I thought this was much simpler for now. Just not sure we will have it done next week :).

We still have to make DTypeMeta public somewhere I think, but that should be fine, just need to decide where.

@mattip
Copy link
Member

mattip commented Jun 26, 2020

Maybe we can rewrite the code in scikit-learn to avoid pickling dtype classes and avoid the problem?

@seberg
Copy link
Member

seberg commented Jun 27, 2020

That would be great, although in the mid-term, we have to fix it anyway, I think. Its weird not to be able to pickle a class.

@lesteve
Copy link
Contributor Author

lesteve commented Jul 1, 2020

Sorry I completely dropped the ball on this one ...

Maybe we can rewrite the code in scikit-learn to avoid pickling dtype classes and avoid the problem?

I was involved in writing these lines 4-5 years ago, I browsed the associated issues and I remember that they were needed to fix some hairy bugs in joblib. I created joblib/joblib#1080 to discuss whether it is possible to avoid relying on np.dtype('float64').__class__ to being picklable while still avoiding these hairy bugs.

That would be great, although in the mid-term, we have to fix it anyway, I think. Its weird not to be able to pickle a class.

My biased opinion as a downstream numpy consumer is that it would be great if this could be fixed in numpy. If I put my maintainer hat, I do feel the pain of needing to maintain something where you mix Python C-API, metaclass and gory details in the same sentence to make life easier for users 😉 ...

@ogrisel
Copy link
Contributor

ogrisel commented Jul 2, 2020

I have a workaround ready in joblib/joblib#1082 if numpy 1.20 ships without fixing the picklability of dtype classes.

@seberg
Copy link
Member

seberg commented Jul 2, 2020

Good to know. Honestly, pickling the class in this way seems a bit shady to begin with, and the reason why you are doing it is likely a bigger issue than whether or not the class can be pickled. So at least for joblib, we should rather make sure that unpickling gives you the singleton instance when appropriate.

@seberg
Copy link
Member

seberg commented Nov 12, 2020

@BvB93 we don't have any ambition putting the DType classes into np.typing, do we? Renaming them to np.typing.Float64, etc. would be the simplest solution. It would we mean should keep exposing them there for the foreseeable future though (even if we do change the main place where we put them).

@charris
Copy link
Member

charris commented Nov 24, 2020

I'm going to push this off as I doubt a fix will be available for 1.20.0.

@seberg
Copy link
Member

seberg commented Feb 4, 2021

I just realized that there is after all a simple solution to this. Sorry about this, we will have a fix in 1.20.1 I think, and that is currently planned to be released by next week.

seberg added a commit to seberg/numpy that referenced this issue Feb 4, 2021
Introducing the metaclass without a canonical top-level name broke
pickling of `type(np.dtype(...))` (which was always just `np.dtype`).

While a better solution will likely be possible by making the
DTypes HeapTypes and this solution may not work for all imaginable
cases (i.e. it is plausible for a dtype to not have a scalar type
associated), using a `copyreg` registration for the metaclass
surprisingly works without any issues and seems like the simplest
solution right now.

Closes numpygh-16692, numpygh-18325
@BvB93
Copy link
Member

BvB93 commented Feb 4, 2021

@seberg just to double check: #18332 will already resolve the issue, right?

@seberg
Copy link
Member

seberg commented Feb 4, 2021

@BvB93, yeah, thanks for looking. I had not thought of this approach/realized it works for static "builtin" types.

I am not certain it will solve all issues around this, since joblib (and similar) might do more than just pickle the class (for one try to actually instantiate it with some arguments ;)... outrageous, I know!).

@lesteve
Copy link
Contributor Author

lesteve commented Feb 5, 2021

Just a FYI in joblib, we decided to change slightly the way our hashing work to avoid the problem: joblib/joblib#1136.

Just trying to be more explicit: if the only complaint about this problem was from joblib then one option could be to not fix this issue. Reading the comments above, it seems that Dask may be affected by this from dask/dask#7170 (but not 100% clear yet)

charris pushed a commit to charris/numpy that referenced this issue Feb 6, 2021
Introducing the metaclass without a canonical top-level name broke
pickling of `type(np.dtype(...))` (which was always just `np.dtype`).

While a better solution will likely be possible by making the
DTypes HeapTypes and this solution may not work for all imaginable
cases (i.e. it is plausible for a dtype to not have a scalar type
associated), using a `copyreg` registration for the metaclass
surprisingly works without any issues and seems like the simplest
solution right now.

Closes numpygh-16692, numpygh-18325
hernot added a commit to hernot/hickle that referenced this issue Feb 17, 2021
Removed commits included in branch by imperfect branch rebasing
on assembling hickle-5-RC branch.

Numpy has introduced with version 1.20 a disrupting change which
prevents numpy.dtype type objects like np.int16 be pickled. Trying
to pickle numpy.dtypes causes pickling error in python 3.7 and python
3.8 which utilize numpy > 1.20. As this error will according to numpy
issue numpy/numpy#16692 not before numpy 1.21
is released. To fix for now set numpy requirement to be less than 1.20.
Recheck required when numpy 1.21 is released. If fixed update requirement
that all numpy 1.20 versions will be properly excluded from beeing
supported by hickle.
hernot added a commit to hernot/hickle that referenced this issue Feb 18, 2021
Removed commits included in branch by imperfect branch rebasing
on assembling hickle-5-RC branch.

Numpy has introduced with version 1.20 a disrupting change which
prevents numpy.dtype type objects like np.int16 be pickled. Trying
to pickle numpy.dtypes causes pickling error in python 3.7 and python
3.8 which utilize numpy > 1.20. As this error will according to numpy
issue numpy/numpy#16692 not before numpy 1.21
is released. To fix for now set numpy requirement to be less than 1.20.
Recheck required when numpy 1.21 is released. If fixed update requirement
that all numpy 1.20 versions will be properly excluded from beeing
supported by hickle.

Latest cryptography packages required by tox changed their dependency
upon rust compiler. This change makes installation of virtualenv fail
if not like for pip its latest versoin is used. Therefor virtualenv is
now included in pip --update line

Deviating from former plans missing indexer on regex match class in
python 3.5 fixed.

Latest HDF5 versions (>= 1.12) seem to not support anymor windows 32 bit
platform. This causes all appveyor builds on windows 32 bit to fail
exempt for python 3.5. It seems h5py 2.10 seems to be the last
supporting python 3.5.

Added TOXENV entry to appveyor yaml as tox_appveryor seems not anymore
limit environments to be tested to the selected testing environment.
hernot added a commit to hernot/hickle that referenced this issue Feb 19, 2021
Removed commits included in branch by imperfect branch rebasing
on assembling hickle-5-RC branch.

Numpy has introduced with version 1.20 a disrupting change which
prevents numpy.dtype type objects like np.int16 be pickled. Trying
to pickle numpy.dtypes causes pickling error in python 3.7 and python
3.8 which utilize numpy > 1.20. As this error will according to numpy
issue numpy/numpy#16692 not before numpy 1.21
is released. To fix for now set numpy requirement to be less than 1.20.
Recheck required when numpy 1.21 is released. If fixed update requirement
that all numpy 1.20 versions will be properly excluded from beeing
supported by hickle.

Latest cryptography packages required by tox changed their dependency
upon rust compiler. This change makes installation of virtualenv fail
if not like for pip its latest versoin is used. Therefor virtualenv is
now included in pip --update line

Deviating from former plans missing indexer on regex match class in
python 3.5 fixed.

Latest HDF5 versions (>= 1.12) seem to not support anymor windows 32 bit
platform. This causes all appveyor builds on windows 32 bit to fail
exempt for python 3.5. It seems h5py 2.10 seems to be the last
supporting python 3.5.

Added TOXENV entry to appveyor yaml as tox_appveryor seems not anymore
limit environments to be tested to the selected testing environment.
hernot added a commit to hernot/hickle that referenced this issue Mar 1, 2021
Removed commits included in branch by imperfect branch rebasing
on assembling hickle-5-RC branch.

Numpy has introduced with version 1.20 a disrupting change which
prevents numpy.dtype type objects like np.int16 be pickled. Trying
to pickle numpy.dtypes causes pickling error in python 3.7 and python
3.8 which utilize numpy > 1.20. As this error will according to numpy
issue numpy/numpy#16692 not before numpy 1.21
is released. To fix for now set numpy requirement to be less than 1.20.
Recheck required when numpy 1.21 is released. If fixed update requirement
that all numpy 1.20 versions will be properly excluded from beeing
supported by hickle.

Latest cryptography packages required by tox changed their dependency
upon rust compiler. This change makes installation of virtualenv fail
if not like for pip its latest versoin is used. Therefor virtualenv is
now included in pip --update line

Deviating from former plans missing indexer on regex match class in
python 3.5 fixed.

Latest HDF5 versions (>= 1.12) seem to not support anymor windows 32 bit
platform. This causes all appveyor builds on windows 32 bit to fail
exempt for python 3.5. It seems h5py 2.10 seems to be the last
supporting python 3.5.

Added TOXENV entry to appveyor yaml as tox_appveryor seems not anymore
limit environments to be tested to the selected testing environment.
hernot added a commit to hernot/hickle that referenced this issue Mar 2, 2021
Removed commits included in branch by imperfect branch rebasing
on assembling hickle-5-RC branch.

Numpy has introduced with version 1.20 a disrupting change which
prevents numpy.dtype type objects like np.int16 be pickled. Trying
to pickle numpy.dtypes causes pickling error in python 3.7 and python
3.8 which utilize numpy > 1.20. As this error will according to numpy
issue numpy/numpy#16692 not before numpy 1.21
is released. To fix for now set numpy requirement to be less than 1.20.
Recheck required when numpy 1.21 is released. If fixed update requirement
that all numpy 1.20 versions will be properly excluded from beeing
supported by hickle.

Latest cryptography packages required by tox changed their dependency
upon rust compiler. This change makes installation of virtualenv fail
if not like for pip its latest versoin is used. Therefor virtualenv is
now included in pip --update line

Deviating from former plans missing indexer on regex match class in
python 3.5 fixed.

Latest HDF5 versions (>= 1.12) seem to not support anymor windows 32 bit
platform. This causes all appveyor builds on windows 32 bit to fail
exempt for python 3.5. It seems h5py 2.10 seems to be the last
supporting python 3.5.

Added TOXENV entry to appveyor yaml as tox_appveryor seems not anymore
limit environments to be tested to the selected testing environment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants