Unable to cloudpickle Pydantic model classes #6763
The root cause seems to be that the `pydantic_core._pydantic_core.SchemaSerializer` object cannot be pickled.
My code example (a minimal reproduction):

```python
# bug.py
"""Cloudpickling Pydantic models raises an exception."""
import cloudpickle
from pydantic import BaseModel


class SimpleModel(BaseModel):
    val: int


cloudpickle.dumps(SimpleModel)
```

Output:

```
% python bug.py
Traceback (most recent call last):
  File "/Users/shrekris/Desktop/scratch/dump4.py", line 18, in <module>
    cloudpickle.dumps(SimpleModel)
  File "/Users/shrekris/miniforge3/envs/pydantic-fix/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/Users/shrekris/miniforge3/envs/pydantic-fix/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py", line 632, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle 'pydantic_core._pydantic_core.SchemaSerializer' object
```
@shrekris-anyscale Thank you for the report. Any ideas what could make it work?
Is there a dunder method that `cloudpickle` uses? We have custom `__getstate__`/`__setstate__` methods on models already.
One solution is to store the `schema` and `config` on the serializer itself. Essentially, adding the following methods to `SchemaSerializer`:

```python
def __init__(self, schema: CoreSchema, config: CoreConfig):
    self.schema = schema
    self.config = config

def __reduce__(self):
    return SchemaSerializer, (self.schema, self.config)
```

Here's a gist where I overwrote the `SchemaSerializer` this way, and the model pickles successfully.
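The `__reduce__` protocol suggested above can be sketched with a plain Python class (a pure-Python stand-in, not the real `SchemaSerializer`; the dict `schema`/`config` values are invented): pickle calls `__reduce__`, gets back a callable plus its arguments, and rebuilds the object by calling them at load time.

```python
import pickle


class Serializer:
    """Pure-Python stand-in for SchemaSerializer; `schema` and `config`
    are plain dicts here, not real pydantic-core objects."""

    def __init__(self, schema, config=None):
        self.schema = schema
        self.config = config

    def __reduce__(self):
        # Tell pickle: to recreate me, call Serializer(schema, config).
        return Serializer, (self.schema, self.config)


original = Serializer({"type": "int"}, {"strict": True})
restored = pickle.loads(pickle.dumps(original))
```

Because the reconstruction is just a constructor call, any future constructor validation automatically applies to unpickled objects as well.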
If you'd like to make a PR adding this on `pydantic-core`, I'm sure we'd accept it. Otherwise @davidhewitt maybe you could whip something up?
I don't think we necessarily need to store extra copies of the schema and config; in principle they could be rebuilt from the serializer's existing Rust state.
I agree, but I don't think this plays nicely with how PyO3 classes are constructed. I'm not sure how we'd be able to just store the schema and recreate the serializer if we can't use a custom `__init__`.
I did some digging, and it looks like `cloudpickle` serializes classes defined in `__main__` (or in local scopes) by value, while classes defined in importable modules are serialized by reference. So in the repro discussed above, the class is defined in `__main__` and gets pickled by value, which drags the unpicklable `SchemaSerializer` along with it. On the other hand, if the model is defined in an importable module, only a reference to it is pickled and the problem doesn't occur. So @shrekris-anyscale, a possible workaround may be to move your model definitions out of `__main__` into an importable module.
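The by-reference behavior can be seen with the standard-library `pickle` module alone (this sketch uses stdlib `pickle`; `cloudpickle` differs in that it falls back to pickling un-importable classes by value rather than refusing them):

```python
import collections
import pickle

# Classes importable from a module are pickled *by reference*: the
# pickle stream stores only the qualified name, not the class body.
assert pickle.loads(pickle.dumps(collections.OrderedDict)) is collections.OrderedDict


def make_local_class():
    class Local:
        pass
    return Local


# A class defined inside a function has no importable name, so stdlib
# pickle refuses it. cloudpickle instead pickles such classes *by
# value*, which is what drags the unpicklable SchemaSerializer along.
try:
    pickle.dumps(make_local_class())
    raised = False
except (pickle.PicklingError, AttributeError):
    raised = True
```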
Thanks for investigating! We use `cloudpickle` to serialize user-defined classes, so unfortunately we can't always control where the models are defined.
The downside of this is that I'd expect we'd need to take a deep copy of the schema and config and store them inside the `SchemaSerializer`. This would probably also be the case for `SchemaValidator`. It seems like a lot of extra CPU and memory baggage that most users will not need. The zero-cost solution would be to implement the "inverse operation" described above (i.e. rebuild the schema and config from the Rust state). I couldn't guarantee that's even possible, though. A proposal which allows pickle support to be opt-in may be a more feasible solution for now.
One low-baggage alternative is to delete the serializer upon serialization and reconstruct it whenever it's first called. The `__reduce__` would be something like:

```python
def __reduce__(self):
    return lambda: None, tuple()
```

Then whenever the serializer is needed and found to be `None`, it would be rebuilt from the model's schema. This should only affect users that are serializing the model classes.
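The drop-and-rebuild idea can be sketched in pure Python (the `Model` and `NativeSerializer` names are invented for this sketch; the real change would live in Rust): the unpicklable member is discarded on dump and lazily reconstructed on first use after load.

```python
import pickle


class NativeSerializer:
    """Stand-in for the unpicklable Rust serializer object."""

    def __init__(self, schema):
        self.schema = schema


class Model:
    def __init__(self, schema):
        self.schema = schema
        self._serializer = NativeSerializer(schema)

    def __getstate__(self):
        # Drop the serializer from the pickled state instead of
        # trying (and failing) to pickle it.
        state = self.__dict__.copy()
        state["_serializer"] = None
        return state

    @property
    def serializer(self):
        # Lazily rebuild after unpickling, on first access.
        if self._serializer is None:
            self._serializer = NativeSerializer(self.schema)
        return self._serializer


m = pickle.loads(pickle.dumps(Model({"type": "int"})))
```

Only users who actually pickle models pay the rebuild cost; everyone else sees zero overhead.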
Another option is to have the `__reduce__` patched in from the Python side in `pydantic` rather than in `pydantic-core`.
Does this work with monkey-patching?

```python
from pydantic_core import SchemaSerializer

def __reduce__(self):
    return lambda: None, tuple()

SchemaSerializer.__reduce__ = __reduce__
```

(Maybe you'd need to configure it to bind properly, and I'm not sure if PyO3 will cause other issues.)
I haven't actually run that example myself, so I'm not sure the syntax is correct. The high-level approach of that snippet is what I had in mind.
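For a pure-Python class, assigning `__reduce__` after the fact does work with stdlib `pickle`, as this sketch shows (a module-level function replaces the `lambda`, since stdlib pickle cannot serialize lambdas; whether PyO3 even permits the assignment on `SchemaSerializer` is a separate question):

```python
import pickle


class Target:
    def __init__(self, x):
        self.x = x


def _rebuild():
    # Recreate a blank Target; the original state is intentionally dropped.
    return Target(0)


def _reduce(self):
    return _rebuild, ()


# Monkey-patch the pickling behavior onto the existing class.
Target.__reduce__ = _reduce

t = pickle.loads(pickle.dumps(Target(5)))
```

The function assigned to the class binds like a normal method, so no extra binding configuration is needed for pure-Python classes.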
Hello, I'm hitting the same error with pydantic v2. I am also interested in any way that could make this work ;)
A bit more on my use case: to remove the offending (class) fields, I can surround the (cloud)pickling with this context manager, which works fine:

```python
@contextlib.contextmanager
def picklable_pydantic_model(cls: tp.Type[pydantic.BaseModel]) -> tp.Iterator[None]:
    name = f"{cls.__module__}.{cls.__qualname__}"
    if "<locals>." not in name and not name.startswith("__main__."):
        # the class is defined in a module, nothing to do :)
        yield
        return
    logger.warning(
        f"Hacking {name} fields to enable pickling as it is locally defined, "
        "please move to a module asap for more robustness"
    )
    content: tp.Dict[str, tp.Any] = {
        "__pydantic_parent_namespace__": None,
        "__pydantic_serializer__": None,
    }
    for field in content:
        content[field] = cls.__dict__[field]
        type.__setattr__(cls, field, None)
    try:
        yield
    finally:
        for field, data in content.items():
            type.__setattr__(cls, field, data)
```

It's applied on the class, and allows both the class and instances to be pickled.
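The same strip-and-restore pattern can be demonstrated without pydantic (the `Heavy` and `Model` names here are invented for the sketch): class attributes are swapped out for `None` inside the `with` block and restored in `finally`, even if the body raises.

```python
import contextlib


class Heavy:
    """Stand-in for an unpicklable class-level helper."""

    def __reduce__(self):
        raise TypeError("cannot pickle 'Heavy' object")


class Model:
    _serializer = Heavy()


@contextlib.contextmanager
def strip_class_attrs(cls, *attrs):
    saved = {a: cls.__dict__[a] for a in attrs}
    for a in attrs:
        # type.__setattr__ bypasses any custom __setattr__ on the class.
        type.__setattr__(cls, a, None)
    try:
        yield
    finally:
        for a, value in saved.items():
            type.__setattr__(cls, a, value)


with strip_class_attrs(Model, "_serializer"):
    inside = Model._serializer  # None while the block is active
```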
I also ran into the above issues. I was able to enable cloudpickling with the following wrapper:

```python
# Define a wrapper that saves the `schema` and `core_config` to reconstruct the native `SchemaSerializer`.
class CloudpickleableSchemaSerializer:
    def __init__(self, schema, core_config):
        self._schema = schema
        self._core_config = core_config
        self._schema_serializer = SchemaSerializer(self._schema, self._core_config)

    def __reduce__(self):
        return CloudpickleableSchemaSerializer, (self._schema, self._core_config)

    def __getattr__(self, attr: str):
        return getattr(self._schema_serializer, attr)

# Override all usages of `SchemaSerializer` (obviously not needed if we upstream the above wrapper):
pydantic._internal._model_construction.SchemaSerializer = CloudpickleableSchemaSerializer
pydantic._internal._dataclasses.SchemaSerializer = CloudpickleableSchemaSerializer
pydantic.type_adapter.SchemaSerializer = CloudpickleableSchemaSerializer
```

The wrapper can also subclass `SchemaSerializer` directly:

```python
class CloudpickleableSchemaSerializer(SchemaSerializer):
    def __init__(self, schema, core_config):
        self._schema = schema
        self._core_config = core_config
        # No need for `super().__init__()` because `SchemaSerializer` initialization happens in `__new__`.

    def __reduce__(self):
        return CloudpickleableSchemaSerializer, (self._schema, self._core_config)
```

I also needed a wrapper around `_PydanticWeakRef`:

```python
class WeakRefWrapper:
    def __init__(self, obj: Any):
        if obj is None:
            self._wr = None
        else:
            self._wr = weakref.ref(obj)

    def __reduce__(self):
        return WeakRefWrapper, (self(),)

    def __call__(self) -> Any:
        if self._wr is None:
            return None
        else:
            return self._wr()

# Override all usages of `_PydanticWeakRef` (obviously not needed if we upstream the above wrapper):
pydantic._internal._model_construction._PydanticWeakRef = WeakRefWrapper
```

AFAICT there's no downside to this wrapper, and it gets around the strange ABC-related pickling error.

@davidhewitt @dmontagu @lig I'm happy to contribute a patch if you think this is a reasonable direction. Let me know what you think. The only downside I can see is that the wrapper adds a layer of Python-level indirection on attribute access.
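The wrapper approach above generalizes to any extension type that refuses to pickle. Here is a self-contained pure-Python sketch (the `NativeSerializer` stand-in and its `to_json` method are invented for illustration):

```python
import pickle


class NativeSerializer:
    """Stand-in for a native extension type that cannot be pickled."""

    def __init__(self, schema):
        self.schema = schema

    def __reduce__(self):
        raise TypeError("cannot pickle 'NativeSerializer' object")

    def to_json(self, value):
        return str(value)


class PicklableSerializer:
    """Remembers its constructor args; pickling stores only those args,
    and the native object is rebuilt from them on load."""

    def __init__(self, schema):
        self._schema = schema
        self._inner = NativeSerializer(schema)

    def __reduce__(self):
        return PicklableSerializer, (self._schema,)

    def __getattr__(self, attr):
        # Delegate everything else to the wrapped native object.
        return getattr(self._inner, attr)


restored = pickle.loads(pickle.dumps(PicklableSerializer({"type": "int"})))
```

The `__getattr__` delegation keeps the wrapper API-compatible with the native object without enumerating its methods.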
Thanks @edoakes for investigating. My question would be whether you plan to merge the workaround into your own codebase or into pydantic itself. If your goal is to merge this fix into pydantic, then it's probably better to just make `SchemaSerializer` itself picklable in `pydantic-core`.
@davidhewitt I'd prefer to merge the functionality into pydantic itself. Storing the members and defining `__reduce__` seems reasonable. So then the plan of action would be:

1. Add `__reduce__` support to `SchemaSerializer` in `pydantic-core`.
2. Make `_PydanticWeakRef` picklable in `pydantic`.

With these two, we should be good to go. Given that these changes would be split across the repos, is there any special versioning story between them? Or does `pydantic` just pin a `pydantic-core` version?
Ah, I see it looks like `pydantic` pins an exact `pydantic-core` version. So I assume then that this will require first patching and releasing `pydantic-core`, then updating the pin in `pydantic`.
Yes exactly, we can release `pydantic-core` whenever needed; `pydantic` then picks up the new pinned version in its next release.
I have opened PRs for each of the above issues. I will begin testing that these fixes are comprehensive using locally installed copies. Once these PRs have been merged and a new version of `pydantic-core` is released and picked up by `pydantic`, the problem should be resolved.
Both PRs linked above are now merged. I've begun manually testing, and they appear to address the issue for all of my use cases. @jrapin if you could test out your workflow with the latest `main` branches as well, that would be helpful. @davidhewitt please let me know when a `pydantic-core` release including the fix is planned.
@edoakes sorry for the delay.
Initial steps towards full `pydantic>=2.0` compatibility. Included:

- Monkeypatch logic to make new pydantic models serializable. See pydantic/pydantic#6763 for the plan to properly fix this.
- A new CI build running against Pydantic 2.0+.
- Moved all internal usage of Pydantic models to `ray._private.pydantic_compat` to abstract out the `from pydantic` vs `from pydantic.v1` import paths.

Future work:

- Add tests for higher `fastapi` versions w/ Pydantic 2.0.
- Add more comprehensive testing for Pydantic 2.0+ model serialization.
- Add support for our custom-installed JSON encoders in Pydantic 2.0+ (or drop them because they are using internal APIs).
Thanks for the help @davidhewitt 🚀 do you know when the next release is scheduled and what its version tag will be? |
No specific ETA, I would assume 2.5. |
@davidhewitt any update on when the next release will go out?
Reading from #8028, there seem to be a couple of items still pending before 2.5 comes out.
I believe we are going to release a 2.5 beta today, with a view to a final 2.5 production release next week.
@edoakes unfortunately I have been fooled by my unit tests running better thanks to your fix ("local" classes defined within the scope of a unittest can be pickled while they could not before the fix), but I still have issues with the IPython console and Jupyter notebooks :( (see #8232). Did everything work fine on your end? Do you have any idea what could be causing this?
Initial Checks

Description

`cloudpickle` cannot serialize Pydantic model classes. It fails with a `TypeError: cannot pickle 'pydantic_core._pydantic_core.SchemaSerializer' object` exception.

Example Code

Python, Pydantic & OS Version

Selected Assignee: @lig