New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pydantic __eq__
and __hash__
doesn't work properly with functools.cached_property
#7444
Comments
@QuentinSoubeyranAqemia thank you for the bug report. Looking at your examples this will produce equal entities: obj1 = Model(attr=1)
obj2 = Model(attr=1)
assert obj1 == obj2 # works
obj1.cached
obj2.cached # adding this makes the next check pass
assert obj1 == obj2 I think that there could be cases where objects with a cached property being already calculated and not shouldn't be considered equal. In general, we don't know what the result of calculating a cached property would be on another instance. Also, it's not clear to me that Pydantic is allowed calling that calculation implicitly in the I would argue that this logic is implementation specific and would recommend implementing a custom base class for your models implementing desired logic. |
I'd be happy to consider changing it, the question is whether this could represent a breaking change? The reason we used |
Thank you for your answers! I have a few things to reply to, so I splitted my message by what part is refered to
That make sense, thank you for the context !
Technically it is a breaking change, I suppose. In practice, this depends on how often the assumption "the |
While this indeed work on the minimal-repro, I don't think it is applicable in production. For instance, to use a dict mapping model instances to some values, you would need to:
It would be more efficient to use a simple
Do you have a concrete example where a cached property being computed or not should change the equality comparison, all other fields being equal? It seems to me that if the property can be meaningfully cached, by definition it shouldn't change, it's like a lazy constant attribute? When I'm using
I agree that
Indeed, using a custom base model with a custom |
I think this is an excellent point. Put differently, if
How about replacing: self.__dict__ == other.__dict__ with: (
self.__dict__ == other.__dict__ or (
self.__dict__.keys() != other.__dict__.keys() and
(slower check using model_fields or something)
)
) This should be:
Moving the whole check to after checking |
Ah, but you'd also need to update |
Good catch, modifying
It similarly should only use pydantic's fields. Looks like the key performance bottleneck will be to efficiently take a subset of a dict? Is there any performance benchmark for pydantic that could be used to experiment with that? |
I opened #7786 to show that hashing can be handled without hurting performance too much. |
The problem also affects This is because Should I open a new issue with this problem ? import functools
import random
import pydantic
class Model(pydantic.BaseModel):
model_config = pydantic.ConfigDict(frozen=True)
attr: int
@functools.cached_property
def prop(self) -> int:
return random.randint(0, 999)
obj1 = Model(attr=1)
print(hash(obj1))
mapping = {obj1: True}
mapping[obj1]
obj1.prop
print(hash(obj1))
mapping[obj1] # KeyError |
BaseModel.__eq__
doesn't work on models with functools.cached_property
functools.cached_property
functools.cached_property
__eq__
and __hash__
doesn't work properly with functools.cached_property
I think it's essentially the same problem, it just needs to be fixed in two places. The original "Example 2" already covered this. |
I don't think example 2 actually covers it. This is dependent on the internals of
Given that all objects with the same hash ends up in the same bucket, I don't think the comparison is done first on hash, then with |
I personally think it makes sense to just replace Even if it's not quite as fast, I agree with the claims made here that |
Thanks @dmontagu for the vote, it felt like there was still some doubt before but now it sounds like we should go forward with a change. @QuentinSoubeyranAqemia you've been amazingly thorough and diligent with this stuff, can you open a PR for |
I'll take some time to experiment with |
Well, apparently there is an edge case where the deprecated |
@alexmojaki I've reviewed both, they basically seem good — thank you both for the detailed performance investigations and nuanced, correct(🤞) implementations. I had note on #7786 that doesn't necessarily need to be addressed before merging (my most recent comment), and I think that can be merged, and then I think #7825 can be merged once @alexmojaki's comments on the PR are addressed. I'll leave it to @sydney-runkle to decide exactly when to merge, I think the plan was docs only changes prior to 2.5.0, but given it's a bugfix maybe it can be in a quick 2.5.1. |
🎆 🎉 💃 |
Initial Checks
Description
Bug
While porting an app to Pydantic v2, I notice that the implementation of
__eq__
doesn't work properly on Model withfunctools.cached_property
: equal instances can compare unequal.From the documentation on computed fields, it seems pydantic should support
functools.cached_property
. I'm trying to compute an attribute value lazily and only once, because it is very expensive.functools.cached_property
seems like the perfect tool for that.Example 1 below demonstrate the problem, and Example 2 shows that this can introduce subtle bugs when using dict on "equal" objects, as dict relies on a correct
__eq__
and__hash__
.Analysis
Line 831 from pydantic's implementation of
__eq__
seems to assumes that__dict__
only ever contains pydantic's field values.pydantic/pydantic/main.py
Lines 829 to 834 in c6e2f89
I think this is not true in the general case, as
__dict__
is an instance state that is shared between pydantic and all other mechanism that need to act on the instance itself.Indeed,
functools.cached_property
stores the computed value in the instance__dict__
, because it is the only place it can reasonably store it:https://github.com/python/cpython/blob/1f885df2a580360c5de69cc41191f3c6bfaaeb35/Lib/functools.py#L1005
Assuming that
__dict__
only contains pydantic's own fields has already been mentionned in #6422Possible fix
I think the fix below limits the comparison of
__dict__
to pydantic's own fields. I don't know much about pydantic source code, so there might be some rough edges I overlooked. I used a sentinel object as a default value to avoid crashing on missing fields, not sure it is actually necessary.Related issues
model_dump()
showsPydantic serializer warnings
after call@cached_property
method #6422 had a more obscure problem due to assuming__dict__
contains only pydantic fieldsAdditional context
Comparing the full
__dict__
seems to be a change from pydantic V1. I would be interested in knowing why the full__dict__
is compared in V2.Both
attrs
and the stdlibdataclasses
work seamlessly withfunctools.cahced_property
and only compare their own fields:dataclasses
__eq__
https://github.com/python/cpython/blob/1f885df2a580360c5de69cc41191f3c6bfaaeb35/Lib/dataclasses.py#L1086-L1099
attrs
__eq__
https://www.attrs.org/en/stable/comparison.html#comparison
Example Code
Python, Pydantic & OS Version
The text was updated successfully, but these errors were encountered: