BaseModel.__hash__ doesn't match __eq__ #7785

Closed
1 task done
alexmojaki opened this issue Oct 9, 2023 · 0 comments · Fixed by #7786
Labels
bug V2 Bug related to Pydantic V2 pending Awaiting a response / confirmation

alexmojaki commented Oct 9, 2023

Initial Checks

  • I confirm that I'm using Pydantic V2

Description

This special handling for generic classes:

pydantic/pydantic/main.py

Lines 855 to 864 in dbbd776

def __eq__(self, other: Any) -> bool:
    if isinstance(other, BaseModel):
        # When comparing instances of generic types for equality, as long as all field values are equal,
        # only require their generic origin types to be equal, rather than exact type equality.
        # This prevents headaches like MyGeneric(x=1) != MyGeneric[Any](x=1).
        self_type = self.__pydantic_generic_metadata__['origin'] or self.__class__
        other_type = other.__pydantic_generic_metadata__['origin'] or other.__class__
        return (
            self_type == other_type

has no equivalent for __hash__:

return hash(self.__class__) + hash(tuple(self.__dict__.values()))

This means that a == b doesn't imply hash(a) == hash(b), breaking how dicts and sets work.
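
To show the mismatch without pydantic, here is a minimal plain-Python sketch. `Model`, `ModelOfInt`, `FixedModel`, `FixedModelOfInt`, the `origin` class attribute, and `_comparison_type` are all hypothetical stand-ins invented for this illustration; none of them is pydantic API. The "fixed" variant is only one possible way to make the hash agree with `__eq__`, not necessarily what a real fix should look like.

```python
class Model:
    """Hypothetical stand-in for a generic model class; not pydantic code."""

    origin = None  # a "parametrized" subclass points this at its generic origin

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def _comparison_type(self):
        # Same idea as the __eq__ snippet above: fall back to the exact class.
        return type(self).origin or type(self)

    def __eq__(self, other):
        if not isinstance(other, Model):
            return NotImplemented
        return (
            self._comparison_type() == other._comparison_type()
            and self.__dict__ == other.__dict__
        )

    def __hash__(self):
        # Same shape as the current __hash__: hashes the exact class.
        return hash(type(self)) + hash(tuple(self.__dict__.values()))


class ModelOfInt(Model):
    origin = Model  # plays the role of A[int] relative to A


class FixedModel(Model):
    def __hash__(self):
        # Assumption: hash the same type that __eq__ compares.
        return hash(self._comparison_type()) + hash(tuple(self.__dict__.values()))


class FixedModelOfInt(FixedModel):
    origin = FixedModel


broken_a, broken_b = ModelOfInt(a=1), Model(a=1)
fixed_a, fixed_b = FixedModelOfInt(a=1), FixedModel(a=1)
```

With these definitions `broken_a == broken_b` holds while set membership fails, whereas `fixed_a` and `fixed_b` are equal, hash the same, and find each other in a set.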

It might also be worth noting that __eq__ looks at __pydantic_private__ and __pydantic_extra__ while __hash__ doesn't. This isn't a contract violation in the same way, since non-equal instances are allowed to have equal hashes, but it makes hash collisions more likely. Hypothetically you could have a large set/dict of model instances where all the public fields are the same (so all the hashes are equal) but the private attributes differ (so the instances are non-equal) and then operations which are usually O(1)ish become O(n). On the other hand, adding more logic to __hash__ would of course reduce performance slightly in the vast majority of cases, so it's not obvious what to do.
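
The collision scenario can be sketched in plain Python as well. `Item`, its `public` and `_private` attributes, and the `eq_calls` counter are hypothetical names made up for this example. The class hashes only the public value but compares both, so every instance lands in the same hash bucket and a single membership test has to call `__eq__` against every stored element.

```python
class Item:
    """Hypothetical class: __hash__ ignores state that __eq__ compares."""

    eq_calls = 0  # counts __eq__ invocations, for demonstration only

    def __init__(self, public, private):
        self.public = public
        self._private = private

    def __eq__(self, other):
        if not isinstance(other, Item):
            return NotImplemented
        Item.eq_calls += 1
        return (self.public, self._private) == (other.public, other._private)

    def __hash__(self):
        # Only the public value is hashed, so all items below collide.
        return hash(self.public)


# 200 non-equal instances that all share one hash value.
items = {Item(1, i) for i in range(200)}

Item.eq_calls = 0  # ignore the comparisons made while building the set
probe = Item(1, -1)
found = probe in items  # must be compared against every colliding element
```

After the lookup, `found` is false and `Item.eq_calls` is at least `len(items)`, which is the O(n) behaviour described above.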

EDIT: there's a good reason not to hash private attributes: #7800 (comment)

Example Code

from typing import TypeVar, Generic

from pydantic import BaseModel

T = TypeVar("T")


class A(BaseModel, Generic[T], frozen=True):
    a: T


a1 = A[int](a=1)
a2 = A(a=1)
assert a1 == a2
assert hash(a1) != hash(a2)
assert a1 not in {a2}
assert a2 not in {a1}

Python, Pydantic & OS Version

             pydantic version: 2.4.2
        pydantic-core version: 2.10.1
          pydantic-core build: profile=release pgo=true
                 install path: /home/alex/work/pydantic/pydantic
               python version: 3.11.5 (main, Sep  9 2023, 21:35:25) [GCC 7.5.0]
                     platform: Linux-5.15.0-86-generic-x86_64-with-glibc2.35
             related packages: typing_extensions-4.7.1 email-validator-2.0.0.post2 pyright-1.1.330.post0 mypy-1.1.1 pydantic-extra-types-2.1.0 pydantic-settings-2.0.3
@alexmojaki alexmojaki added bug V2 Bug related to Pydantic V2 pending Awaiting a response / confirmation labels Oct 9, 2023