Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only hash model_fields, not whole __dict__ #7786

Merged
merged 6 commits into from Nov 13, 2023
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
20 changes: 11 additions & 9 deletions pydantic/_internal/_model_construction.py
@@ -1,6 +1,7 @@
"""Private logic for creating models."""
from __future__ import annotations as _annotations

import operator
import typing
import warnings
import weakref
Expand Down Expand Up @@ -123,9 +124,6 @@ def wrapped_model_post_init(self: BaseModel, __context: Any) -> None:
namespace['__class_vars__'] = class_vars
namespace['__private_attributes__'] = {**base_private_attributes, **private_attributes}

if config_wrapper.frozen:
set_default_hash_func(namespace, bases)

cls: type[BaseModel] = super().__new__(mcs, cls_name, bases, namespace, **kwargs) # type: ignore

from ..main import BaseModel
Expand Down Expand Up @@ -181,6 +179,10 @@ def wrapped_model_post_init(self: BaseModel, __context: Any) -> None:

types_namespace = get_cls_types_namespace(cls, parent_namespace)
set_model_fields(cls, bases, config_wrapper, types_namespace)

if config_wrapper.frozen and '__hash__' not in namespace:
set_default_hash_func(cls, bases)

complete_model_class(
cls,
cls_name,
Expand Down Expand Up @@ -388,19 +390,19 @@ def inspect_namespace( # noqa C901
return private_attributes


def set_default_hash_func(namespace: dict[str, Any], bases: tuple[type[Any], ...]) -> None:
if '__hash__' in namespace:
return

def set_default_hash_func(cls: type[BaseModel], bases: tuple[type[Any], ...]) -> None:
base_hash_func = get_attribute_from_bases(bases, '__hash__')
if base_hash_func in {None, object.__hash__}:
# If `__hash__` is None _or_ `object.__hash__`, we generate a hash function.
# It will be `None` if not overridden from BaseModel, but may be `object.__hash__` if there is another
# parent class earlier in the bases which doesn't override `__hash__` (e.g. `typing.Generic`).

getter = operator.itemgetter(*cls.model_fields.keys()) if cls.model_fields else lambda _: None

def hash_func(self: Any) -> int:
return hash(self.__class__) + hash(tuple(self.__dict__.values()))
return hash(self.__class__) + hash(getter(self.__dict__))
Copy link
Contributor

@dmontagu dmontagu Oct 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this faster than just doing

        def hash_func(self: Any) -> int:
            return hash(self.__class__) + hash(tuple(self.__dict__.get(k) for k in cls.model_fields))

I'd prefer to avoid adding the operator import if possible (as we have generally been trying to reduce the number of imports pydantic triggers, which increase startup time), but if it's noticeably faster then maybe it makes sense. I think if you can't see a significant and repeatable difference, better to not use it.

Unless I've misunderstood and the getter thing is doing something different from my suggestion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh I see there was a whole conversation about this and confirmation that it is significantly faster. 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we should confirm with @samuelcolvin though given the point about wanting to reduce import times.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you propose using .get() instead of []. Do you know whether it is safe to use itemgetter(), which doesn't have a default and thus will raise an error if the key is not found?

Said differently, are there cases where some keys from cls.model_fields are not present on the instance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Said differently, are there cases where some keys from cls.model_fields are not present on the instance?

I tried to find ways that this might happen, which led me to #7784. I couldn't think of anything else but I don't actually know pydantic that deeply so I could easily be missing something. But my guess is that if it's possible then it's enough of a red flag that raising an error here would actually be a good thing.


namespace['__hash__'] = hash_func
cls.__hash__ = hash_func # type: ignore


def set_model_fields(
Expand Down