Wrong serializer picked on Union Field #6830

Open
1 task done
Tracked by #7946
kubasaw opened this issue Jul 24, 2023 · 9 comments

@kubasaw

kubasaw commented Jul 24, 2023

Initial Checks

  • I confirm that I'm using Pydantic V2

Description

I noticed some strange behavior of models in my application while migrating to pydantic@2.

In the case where a Union is included in the model, despite the correct validation of the input data, the wrong Union member is selected for serialization.

For the attached code I got the following output:

    CALLING UUID VALIDATOR
    CALLING UUID VALIDATOR
RAW->>  id=PrefixedUUID(root=UUID('8351471e-37f4-4faf-95ef-d14556a81bd8'))
    CALLING SPECIAL SERIALIZER
DUMP->>  {'id': UUID('8351471e-37f4-4faf-95ef-d14556a81bd8')}
    CALLING SPECIAL SERIALIZER
Traceback (most recent call last):
  File "/home/kuba/pydanticUnion/test.py", line 46, in <module>
    print("JSON->> ", test_model.model_dump_json())
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kuba/pydanticUnion/venv/lib/python3.11/site-packages/pydantic/main.py", line 345, in model_dump_json
    return self.__pydantic_serializer__.to_json(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.PydanticSerializationError: Error serializing to JSON: PydanticSerializationError: Unable to serialize unknown type: UUID('8351471e-37f4-4faf-95ef-d14556a81bd8')

Minor question/issue: the PrefixedUUID model_validator is called twice, but when I change test_data to a JSON string and call Container.model_validate_json there is only a single call to the validator.

Example Code

from enum import StrEnum, auto

from pydantic import (UUID4, BaseModel, RootModel, model_serializer,
                      model_validator)


class SpecialValues(StrEnum):
    DEFAULT = auto()
    OTHER = auto()


class PrefixedUUID(RootModel):
    root: UUID4

    @model_validator(mode="before")
    def removePrefix(v: str):
        print("    CALLING UUID VALIDATOR")
        return v.removeprefix("uuid::")

    @model_serializer(mode="wrap")
    def appendPrefix(self, nxt):
        print("    CALLING UUID SERIALIZER")
        return f"uuid::{nxt(self)}"


class Special(RootModel):
    root: SpecialValues

    @model_serializer(mode="wrap")
    def test(self, nxt):
        print("    CALLING SPECIAL SERIALIZER")
        return nxt(self)


class Container(BaseModel):
    id: Special | PrefixedUUID


test_data = {"id": "uuid::8351471e-37f4-4faf-95ef-d14556a81bd8"}

test_model = Container.model_validate(test_data)

print("RAW->> ", test_model)
print("DUMP->> ", test_model.model_dump())
print("JSON->> ", test_model.model_dump_json())

Python, Pydantic & OS Version

             pydantic version: 2.0.3
        pydantic-core version: 2.3.1
          pydantic-core build: profile=release pgo=true mimalloc=true
                 install path: /home/kuba/pydanticUnion/venv/lib/python3.11/site-packages/pydantic
               python version: 3.11.3 (main, Apr  5 2023, 14:15:06) [GCC 9.4.0]
                     platform: Linux-5.15.0-75-generic-x86_64-with-glibc2.31
     optional deps. installed: ['typing-extensions']

Selected Assignee: @davidhewitt

kubasaw added the labels "bug V2" (Bug related to Pydantic V2) and "unconfirmed" (Bug not yet confirmed as valid/applicable) on Jul 24, 2023
@davidhewitt
Contributor

It looks like the crash is fixed on pydantic 2.1.1, please upgrade.

Minor question/issue: the PrefixedUUID model_validator is called twice, but when I change test_data to a JSON string and call Container.model_validate_json there is only a single call to the validator.

I get Container.model_validate_json calling the validator twice also. This is expected at the moment by our union implementation as we try to work out the "best" type for the result. I think there's an improvement to be made here; I'm reworking the union validation at the moment in pydantic/pydantic-core#867 so in this case we might be able to reduce to a single call.
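To illustrate why a "before" validator can fire twice inside a union, here is a minimal plain-Python sketch. It assumes the union tries every member in a strict pass and, if nothing matches, repeats the attempt in a lax pass. That two-pass shape is an assumption made for illustration, and the real pydantic-core logic may differ. The names `validate_union`, `validate_special`, `validate_prefixed_uuid`, `strip_prefix`, and `calls` are hypothetical.

```python
from uuid import UUID

calls = []  # records every run of the hypothetical "before" hook


def strip_prefix(value):
    """Stand-in for the PrefixedUUID 'before' validator."""
    calls.append(value)
    return value.removeprefix("uuid::")


def validate_special(value, strict):
    # Stand-in for the enum member: only two literal strings are accepted.
    if value in ("default", "other"):
        return ("Special", value)
    raise ValueError("not a special value")


def validate_prefixed_uuid(value, strict):
    value = strip_prefix(value)
    if strict and not isinstance(value, UUID):
        # A strict pass refuses to coerce a str into a UUID.
        raise ValueError("strict mode: expected a UUID instance")
    return ("PrefixedUUID", UUID(str(value)))


def validate_union(value, members):
    """Try every member strictly first, then again with coercion allowed."""
    for strict in (True, False):
        for member in members:
            try:
                return member(value, strict)
            except ValueError:
                continue
    raise ValueError("no union member matched")


result = validate_union(
    "uuid::8351471e-37f4-4faf-95ef-d14556a81bd8",
    [validate_special, validate_prefixed_uuid],
)
print(result[0], len(calls))  # PrefixedUUID 2
```

Under this assumption the hook runs once in the failed strict pass and once in the successful lax pass, which matches the two "CALLING UUID VALIDATOR" lines in the report.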

@kubasaw
Author

kubasaw commented Aug 20, 2023

I have got the same output for:

             pydantic version: 2.2.1
        pydantic-core version: 2.6.1
          pydantic-core build: profile=release pgo=true
                 install path: /home/kuba/pydanticUnion/venv/lib/python3.11/site-packages/pydantic
               python version: 3.11.4 (main, Jun  7 2023, 12:45:49) [GCC 9.4.0]
                     platform: Linux-5.15.0-79-generic-x86_64-with-glibc2.31
     optional deps. installed: ['typing-extensions']

@davidhewitt
Contributor

I do not get the crash on Pydantic 2.2.1. I could reproduce it with the older 2.0.3.

The validator being called twice is still expected and may be fixed in Pydantic 2.3 with pydantic/pydantic-core#867

@kubasaw
Author

kubasaw commented Aug 23, 2023

You are right. For 2.0.3 and earlier, there is a crash:

    CALLING UUID VALIDATOR
    CALLING UUID VALIDATOR
RAW->>  id=PrefixedUUID(root=UUID('8351471e-37f4-4faf-95ef-d14556a81bd8'))
    CALLING SPECIAL SERIALIZER
DUMP->>  {'id': UUID('8351471e-37f4-4faf-95ef-d14556a81bd8')}
    CALLING SPECIAL SERIALIZER
Traceback (most recent call last):
  File "/home/kuba/pydanticUnion/test.py", line 45, in <module>
    print("JSON->> ", test_model.model_dump_json())
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kuba/pydanticUnion/venv/lib/python3.11/site-packages/pydantic/main.py", line 336, in model_dump_json
    return self.__pydantic_serializer__.to_json(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.PydanticSerializationError: Error serializing to JSON: PydanticSerializationError: Unable to serialize unknown type: UUID('8351471e-37f4-4faf-95ef-d14556a81bd8')

For 2.1.0 and newer, this piece of code works (exits without error), but in both cases the wrong serializer is selected:

    CALLING UUID VALIDATOR
    CALLING UUID VALIDATOR
RAW->>  id=PrefixedUUID(root=UUID('8351471e-37f4-4faf-95ef-d14556a81bd8'))
    CALLING SPECIAL SERIALIZER
DUMP->>  {'id': UUID('8351471e-37f4-4faf-95ef-d14556a81bd8')}
    CALLING SPECIAL SERIALIZER
JSON->>  {"id":"8351471e-37f4-4faf-95ef-d14556a81bd8"}

Since the input is validated as a PrefixedUUID instance, I would expect the serialization logic defined for that class to be used. However, the serializer of the Special class is called, which has nothing to do with the validated model.

I suppose that the serialization is done the same way as validation: serializers are called in the order of classes in the Union. When the model is successfully serialized using another union member's serializer, this value is returned. I don't think it makes sense :/
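The hypothesis above can be sketched in plain Python. This is not pydantic's implementation: `UNION`, `dump_first_success`, and `dump_by_type` are hypothetical names, and the two classes are bare stand-ins for the models in the report. The sketch contrasts "first serializer that does not raise wins" with "serializer of the matching runtime type wins".

```python
class Special:
    def __init__(self, root):
        self.root = root


class PrefixedUUID:
    def __init__(self, root):
        self.root = root


def serialize_special(value):
    # Does not check the type, so it "succeeds" on any object with .root.
    return value.root


def serialize_prefixed(value):
    return f"uuid::{value.root}"


# Hypothetical union description: (class, serializer) pairs in declaration order.
UNION = [(Special, serialize_special), (PrefixedUUID, serialize_prefixed)]


def dump_first_success(value):
    """Return the output of the first serializer that does not raise."""
    for _cls, serializer in UNION:
        try:
            return serializer(value)
        except Exception:
            continue
    raise TypeError("no serializer accepted the value")


def dump_by_type(value):
    """Prefer the member whose class matches the value's runtime type."""
    for cls, serializer in UNION:
        if type(value) is cls:
            return serializer(value)
    return dump_first_success(value)


item = PrefixedUUID("8351471e-37f4-4faf-95ef-d14556a81bd8")
print(dump_first_success(item))  # 8351471e-37f4-4faf-95ef-d14556a81bd8
print(dump_by_type(item))        # uuid::8351471e-37f4-4faf-95ef-d14556a81bd8
```

The first call reproduces the shape of the reported output (prefix lost), while the second gives the output the reporter expected.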

@davidhewitt
Contributor

Right, you are correct, I see that now, that's definitely another bug. 👍

@cmueller-bbc

I've been wrestling with this issue and just found this bug report. I think I have two more examples for you to test against.

Like @kubasaw, my main use case is using Pydantic with "third party" classes. While debugging my code, I ended up with some simple variations on the code from Types that exhibit the same behavior.

Annotated Example

This example simply uses int and float along with Annotated to create the custom types used in the Model.value Union:

from typing import Annotated, Any, Callable, Literal, Optional, TypeAlias
from pydantic import BaseModel, ConfigDict, BeforeValidator, PlainSerializer, WithJsonSchema

def print_type(t):
    def validate(x):
        assert t == type(x)
        print(f"{t} == {type(x)}")
        return x
    return validate

IntAlias = Annotated[
    int,
    BeforeValidator(print_type(int)),
    PlainSerializer(lambda x: f"{type(x)} {x * 10}"),
    WithJsonSchema({'type': 'integer'})
]

FloatAlias = Annotated[
    float,
    BeforeValidator(print_type(float)),
    PlainSerializer(lambda x: f"{type(x)} {x * 100.0}"),
    WithJsonSchema({'type': 'number'})
]

class Model(BaseModel):
    model_config = ConfigDict(
        strict=True,
    )

    # This will validate properly, but pick the first element in the Union as the serializer
    value:  IntAlias | FloatAlias

m_int = Model(value = 1)     # <class 'int'> == <class 'int'>
m_float = Model(value = 2.0) # <class 'float'> == <class 'float'>
print(m_int.model_dump())    # {'value': "<class 'int'> 10"}
print(m_float.model_dump())  # {'value': "<class 'float'> 20.0"}  <-- this should be 200.0

The comments on the last few lines show the output. Pydantic seems to get the correct types but wrong serializer. Playing around, I found that the serializer always comes from the first type in the list.

ThirdPartyType Example

I had hoped that the bug was limited to Annotated types and that using __get_pydantic_core_schema__ would solve the problem, but it seems that it might be directly related to how Union types are handled.

This example is based on the ThirdPartyType example:

from typing import Any

from pydantic_core import core_schema
from typing_extensions import Annotated

from pydantic import (
    BaseModel,
    ConfigDict,
    GetCoreSchemaHandler,
    GetJsonSchemaHandler,
    ValidationError,
)
from pydantic.json_schema import JsonSchemaValue


# ------------------------------------------------------------
# Basic Type
# ------------------------------------------------------------

class ThirdPartyType:
    x: int

    def __init__(self):
        self.x = 0


class _ThirdPartyTypePydanticAnnotation:
    @classmethod
    def __get_pydantic_core_schema__(
        cls,
        _source_type: Any,
        _handler: GetCoreSchemaHandler,
    ) -> core_schema.CoreSchema:

        def validate_from_dict(value: dict) -> ThirdPartyType:
            assert "x" in value, f'x not in {value.keys()}'
            result = ThirdPartyType()
            result.x = value["x"]
            return result

        from_dict_schema = core_schema.chain_schema(
            [
                core_schema.dict_schema(),
                core_schema.no_info_plain_validator_function(validate_from_dict),
            ]
        )

        return core_schema.json_or_python_schema(
            json_schema=from_dict_schema,
            python_schema=core_schema.union_schema(
                [
                    core_schema.is_instance_schema(ThirdPartyType),
                    from_dict_schema,
                ]
            ),
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda instance: {"x": instance.x}
            ),
        )

    @classmethod
    def __get_pydantic_json_schema__(
        cls, _core_schema: core_schema.CoreSchema, handler: GetJsonSchemaHandler
    ) -> JsonSchemaValue:
        return handler(core_schema.dict_schema())

PydanticThirdPartyType = Annotated[
    ThirdPartyType, _ThirdPartyTypePydanticAnnotation
]



# ------------------------------------------------------------
# Int Type
# ------------------------------------------------------------

class ThirdPartyIntType:
    x: int

    def __init__(self):
        self.x = 0


class _ThirdPartyIntTypePydanticAnnotation:
    @classmethod
    def __get_pydantic_core_schema__(
        cls,
        _source_type: Any,
        _handler: GetCoreSchemaHandler,
    ) -> core_schema.CoreSchema:

        def validate_from_int(value: int) -> ThirdPartyIntType:
            result = ThirdPartyIntType()
            result.x = value
            return result

        from_int_schema = core_schema.chain_schema(
            [
                core_schema.int_schema(),
                core_schema.no_info_plain_validator_function(validate_from_int),
            ]
        )

        return core_schema.json_or_python_schema(
            json_schema=from_int_schema,
            python_schema=core_schema.union_schema(
                [
                    core_schema.is_instance_schema(ThirdPartyIntType),
                    from_int_schema,
                ]
            ),
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda instance: instance.x * 10
            ),
        )

    @classmethod
    def __get_pydantic_json_schema__(
        cls, _core_schema: core_schema.CoreSchema, handler: GetJsonSchemaHandler
    ) -> JsonSchemaValue:
        return handler(core_schema.int_schema())


PydanticThirdPartyIntType = Annotated[
    ThirdPartyIntType, _ThirdPartyIntTypePydanticAnnotation
]


# ------------------------------------------------------------
# Float Type
# ------------------------------------------------------------

class ThirdPartyFloatType:
    x: float

    def __init__(self):
        self.x = 0


class _ThirdPartyFloatTypePydanticAnnotation:
    @classmethod
    def __get_pydantic_core_schema__(
        cls,
        _source_type: Any,
        _handler: GetCoreSchemaHandler,
    ) -> core_schema.CoreSchema:

        def validate_from_float(value: float) -> ThirdPartyFloatType:
            result = ThirdPartyFloatType()
            result.x = value
            return result

        from_float_schema = core_schema.chain_schema(
            [
                core_schema.float_schema(),
                core_schema.no_info_plain_validator_function(validate_from_float),
            ]
        )

        return core_schema.json_or_python_schema(
            json_schema=from_float_schema,
            python_schema=core_schema.union_schema(
                [
                    core_schema.is_instance_schema(ThirdPartyFloatType),
                    from_float_schema,
                ]
            ),
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda instance: instance.x * 100
            ),
        )

    @classmethod
    def __get_pydantic_json_schema__(
        cls, _core_schema: core_schema.CoreSchema, handler: GetJsonSchemaHandler
    ) -> JsonSchemaValue:
        return handler(core_schema.float_schema())


PydanticThirdPartyFloatType = Annotated[
    ThirdPartyFloatType, _ThirdPartyFloatTypePydanticAnnotation
]


class Model(BaseModel):
    model_config = ConfigDict(
        strict=True,
    )

    third_party_type: PydanticThirdPartyFloatType | PydanticThirdPartyIntType | PydanticThirdPartyType

m_float = Model(third_party_type=2.0)
assert isinstance(m_float.third_party_type, ThirdPartyFloatType)
assert m_float.third_party_type.x == 2.0
print("1)", m_float.model_dump())  # 1) {'third_party_type': 200.0}  <-- correct: the float serializer is x * 100
assert m_float.model_dump() == {'third_party_type': 2.0 * 100}

m_int = Model(third_party_type=1)
assert isinstance(m_int.third_party_type, ThirdPartyIntType)
assert m_int.third_party_type.x == 1
print("2)", m_int.model_dump())  # 2) {'third_party_type': 100}  <-- should be 10
# assert m_int.model_dump() == {'third_party_type': 1 * 10}

m_dict = Model(third_party_type={"x": 3})
assert isinstance(m_dict.third_party_type, ThirdPartyType)
assert m_dict.third_party_type.x == 3
print("3)", m_dict.model_dump())  # 3) {'third_party_type': 300}  <-- should be {'x': 3}
# assert m_dict.model_dump() == {'third_party_type': {'x': 3}}

The commented assert statements in the last few sections fail.

Note that the order in the Union is:

third_party_type: PydanticThirdPartyFloatType | PydanticThirdPartyIntType | PydanticThirdPartyType

The serialized values all match the PydanticThirdPartyFloatType serializer, which multiplies them by 100.

Workaround?

As a workaround, I implemented a general purpose serializer that checks types and serializes accordingly. Using the simple int/float example from above, here's the basic idea:

def external_serializer(x):
    if type(x) is int:
        return f"{type(x)} {x * 10}"
    elif type(x) is float:
        return f"{type(x)} {x * 100.0}"
    else:
        assert False, f"Unknown type: {type(x)}"

IntAlias2 = Annotated[
    int,
    BeforeValidator(print_type(int)),
    PlainSerializer(external_serializer),
    WithJsonSchema({'type': 'integer'})
]

FloatAlias2 = Annotated[
    float,
    BeforeValidator(print_type(float)),
    PlainSerializer(external_serializer),
    WithJsonSchema({'type': 'number'})
]

class Model2(BaseModel):
    model_config = ConfigDict(
        strict=True,
    )

    # This will validate properly, but pick the first element in the Union as the serializer
    value:  IntAlias2 | FloatAlias2

m_int = Model2(value = 1)     # <class 'int'> == <class 'int'>
m_float = Model2(value = 2.0) # <class 'float'> == <class 'float'>
print(m_int.model_dump())     # {'value': "<class 'int'> 10"}
print(m_float.model_dump())   # {'value': "<class 'float'> 200.0"}

Does anyone see any issues with this approach?
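Two things are worth checking in the workaround above. `assert False` is stripped when Python runs with `-O`, so an unknown type would then return None silently. And `type(x) is int` rejects every subclass. A possible variant, sketched here with the standard library's `functools.singledispatch`, raises a real exception and makes the subclass handling explicit. Treating bool as unsupported is an assumption made for this example, not something the thread specifies.

```python
from functools import singledispatch


@singledispatch
def external_serializer(x):
    # Fallback for types that were never registered.
    raise TypeError(f"Unknown type: {type(x)}")


@external_serializer.register
def _(x: int):
    return f"{type(x)} {x * 10}"


@external_serializer.register
def _(x: float):
    return f"{type(x)} {x * 100.0}"


@external_serializer.register
def _(x: bool):
    # bool is a subclass of int; register it explicitly so that True is not
    # silently serialized with the int rule (assumption for this sketch).
    raise TypeError("bool is not supported")


print(external_serializer(1))    # <class 'int'> 10
print(external_serializer(2.0))  # <class 'float'> 200.0
```

Unlike the identity check, this dispatch accepts subclasses of registered types, so whether that is desirable depends on the types placed in the Union.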

In my actual application, I'm also working with a list of types that can include Model, e.g.:

value_list = List[ IntAlias | FloatAlias | Model ]

Are there any special considerations I should take into account to properly validate BaseModel derived types that are included in the Union?

Hopefully the above examples help in resolving this issue.

@ouatu-ro

There is a strange case: if the first type of the union is a built-in Python type, e.g. str, serialization works properly, as in the following example:

from pydantic import BaseModel, ConfigDict, field_validator, model_serializer

class Language(BaseModel):
    model_config = ConfigDict(validate_assignment=True)

    language_code: str
    
    @model_serializer(when_used='json')
    def to_dict(self) -> str:
        return self.language_code+'-serialized'

class Response(BaseModel):
    # languages: List[Language|str]
    language: str|Language ###### notice the order
    class Config:
        validate_assignment = True

    @field_validator("language", mode='before')
    @classmethod
    def validate_source_language(cls, value: str):  # value = "en"
        if value == "auto":
            return value
        # If not 'auto', attempt to validate and return a Language instance
        elif isinstance(value, str):
            return Language(language_code=value)
        raise ValueError('source_language must be either "auto" or a Language instance')

new_data = {'language': 'auto'}
new_response = Response(**new_data)
print(new_response.model_dump_json())
"""
'{"language":"auto"}'
"""
new_response.language = "en"
print(new_response.model_dump_json())
"""
'{"language":"en-serialized"}'
"""

but

from pydantic import BaseModel, ConfigDict, field_validator, model_serializer


class Language(BaseModel):
    model_config = ConfigDict(validate_assignment=True)

    language_code: str
    
    @model_serializer(when_used='json')
    def to_dict(self) -> str:
        return self.language_code+'-serialized'

class Response(BaseModel):
    # languages: List[Language|str]
    language: Language|str ########### inverted order
    class Config:
        validate_assignment = True

    @field_validator("language", mode='before')
    @classmethod
    def validate_source_language(cls, value: str):  # value = "en"
        if value == "auto":
            return value
        # If not 'auto', attempt to validate and return a Language instance
        elif isinstance(value, str):
            return Language(language_code=value)
        raise ValueError('source_language must be either "auto" or a Language instance')

new_data = {'language': 'auto'}
new_response = Response(**new_data)
print(new_response.model_dump_json())
"""
PydanticSerializationError: Error serializing to JSON: PydanticSerializationError: Error calling function `to_dict`: AttributeError: 'str' object has no attribute 'language_code'
"""
new_response.language = "en"
new_response.model_dump_json()
"""
'{"language":"en-serialized"}'
"""

@ouatu-ro

ouatu-ro commented Mar 21, 2024

Another related issue; I am not sure whether to open a new issue for this one:

When using @model_serializer with inheritance, the child class's custom serializer method does not seem to be respected during serialization. Instead, the parent class's serializer method is called. This behavior is contrary to expectations where the child's serializer method should override the parent's.

from pydantic import BaseModel, ConfigDict, model_serializer

class LanguageBase(BaseModel):
    model_config = ConfigDict(validate_assignment=True)
    language_code: str

    @model_serializer(when_used='json')
    def to_dict(self) -> str:
        return self.language_code + '-serialized'

class Language(LanguageBase):
    @model_serializer(when_used='json')
    def to_dict_child(self) -> str:
        return self.language_code + '-serialized-child'

class Response(BaseModel):
    language: str | LanguageBase

    class Config:
        validate_assignment = True

# Sample data for creating a Response instance
new_data = {'language': Language(language_code='en')}
new_response = Response(**new_data)

# Attempt to serialize
print(new_response.json())

Expected Behavior

The output of new_response.json() should be '{"language":"en-serialized-child"}', indicating that the child class's (Language) serializer method to_dict_child is used for serialization.

Actual Behavior

The actual output is '{"language":"en-serialized"}'. This suggests that the LanguageBase class's serializer method to_dict is called instead of the Language class's to_dict_child.

This issue is significant as it affects the polymorphic behavior of models in applications leveraging Pydantic for complex data handling and serialization. Correctly respecting method overrides in class hierarchies is fundamental for object-oriented design, enabling more flexible and clear implementations. It offers a "quack like a duck" alternative to generics in some scenarios, simplifying numerous use cases where specialized behavior is needed for derived model instances without compromising the simplicity and readability of the code.

Though, if my usage of Pydantic is not optimal and there's a recommended approach for these kinds of transformations, I'd greatly appreciate any guidance or examples of how to properly implement this functionality.
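The gap between the expected and actual output is consistent with the serializer being chosen from the declared field type rather than from the runtime type of the value. The plain-Python sketch below contrasts the two lookups. It is not pydantic's code, and `SERIALIZERS`, `dump_by_annotation`, and `dump_by_runtime_type` are hypothetical names used only for illustration.

```python
class LanguageBase:
    def __init__(self, language_code):
        self.language_code = language_code


class Language(LanguageBase):
    pass


# Hypothetical registry mapping a class to its JSON serializer.
SERIALIZERS = {
    LanguageBase: lambda obj: obj.language_code + "-serialized",
    Language: lambda obj: obj.language_code + "-serialized-child",
}


def dump_by_annotation(obj, annotated_type):
    """Use the serializer registered for the declared field type."""
    return SERIALIZERS[annotated_type](obj)


def dump_by_runtime_type(obj):
    """Walk the MRO so the most-derived registered serializer wins."""
    for cls in type(obj).__mro__:
        if cls in SERIALIZERS:
            return SERIALIZERS[cls](obj)
    raise TypeError(f"no serializer for {type(obj)!r}")


value = Language("en")
print(dump_by_annotation(value, LanguageBase))  # en-serialized
print(dump_by_runtime_type(value))              # en-serialized-child
```

The first lookup mirrors the reported behavior for a field annotated as `str | LanguageBase`; the second is the override behavior the comment expects.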

system info:

             pydantic version: 2.6.4
        pydantic-core version: 2.16.3
          pydantic-core build: profile=release pgo=false
               python version: 3.12.2 (main, Mar 19 2024, 02:13:01) [Clang 15.0.0 (clang-1500.1.0.2.5)]
                     platform: macOS-13.6.4-x86_64-i386-64bit
             related packages: fastapi-0.110.0 typing_extensions-4.10.0
                       commit: unknown

@ornariece
Contributor

@ouatu-ro I think the above is a concern fit for #9063. At the very least, it's unclear how pydantic "picks" the model_serializer for the current model. Ideally, I would argue, it should support multiple model_serializers.
