Wrong serializer picked on Union Field #6830
Comments
It looks like the crash is fixed on pydantic 2.1.1, please upgrade.
I get |
I have got the same output for:
|
I do not get the crash on Pydantic 2.2.1. I could reproduce it with the older 2.0.3. The validator being called twice is still expected and may be fixed in Pydantic 2.3 with pydantic/pydantic-core#867 |
You are right. For 2.0.3 and earlier, there is a crash
For 2.1.0 and newer, this piece of code works (exits without error), but in both cases the wrong serializer is selected:
Since the input is validated as a PrefixedUUID instance, I would expect the logic defined for that class to be called during serialization. However, the serializer of the Special class is called, which has nothing to do with the validated model. I suppose serialization works the same way as validation: serializers are tried in the order of the classes in the Union, and when the value is successfully serialized by another union member's serializer, that result is returned. I don't think it makes sense :/
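The behaviour suspected in this comment can be modelled without Pydantic. The following is a toy sketch only, not pydantic-core's actual implementation: `MEMBERS`, `serialize_first_success` and `serialize_by_type` are hypothetical names, and the two lambdas merely stand in for union members that each carry their own serializer.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical union members as (python type, serializer) pairs, in declaration order.
Member = Tuple[type, Callable[[Any], Any]]

MEMBERS: List[Member] = [
    (int, lambda x: f"int:{x * 10}"),
    (float, lambda x: f"float:{x * 100.0}"),
]


def serialize_first_success(value: Any, members: List[Member]) -> Any:
    """Try each member's serializer in order and return the first result that does not raise."""
    for _tp, serializer in members:
        try:
            return serializer(value)
        except Exception:
            continue
    raise TypeError(f"no serializer accepted {value!r}")


def serialize_by_type(value: Any, members: List[Member]) -> Any:
    """Pick the serializer whose member type matches the value's exact runtime type."""
    for tp, serializer in members:
        if type(value) is tp:
            return serializer(value)
    raise TypeError(f"no union member matches {type(value)!r}")


print(serialize_first_success(2.0, MEMBERS))  # the int serializer "succeeds" on a float
print(serialize_by_type(2.0, MEMBERS))        # the float serializer is chosen
```

In this model the ordered strategy returns whatever the first member produces as long as it does not raise, which is the kind of result reported in this thread; dispatching on the runtime type avoids it.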
Right, you are correct, I see that now, that's definitely another bug. 👍 |
I've been wrestling with this issue and just found this bug report. I think I have two more examples for you to test against. Like @kubasaw, my main use case is using Pydantic with "third party" classes. While debugging my code, I ended up with some simple variations on the code from Types that exhibit the same behavior.
Annotated Example
This example simply uses
from typing import Annotated, Any, Callable, Literal, Optional, TypeAlias
from pydantic import BaseModel, ConfigDict, BeforeValidator, PlainSerializer, WithJsonSchema
def print_type(t):
    def validate(x):
        assert t == type(x)
        print(f"{t} == {type(x)}")
        return x
    return validate

IntAlias = Annotated[
    int,
    BeforeValidator(print_type(int)),
    PlainSerializer(lambda x: f"{type(x)} {x * 10}"),
    WithJsonSchema({'type': 'integer'})
]
FloatAlias = Annotated[
    float,
    BeforeValidator(print_type(float)),
    PlainSerializer(lambda x: f"{type(x)} {x * 100.0}"),
    WithJsonSchema({'type': 'number'})
]

class Model(BaseModel):
    model_config = ConfigDict(
        strict=True,
    )
    # This will validate properly, but pick the first element in the Union as the serializer
    value: IntAlias | FloatAlias

m_int = Model(value = 1)  # <class 'int'> == <class 'int'>
m_float = Model(value = 2.0)  # <class 'float'> == <class 'float'>
print(m_int.model_dump())  # {'value': "<class 'int'> 10"}
print(m_float.model_dump())  # {'value': "<class 'float'> 20.0"} <-- this should be 200.0
The comments on the last few lines show the output. Pydantic seems to get the correct types but the wrong serializer. Playing around, I found that the serializer always comes from the first type in the list.
ThirdPartyType Example
I had hoped that the bug was limited to
This example is based on the
from typing import Any
from pydantic_core import core_schema
from typing_extensions import Annotated
from pydantic import (
BaseModel,
ConfigDict,
GetCoreSchemaHandler,
GetJsonSchemaHandler,
ValidationError,
)
from pydantic.json_schema import JsonSchemaValue
# ------------------------------------------------------------
# Basic Type
# ------------------------------------------------------------
class ThirdPartyType:
    x: int

    def __init__(self):
        self.x = 0

class _ThirdPartyTypePydanticAnnotation:
    @classmethod
    def __get_pydantic_core_schema__(
        cls,
        _source_type: Any,
        _handler: GetCoreSchemaHandler,
    ) -> core_schema.CoreSchema:
        def validate_from_dict(value: dict) -> ThirdPartyType:
            assert "x" in value, f'x not in {value.keys()}'
            result = ThirdPartyType()
            result.x = value["x"]
            return result

        from_dict_schema = core_schema.chain_schema(
            [
                core_schema.dict_schema(),
                core_schema.no_info_plain_validator_function(validate_from_dict),
            ]
        )
        return core_schema.json_or_python_schema(
            json_schema=from_dict_schema,
            python_schema=core_schema.union_schema(
                [
                    core_schema.is_instance_schema(ThirdPartyType),
                    from_dict_schema,
                ]
            ),
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda instance: {"x": instance.x}
            ),
        )

    @classmethod
    def __get_pydantic_json_schema__(
        cls, _core_schema: core_schema.CoreSchema, handler: GetJsonSchemaHandler
    ) -> JsonSchemaValue:
        return handler(core_schema.dict_schema())

PydanticThirdPartyType = Annotated[
    ThirdPartyType, _ThirdPartyTypePydanticAnnotation
]
# ------------------------------------------------------------
# Int Type
# ------------------------------------------------------------
class ThirdPartyIntType:
    x: int

    def __init__(self):
        self.x = 0

class _ThirdPartyIntTypePydanticAnnotation:
    @classmethod
    def __get_pydantic_core_schema__(
        cls,
        _source_type: Any,
        _handler: GetCoreSchemaHandler,
    ) -> core_schema.CoreSchema:
        def validate_from_int(value: int) -> ThirdPartyIntType:
            result = ThirdPartyIntType()
            result.x = value
            return result

        from_int_schema = core_schema.chain_schema(
            [
                core_schema.int_schema(),
                core_schema.no_info_plain_validator_function(validate_from_int),
            ]
        )
        return core_schema.json_or_python_schema(
            json_schema=from_int_schema,
            python_schema=core_schema.union_schema(
                [
                    core_schema.is_instance_schema(ThirdPartyIntType),
                    from_int_schema,
                ]
            ),
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda instance: instance.x * 10
            ),
        )

    @classmethod
    def __get_pydantic_json_schema__(
        cls, _core_schema: core_schema.CoreSchema, handler: GetJsonSchemaHandler
    ) -> JsonSchemaValue:
        return handler(core_schema.int_schema())

PydanticThirdPartyIntType = Annotated[
    ThirdPartyIntType, _ThirdPartyIntTypePydanticAnnotation
]
# ------------------------------------------------------------
# Float Type
# ------------------------------------------------------------
class ThirdPartyFloatType:
    x: float

    def __init__(self):
        self.x = 0

class _ThirdPartyFloatTypePydanticAnnotation:
    @classmethod
    def __get_pydantic_core_schema__(
        cls,
        _source_type: Any,
        _handler: GetCoreSchemaHandler,
    ) -> core_schema.CoreSchema:
        def validate_from_float(value: float) -> ThirdPartyFloatType:
            result = ThirdPartyFloatType()
            result.x = value
            return result

        from_float_schema = core_schema.chain_schema(
            [
                core_schema.float_schema(),
                core_schema.no_info_plain_validator_function(validate_from_float),
            ]
        )
        return core_schema.json_or_python_schema(
            json_schema=from_float_schema,
            python_schema=core_schema.union_schema(
                [
                    core_schema.is_instance_schema(ThirdPartyFloatType),
                    from_float_schema,
                ]
            ),
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda instance: instance.x * 100
            ),
        )

    @classmethod
    def __get_pydantic_json_schema__(
        cls, _core_schema: core_schema.CoreSchema, handler: GetJsonSchemaHandler
    ) -> JsonSchemaValue:
        return handler(core_schema.float_schema())

PydanticThirdPartyFloatType = Annotated[
    ThirdPartyFloatType, _ThirdPartyFloatTypePydanticAnnotation
]
class Model(BaseModel):
    model_config = ConfigDict(
        strict=True,
    )
    third_party_type: PydanticThirdPartyFloatType | PydanticThirdPartyIntType | PydanticThirdPartyType

m_float = Model(third_party_type=2.0)
assert isinstance(m_float.third_party_type, ThirdPartyFloatType)
assert m_float.third_party_type.x == 2.0
print("1)", m_float.model_dump())  # 1) {'third_party_type': 200.0} <-- should be 20.0
# assert m_float.model_dump() == {'third_party_type': 2.0 * 10}
m_int = Model(third_party_type=1)
assert isinstance(m_int.third_party_type, ThirdPartyIntType)
assert m_int.third_party_type.x == 1
print("2)", m_int.model_dump())
assert m_int.model_dump() == {'third_party_type': 1 * 100}
m_dict = Model(third_party_type = {"x": 3})
assert isinstance(m_dict.third_party_type, ThirdPartyType)
assert m_dict.third_party_type.x == 3
print("3)", m_dict.model_dump())  # 3) {'third_party_type': 300} <-- should be {'x': 3}
# assert m_dict.model_dump() == {'x': 3}
The commented
Note that the order in the Union is:
The serialized values all match the
Workaround?
As a workaround, I implemented a general-purpose serializer that checks types and serializes accordingly. Using the simple
def external_serializer(x):
    if type(x) is int:
        return f"{type(x)} {x * 10}"
    elif type(x) is float:
        return f"{type(x)} {x * 100.0}"
    else:
        assert False, f"Unknown type: {type(x)}"

IntAlias2 = Annotated[
    int,
    BeforeValidator(print_type(int)),
    PlainSerializer(external_serializer),
    WithJsonSchema({'type': 'integer'})
]
FloatAlias2 = Annotated[
    float,
    BeforeValidator(print_type(float)),
    PlainSerializer(external_serializer),
    WithJsonSchema({'type': 'number'})
]

class Model2(BaseModel):
    model_config = ConfigDict(
        strict=True,
    )
    # This will validate properly, but pick the first element in the Union as the serializer
    value: IntAlias2 | FloatAlias2

m_int = Model2(value = 1)  # <class 'int'> == <class 'int'>
m_float = Model2(value = 2.0)  # <class 'float'> == <class 'float'>
print(m_int.model_dump())  # {'value': "<class 'int'> 10"}
print(m_float.model_dump())  # {'value': "<class 'float'> 200.0"}
Does anyone see any issues with this approach? In my actual application, I'm also working with a list of types that can include
Are there any special considerations I should take into account to properly validate
Hopefully the above examples help in resolving this issue.
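One way the type-checking workaround above might scale to a longer list of types is a small registry keyed by exact runtime type. This is a sketch under assumptions, not a Pydantic feature: `register_serializer`, `serialize_by_exact_type` and `_SERIALIZERS` are hypothetical names, and the two registered functions simply repeat the formatting used in the comment. It raises `TypeError` instead of using `assert False`, because assertions are skipped when Python runs with `-O`.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: exact runtime type -> serializer function.
_SERIALIZERS: Dict[type, Callable[[Any], Any]] = {}


def register_serializer(tp: type) -> Callable[[Callable[[Any], Any]], Callable[[Any], Any]]:
    """Decorator that records `func` as the serializer for exactly `tp`."""
    def decorator(func: Callable[[Any], Any]) -> Callable[[Any], Any]:
        _SERIALIZERS[tp] = func
        return func
    return decorator


@register_serializer(int)
def _serialize_int(x: int) -> str:
    return f"{type(x)} {x * 10}"


@register_serializer(float)
def _serialize_float(x: float) -> str:
    return f"{type(x)} {x * 100.0}"


def serialize_by_exact_type(x: Any) -> Any:
    """Look up the serializer by type(x); subclasses such as bool are not matched."""
    try:
        serializer = _SERIALIZERS[type(x)]
    except KeyError:
        raise TypeError(f"Unknown type: {type(x)}") from None
    return serializer(x)


print(serialize_by_exact_type(1))
print(serialize_by_exact_type(2.0))
```

A function like `serialize_by_exact_type` could presumably be passed to `PlainSerializer` in the same way as `external_serializer` above. Because the lookup uses the exact type, a `bool` is rejected rather than silently handled as an `int`.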
There is a strange case if the first type of the union is a Python type, e.g.
from pydantic import BaseModel, ConfigDict, field_validator, model_serializer

class Language(BaseModel):
    model_config = ConfigDict(validate_assignment=True)
    language_code: str

    @model_serializer(when_used='json')
    def to_dict(self) -> str:
        return self.language_code+'-serialized'

class Response(BaseModel):
    # languages: List[Language|str]
    language: str|Language ###### notice the order

    class Config:
        validate_assignment = True

    @field_validator("language", mode='before')
    @classmethod
    def validate_source_language(cls, value: str): # value = "en"
        if value == "auto":
            return value
        # If not 'auto', attempt to validate and return a Language instance
        elif isinstance(value, str):
            return Language(language_code=value)
        raise ValueError('source_language must be either "auto" or a Language instance')

new_data = {'language': 'auto'}
new_response = Response(**new_data)
print(new_response.model_dump_json())
"""
'{"language":"auto"}'
"""
new_response.language = "en"
print(new_response.model_dump_json())
"""
'{"language":"en-serialized"}'
"""
but with the order inverted:
from typing import Any, List, Union
from pydantic import (
BaseModel,
ConfigDict,
Field,
PlainValidator,
ValidationInfo,
ValidatorFunctionWrapHandler,
field_validator,
model_serializer,
model_validator,
root_validator,
validator,
)
from pydantic.functional_serializers import PlainSerializer
from pydantic.functional_validators import (
AfterValidator,
BeforeValidator,
WrapValidator,
)
from typing_extensions import Annotated
class Language(BaseModel):
    model_config = ConfigDict(validate_assignment=True)
    language_code: str

    @model_serializer(when_used='json')
    def to_dict(self) -> str:
        return self.language_code+'-serialized'

class Response(BaseModel):
    # languages: List[Language|str]
    language: Language|str ########### inverted order

    class Config:
        validate_assignment = True

    @field_validator("language", mode='before')
    @classmethod
    def validate_source_language(cls, value: str): # value = "en"
        if value == "auto":
            return value
        # If not 'auto', attempt to validate and return a Language instance
        elif isinstance(value, str):
            return Language(language_code=value)
        raise ValueError('source_language must be either "auto" or a Language instance')

new_data = {'language': 'auto'}
new_response = Response(**new_data)
print(new_response.model_dump_json())
"""
PydanticSerializationError: Error serializing to JSON: PydanticSerializationError: Error calling function `to_dict`: AttributeError: 'str' object has no attribute 'language_code'
"""
new_response.language = "en"
new_response.model_dump_json()
"""
'{"language":"en-serialized"}'
"""
Another related issue; I'm not sure whether to open a new issue for this one. When using @model_serializer with inheritance, the child class's custom serializer method does not seem to be respected during serialization. Instead, the parent class's serializer method is called. This is contrary to the expectation that the child's serializer method should override the parent's.
from pydantic import BaseModel, ConfigDict, model_serializer

class LanguageBase(BaseModel):
    model_config = ConfigDict(validate_assignment=True)
    language_code: str

    @model_serializer(when_used='json')
    def to_dict(self) -> str:
        return self.language_code + '-serialized'

class Language(LanguageBase):
    @model_serializer(when_used='json')
    def to_dict_child(self) -> str:
        return self.language_code + '-serialized-child'

class Response(BaseModel):
    language: str | LanguageBase

    class Config:
        validate_assignment = True

# Sample data for creating a Response instance
new_data = {'language': Language(language_code='en')}
new_response = Response(**new_data)
# Attempt to serialize
print(new_response.json())
Expected Behavior
The output of new_response.json() should be '{"language":"en-serialized-child"}', indicating that the child class's (Language) serializer method to_dict_child is used for serialization.
The actual output is '{"language":"en-serialized"}'. This suggests that the LanguageBase class's serializer method to_dict is called instead of the Language class's to_dict_child.
This issue is significant because it affects the polymorphic behavior of models in applications that use Pydantic for complex data handling and serialization. Respecting method overrides in class hierarchies is fundamental to object-oriented design and enables more flexible and clearer implementations. It offers a "quack like a duck" alternative to generics in some scenarios, simplifying the many use cases where derived model instances need specialized behavior without compromising the simplicity and readability of the code.
That said, if my usage of Pydantic is not optimal and there is a recommended approach for these kinds of transformations, I'd greatly appreciate any guidance or examples of how to implement this properly.
system info:
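The override behaviour expected in the comment above can be illustrated in plain Python with `functools.singledispatch`, which selects an implementation from the runtime class of the argument and prefers the most specific registered class. This is only a sketch of the expected dispatch, not a Pydantic API or a confirmed fix: the dataclasses stand in for the models, and `to_json_value` is a hypothetical name.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class LanguageBase:
    language_code: str


@dataclass
class Language(LanguageBase):
    pass


@singledispatch
def to_json_value(obj) -> str:
    """Fallback for types with no registered serializer."""
    raise TypeError(f"no serializer registered for {type(obj)!r}")


@to_json_value.register
def _(obj: str) -> str:
    return obj


@to_json_value.register
def _(obj: LanguageBase) -> str:
    return obj.language_code + "-serialized"


@to_json_value.register
def _(obj: Language) -> str:
    # Registered for the subclass, so it wins over the LanguageBase handler.
    return obj.language_code + "-serialized-child"


print(to_json_value(Language(language_code="en")))
print(to_json_value(LanguageBase(language_code="en")))
print(to_json_value("auto"))
```

Here the field's declared type plays no role: the handler is chosen from the object actually stored, which is the polymorphic behaviour the comment asks for.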
Initial Checks
Description
I noticed some strange behavior of models in my application while migrating to pydantic@2.
In the case where a Union is included in the model, despite the correct validation of the input data, the wrong Union member is selected for serialization.
For the attached code I got the following output:
Minor question/issue: the PrefixedUUID model_validator is called twice, but when I change test_data to a JSON string and call Container.model_validate_json, there is only a single call to the validator.
Example Code
Python, Pydantic & OS Version
Selected Assignee: @davidhewitt