Pydantic v2 embraced the use of Annotated, which has proved to be a really good decision. However, in the process of getting this to work we've learned a lot and made a lot of mistakes. One of those mistakes is the sheer complexity of GenerateSchema.
In particular, since we put constraints (like min_length) directly on the type's schema, the type needs to know about its constraints. In other words, if you have Annotated[str, <some stuff>, Field(min_length=1)] then str needs to know about Field when it generates its schema. That means you can't just iterate over the annotations left to right or something like that, hence we created __prepare_pydantic_annotations__. All of this has some pretty unfortunate consequences:
GenerateSchema is complex and difficult to refactor
Caching is difficult since you can't just cache str - you have to cache the type and any annotations that may get applied to it. So we have no caching currently.
If you don't use __prepare_pydantic_annotations__ to do fancy things with constraints that get applied to a type, those constraints get run in Python, so they're slower. In other words, Annotated[str, AfterValidator(lambda x: x), Field(min_length=3)] is going to be considerably slower than Annotated[str, Field(min_length=3)], and not just because of the lambda function.
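To make the last consequence concrete, here is a minimal toy model of the fast and slow paths. It is not pydantic's implementation: MinLen, After, toy_schema and the "python-check" / "python-function" schema names are all hypothetical stand-ins invented for this sketch. It only shows why a constraint that arrives after a wrapping validator can no longer be placed on the str schema itself.

```python
from dataclasses import dataclass
from typing import Annotated, Any, Callable, get_args


@dataclass(frozen=True)
class MinLen:
    """Hypothetical stand-in for Field(min_length=...)."""
    n: int


@dataclass(frozen=True)
class After:
    """Hypothetical stand-in for AfterValidator."""
    func: Callable[[Any], Any]


def toy_schema(tp: Any) -> dict:
    """Build a toy schema by walking the annotations left to right."""
    base, *annotations = get_args(tp)
    schema: dict = {"type": base.__name__}
    for ann in annotations:
        if isinstance(ann, MinLen) and schema["type"] == "str":
            # Fast path: the constraint lands directly on the str schema.
            schema["min_length"] = ann.n
        elif isinstance(ann, MinLen):
            # Slow path: the str schema is already wrapped, so the
            # constraint becomes an extra Python-side check around it.
            schema = {"type": "python-check", "min_length": ann.n, "inner": schema}
        elif isinstance(ann, After):
            # A wrapping validator hides the str schema from later annotations.
            schema = {"type": "python-function", "func": ann.func, "inner": schema}
    return schema


fast = toy_schema(Annotated[str, MinLen(3)])
slow = toy_schema(Annotated[str, After(lambda x: x), MinLen(3)])
```

In this toy, fast is a single str schema carrying min_length, while slow is a str schema wrapped twice, with the length check sitting in the outermost, Python-side layer.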
Port that to pydantic-core, maybe doing the return-enum thing proposed in "make an enum-as-output for validators" (pydantic-core#833) to keep performance good in the simple cases, and then cleaning up the constraints off of the type schemas
Multiple PRs to refactor and clean up GenerateSchema, making bits and pieces public as we feel that they are ready
Introduce caching on schema generation; some sort of lru_cache with weak keys, where the types are the keys and the core schemas are the values, should work
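As a rough sketch of what the caching step might look like, the following uses weakref.WeakKeyDictionary from the standard library with types as keys. WeakSchemaCache and generate are hypothetical names made up for this example, the "schema" is a placeholder dict, and there is no LRU eviction here, so treat it as an illustration of the weak-key idea only, under the assumption that a schema depends on nothing but its type.

```python
import weakref
from typing import Callable


class WeakSchemaCache:
    """Toy cache: types are weak keys, generated schemas are values."""

    def __init__(self, generate: Callable[[type], dict]) -> None:
        self._generate = generate
        # Entries disappear automatically once the key type is garbage collected.
        self._cache = weakref.WeakKeyDictionary()
        self.misses = 0

    def get(self, tp: type) -> dict:
        try:
            return self._cache[tp]
        except KeyError:
            # Cache miss: generate once and remember the result.
            self.misses += 1
            schema = self._generate(tp)
            self._cache[tp] = schema
            return schema


def generate(tp: type) -> dict:
    # Hypothetical stand-in for real schema generation.
    return {"type": tp.__name__}


cache = WeakSchemaCache(generate)
first = cache.get(str)
second = cache.get(str)  # served from the cache, no regeneration
```

This only works because the toy schema for str does not depend on any annotations, which is exactly the property the cleanup steps above are meant to establish.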
I still think we need to do some cleanup of GenerateSchema beyond what we've already done (see merged PRs linked to from the first post in this issue), but that may need larger refactors in pydantic-core that might require waiting for v3.
For now we've improved performance vastly with some more minimal refactoring (#7565, #7536, #7535, #7529, #7528, #7527, #7524, #7523 and #7522) so I'm going to close this issue for now.
Tracking issue for #6951 and related work.
Issues linked from the consequences listed above:
_core_utils.py:walk #6768
strip_whitespace after coercion, but before constraint checks on custom data types #6531
Apply to_upper, etc. in StringConstraints before checking for pattern (similar to strip_whitespace)? #6624
Ideally, we want to:
[…] GenerateSchema and introducing some caching.
[…] __prepare_pydantic_annotations__
To get there I think we'll want to:
[…] __prepare_pydantic_annotations__ and instead rely on the mechanism introduced in Handle constraints being applied to schemas that don't accept it #6951
Other related issues:
ser_json_bytes #7000
Selected Assignee: @samuelcolvin