Schema generation improvements #7054

adriangb · 2023-08-09T12:59:52Z

Tracking issue for #6951 and related work.

Pydantic v2 embraced the use of Annotated, which has proved to be a really good decision. However, in the process of getting this to work, we've learned a lot and made a lot of mistakes. One of those mistakes is the sheer complexity of GenerateSchema.

In particular, since we put constraints (like min_length) directly on the type's schema, the type needs to know about its constraints. In other words, if you have Annotated[str, <some stuff>, Field(min_length=1)] then str needs to know about Field when it generates its schema. That means you can't just iterate over the annotations left to right or something like that, hence we created __prepare_pydantic_annotations__. All of this has some pretty unfortunate consequences:

GenerateSchema is complex and difficult to refactor
Startup time is slow because of how much code has to run to generate the schemas (Add benchmark representing FastAPI startup time #7030, Very slow (FastAPI) application startup after using V2 due to _core_utils.py:walk #6768)
Caching is difficult since you can't just cache str - you have to cache the type and any annotations that may get applied to it. So we have no caching currently.
If you don't use __prepare_pydantic_annotations__ to do fancy things with constraints that get applied to a type, they get run in Python, so they're slower. In other words, Annotated[str, AfterValidator(lambda x: x), Field(min_length=3)] is going to be considerably slower than Annotated[str, Field(min_length=3)], and not just because of the lambda function.
Constraints can't really be ordered (see "strip_whitespace after coercion, but before constraint checks on custom data types" #6531 and "Apply to_upper, etc. in StringConstraints before checking for pattern (similar to strip_whitespace)?" #6624)
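
To make the ordering point concrete, here is a minimal standard-library sketch of what "iterate over the annotations left to right" could look like. It is not how pydantic works today: MinLength, After and build_validator are hypothetical stand-ins invented for this illustration, and the base type is only checked with isinstance.

```python
from typing import Annotated, Any, Callable, get_args, get_origin


class MinLength:
    """Hypothetical constraint marker (not pydantic's Field)."""

    def __init__(self, n: int) -> None:
        self.n = n

    def __call__(self, value: str) -> str:
        if len(value) < self.n:
            raise ValueError(f"expected at least {self.n} characters, got {len(value)}")
        return value


class After:
    """Hypothetical stand-in for an after-validator: runs an arbitrary function."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __call__(self, value: Any) -> Any:
        return self.func(value)


def build_validator(tp: Any) -> Callable[[Any], Any]:
    """Check the base type, then apply each annotation in the order written."""
    if get_origin(tp) is Annotated:
        base, *steps = get_args(tp)
    else:
        base, steps = tp, []

    def validate(value: Any) -> Any:
        if not isinstance(value, base):
            raise TypeError(f"expected {base.__name__}")
        for step in steps:
            value = step(value)
        return value

    return validate


strip_then_check = build_validator(Annotated[str, After(str.strip), MinLength(3)])
check_then_strip = build_validator(Annotated[str, MinLength(3), After(str.strip)])

print(check_then_strip("  ab  "))  # length is checked before stripping, so this passes
try:
    strip_then_check("  ab  ")
except ValueError as exc:
    print("rejected:", exc)  # the stripped value "ab" is too short
```

In this toy model the base type never needs to know about its constraints, and swapping two annotations changes the result, which is the kind of re-ordering the issues above ask for.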

Ideally, we want to:

Simplify GenerateSchema and refactor it into an easily understandable class that we can then make public so that users can override it.
Improve startup time by simplifying GenerateSchema and introducing some caching.
Get rid of __prepare_pydantic_annotations__
Allow constraints to be re-ordered

To get there I think we'll want to:

Merge "Handle constraints being applied to schemas that don't accept it" (#6951), which essentially re-implements all constraints in Python in a way in which they can be re-ordered
Port that to pydantic-core, maybe doing the return-enum thing proposed in "make an enum-as-output for validators" (pydantic-core#833) to keep performance good in the simple cases, and then cleaning the constraints off of the type schemas
Replace the validators in #6951 with the pydantic-core versions to make that path fast
Refactor GenerateSchema to get rid of __prepare_pydantic_annotations__ and instead rely on the mechanism introduced in #6951
Multiple PRs to refactor and clean up GenerateSchema, making bits and pieces public as we feel that they are ready
Introduce caching on schema generation. Some sort of lru_cache with weak keys, where the types are the keys and the core schemas are the values, should work.
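
As a rough sketch of the caching idea in the last step, assuming constraints no longer live on the type's schema so that the type alone is a valid cache key: the snippet below uses a weakref.WeakKeyDictionary rather than a true LRU, so it has no size bound. cached_schema and fake_generate are hypothetical names for illustration, not pydantic APIs, and the dict returned is only a placeholder for a core schema.

```python
import weakref
from typing import Any, Callable, Dict, List

# Hypothetical cache: types are the (weakly held) keys, schemas are the values.
# An entry can be dropped once its type is garbage-collected.
_schema_cache: "weakref.WeakKeyDictionary[type, Dict[str, Any]]" = (
    weakref.WeakKeyDictionary()
)


def cached_schema(
    tp: type, generate: Callable[[type], Dict[str, Any]]
) -> Dict[str, Any]:
    """Return the cached schema for tp, generating it on first use."""
    try:
        return _schema_cache[tp]
    except KeyError:
        schema = generate(tp)
        _schema_cache[tp] = schema
        return schema


calls: List[type] = []


def fake_generate(tp: type) -> Dict[str, Any]:
    """Placeholder for real schema generation; records each invocation."""
    calls.append(tp)
    return {"type": tp.__name__}


class User:
    pass


first = cached_schema(User, fake_generate)
second = cached_schema(User, fake_generate)
print(first is second, len(calls))  # the generator only ran once
```

A real implementation would also need to decide how to key generic aliases and Annotated types, which this sketch deliberately leaves out.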

Other related issues:

Add a new "base64url" option for ser_json_bytes #7000

Selected Assignee: @samuelcolvin

adriangb · 2023-08-09T13:00:01Z

cc @samuelcolvin @dmontagu

adriangb · 2023-09-22T13:37:18Z

I still think we need to do some cleanup of GenerateSchema beyond what we've already done (see merged PRs linked to from the first post in this issue), but that may need larger refactors in pydantic-core that might require waiting for v3.

For now, we've vastly improved performance with some more minimal refactoring (#7565, #7536, #7535, #7529, #7528, #7527, #7524, #7523 and #7522), so I'm going to close this issue.

pydantic-hooky bot assigned samuelcolvin Aug 9, 2023

pydantic-hooky bot added the unconfirmed Bug not yet confirmed as valid/applicable label Aug 9, 2023

samuelcolvin removed the unconfirmed Bug not yet confirmed as valid/applicable label Aug 22, 2023

adriangb closed this as completed Sep 22, 2023
