
Very slow (FastAPI) application startup after using V2 due to _core_utils.py:walk #6768

Open · 1 task done
Labels: bug V2 (Bug related to Pydantic V2)
@MarkusSintonen (Contributor) commented Jul 20, 2023

Initial Checks

  • I confirm that I'm using Pydantic V2

Description

Our FastAPI application's startup performance degraded from about 5s to over 20s after moving to V2. This makes e.g. our test runs very slow (we haven't measured how long it takes in production to get servers up and running). The issue seems to be related to the walking of core schemas, as the cProfile results below (from our app init) show.

The issue seems to come from the poor performance of initializing TypeAdapters in FastAPI. It might also be related to how FastAPI uses the new TypeAdapter interface. But there might be an issue in the walking of core schemas as well. What do you think?
Full cProfile print output: cprofile.txt

See latest profiles here: #6768 (comment)

Below are outdated profiles, see link above.

(SLOW) Pydantic v2.0.3 & FastAPI v0.100.0
ncalls            tottime  percall cumtime   percall filename:lineno(function)
3854141/395376    3.575    0.000   18.577    0.000   xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:193(_walk)
3850208/395376    3.395    0.000   20.191    0.000   xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:190(walk)
3003272           2.298    0.000    2.644    0.000   xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:106(get_ref)
9812              1.773    0.000   31.855    0.003   xxx/python3.11/site-packages/pydantic/type_adapter.py:142(__init__)
1352590/179311    1.712    0.000    6.842    0.000   xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:434(flatten_refs)
309655/256580     1.587    0.000   12.204    0.000   xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:309(handle_model_fields_schema)
1145758/38773     1.536    0.000    6.745    0.000   xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:411(collect_refs)
5766050           1.451    0.000    1.451    0.000   {method 'copy' of 'dict' objects}
3229655/682916    1.382    0.000   16.172    0.000   xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:200(_handle_other_schemas)
9718277           1.131    0.000    1.131    0.000   {method 'get' of 'dict' objects}
83003             0.831    0.000    3.959    0.000   xxx/python3.11/site-packages/pydantic/_internal/_generate_schema.py:1288(_get_prepare_pydantic_annotations_for_known_type)


(FAST) Pydantic v1.10.10 & FastAPI v0.100.0
ncalls            tottime  percall  cumtime  percall filename:lineno(function)
871294/78525      2.790    0.000    3.005    0.000   {built-in method _abc._abc_subclasscheck}
545241/8813       0.775    0.000    1.942    0.000   xxx/python3.11/copy.py:128(deepcopy)
10618             0.490    0.000    1.593    0.000   xxx/python3.11/site-packages/fastapi/utils.py:63(create_response_field)
19486/917         0.377    0.000    1.906    0.002   xxx/python3.11/copy.py:227(_deepcopy_dict)
3202/2127         0.321    0.000    3.182    0.001   xxx/python3.11/site-packages/pydantic/generics.py:75(__class_getitem__)
40290/28472       0.313    0.000    1.657    0.000   xxx/python3.11/inspect.py:2428(_signature_from_callable)
28472             0.285    0.000    0.979    0.000   xxx/python3.11/inspect.py:2333(_signature_from_function)
871294/78525      0.240    0.000    3.033    0.000   <frozen abc>:121(__subclasscheck__)
1422963/1412127   0.231    0.000    0.244    0.000   {built-in method builtins.isinstance}
52781             0.213    0.000    0.265    0.000   xxx/python3.11/inspect.py:2972(__init__)
14998             0.205    0.000    0.206    0.000   {method '__reduce_ex__' of 'object' objects}


(FAST) Pydantic v1.10.10 & FastAPI v0.99.1
ncalls            tottime  percall  cumtime  percall filename:lineno(function)
858955/76183      1.997    0.000    2.190    0.000   {built-in method _abc._abc_subclasscheck}
538585/2157       0.825    0.000    1.788    0.001   xxx/python3.11/copy.py:128(deepcopy)
23230             0.509    0.000    0.521    0.000   xxx/python3.11/site-packages/fastapi/dependencies/models.py:16(__init__)
52781             0.425    0.000    0.476    0.000   xxx/python3.11/inspect.py:2972(__init__)
10618             0.419    0.000    1.480    0.000   xxx/python3.11/site-packages/fastapi/utils.py:75(create_response_field)
19486/917         0.367    0.000    1.776    0.002   xxx/python3.11/copy.py:227(_deepcopy_dict)
3202/2127         0.312    0.000    3.025    0.001   xxx/python3.11/site-packages/pydantic/generics.py:75(__class_getitem__)
40290/28472       0.279    0.000    1.778    0.000   xxx/python3.11/inspect.py:2428(_signature_from_callable)
28472             0.265    0.000    1.148    0.000   xxx/python3.11/inspect.py:2333(_signature_from_function)
858955/76183      0.219    0.000    2.216    0.000   <frozen abc>:121(__subclasscheck__)
1360643/1349807   0.212    0.000    0.224    0.000   {built-in method builtins.isinstance}

Example Code

# This TypeAdapter usage is very slow with a large number of API models
# (e.g. API models that are generic over inner data models)
# https://github.com/tiangolo/fastapi/blob/f7e3559bd5997f831fb9b02bef9c767a50facbc3/fastapi/_compat.py#L101
self._type_adapter: TypeAdapter[Any] = TypeAdapter(
    Annotated[self.field_info.annotation, self.field_info]
)
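For readers unfamiliar with the pattern, here is a minimal stand-alone sketch of the same construction (the names `field_info` and `adapter` are illustrative, not from our app; this assumes pydantic v2 is installed):

```python
from typing import Annotated, Any

from pydantic import TypeAdapter
from pydantic.fields import FieldInfo

# FastAPI builds one TypeAdapter per route parameter / response field.
# Each construction triggers full core-schema generation and walking,
# which is where the startup time goes when there are many models.
field_info = FieldInfo(annotation=int)
adapter: TypeAdapter[Any] = TypeAdapter(Annotated[int, field_info])
```

Every such construction repeats the schema-generation work, so the cost scales with the number of route fields rather than the number of distinct model types.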

Python, Pydantic & OS Version

pydantic version: 2.0.3
        pydantic-core version: 2.3.0 release build profile
                 install path: /Users/markussintonen/Library/Caches/pypoetry/virtualenvs/userdata-IiJ3qKwC-py3.11/lib/python3.11/site-packages/pydantic
               python version: 3.11.4 (main, Jul 17 2023, 14:44:40) [Clang 14.0.3 (clang-1403.0.22.14.1)]
                     platform: macOS-13.4.1-x86_64-i386-64bit
     optional deps. installed: ['typing-extensions']

Selected Assignee: @samuelcolvin

@MarkusSintonen added the bug V2 and unconfirmed labels on Jul 20, 2023
@MarkusSintonen (Contributor, Author) commented Jul 20, 2023

When I patch FastAPI so it memoizes the FieldInfo -> TypeAdapter mapping, the initialization time drops from ~20s to ~10s, so it seems FastAPI is doing some unnecessary work there. But it is still spending a lot of time in walk.

ncalls            tottime  percall  cumtime  percall  filename:lineno(function)
1682626/144308    1.895    0.000    8.208    0.000    xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:193(_walk)
1678693/144308    1.617    0.000    8.903    0.000    xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:190(walk)
139877/116568     0.831    0.000    5.860    0.000    xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:309(handle_model_fields_schema)
566346/69584      0.695    0.000    3.330    0.000    xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:434(flatten_refs)
71062             0.582    0.000    0.606    0.000    xxx/python3.11/enum.py:685(__call__)
2458408           0.563    0.000    0.563    0.000    {method 'copy' of 'dict' objects}
482385/10384      0.542    0.000    2.407    0.000    xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:411(collect_refs)
1425888/284906    0.523    0.000    7.021    0.000    xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:200(_handle_other_schemas)
4186137           0.476    0.000    0.476    0.000    {method 'get' of 'dict' objects}
1261789           0.439    0.000    0.581    0.000    xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:106(get_ref)
1013              0.407    0.000    8.590    0.008    xxx/python3.11/site-packages/pydantic/_internal/_model_construction.py:393(complete_model_class)

Here is the patch I did for memoization:

# Ugly hacks follow to make FieldInfo hashable and to monkey-patch FastAPI
import typing
from typing import Annotated

from pydantic import TypeAdapter
from pydantic.fields import FieldInfo


def hashable(obj):
    """Recursively convert dicts/iterables into hashable tuples."""
    if isinstance(obj, typing.Hashable):
        return obj
    elif isinstance(obj, dict):
        return tuple(sorted(((k, hashable(v)) for k, v in obj.items()), key=lambda t: t[0]))
    elif isinstance(obj, typing.Iterable):
        return tuple(hashable(item) for item in obj)
    else:
        raise TypeError(type(obj))


class FieldInfoKey:
    def __init__(self, field_info: FieldInfo):
        self.field_info = field_info
        self.field_info_hashable = hashable(field_info._attributes_set)

    def __hash__(self):
        return hash(self.field_info_hashable)

    def __eq__(self, other):
        if not isinstance(other, FieldInfoKey):
            return False
        return self.field_info._attributes_set == other.field_info._attributes_set


@weak_cache  # weak-reference memoization helper (definition not shown)
def type_adapter_cache(f: FieldInfoKey) -> TypeAdapter:
    return TypeAdapter(Annotated[f.field_info.annotation, f.field_info])


def ugly_init_patch(self):
    self._field_info_key = FieldInfoKey(self.field_info)
    self._type_adapter = type_adapter_cache(self._field_info_key)


from fastapi._compat import ModelField
ModelField.__post_init__ = ugly_init_patch

Maybe FastAPI's _compat.ModelField could also create TypeAdapters lazily, but that might just hide issues in walk...? (Lazy initialization would also cause issues in FastAPI's route-definition validation: a badly defined route would only break on first use.)

@samuelcolvin (Member)

Could be related to #6748

@samuelcolvin (Member)

Thanks so much for reporting, we're looking into this.

@adriangb (Member)

Can you give us a sample of your models? For example, the model from your slowest endpoint with the field names and model names mangled or something like that.

@adriangb (Member)

I do think FastAPI or Pydantic should cache TypeAdapters, to save not only this walking but also the creation of the SchemaValidator and such. There are memory benefits to that as well.

@MarkusSintonen (Contributor, Author) commented Jul 20, 2023

Can you give us a sample of your models? For example, the model from your slowest endpoint with the field names and model names mangled or something like that.

It seems there isn't any specific model that is slow. There are just so many unique request/response models from generic parametrization over the different data models we have, and each model takes a bunch of time to walk. I took a sample of where walk actually spends its time, as below. (Note that this patches FastAPI so it caches TypeAdapters, so we only do the work for unique model combinations created by FastAPI routes.)

import json
from time import time

from fastapi._compat import ModelField
from pydantic._internal._core_utils import _WalkCoreSchema  # pydantic-internal, location as of v2.0.x


def patch_walk():
    orig = _WalkCoreSchema.walk
    durations = {}

    def patch(self, schema, *args, **kwargs):
        key = schema["type"] + ":" + str(schema.get("cls") or schema.get("model_name") or "<unknown>")
        start = time()
        res = orig(self, schema, *args, **kwargs)
        durations[key] = durations.get(key, 0) + (time() - start)

        return res

    _WalkCoreSchema.walk = patch
    return durations


def init_app_stuff():
    ModelField.__post_init__ = ugly_caching_init_patch  # cache slow TypeAdapters in FastAPI (patch from the earlier comment)
    durations = patch_walk()

    # do FastAPI app init...

    print(f"ENTRIES COUNT: {len(durations)}")
    print(f"TOTAL DURATION: {sum(durations.values())}")
    top_kvs = sorted(durations.items(), key=lambda t: t[1], reverse=True)[:100]
    print("TOP:")
    print(json.dumps(dict(top_kvs), indent=2))

Yielding:

ENTRIES COUNT: 2639
TOTAL DURATION: 15.628021955490112
TOP:
{
  "definition-ref:<unknown>": 2.6756765842437744,
  "default:<unknown>": 1.8995928764343262,
  "nullable:<unknown>": 1.1915950775146484,
  "function-before:<unknown>": 1.0592408180236816,
  "list:<unknown>": 0.6754055023193359,
  "function-after:<unknown>": 0.41553235054016113,
  "str:<unknown>": 0.34151172637939453,
  "model-fields:DataModelHavingNestedDictFields": 0.3339402675628662,
  "int:<unknown>": 0.304196834564209,
  "model-fields:VerySmallDataModel": 0.28247642517089844,
  "model-fields:DataModelHavingStringAndEnumFields": 0.26870179176330566,
  "model:<class 'DataModelHavingStringAndEnumFields'>": 0.24478888511657715,
  "model-fields:ResponseModel[DataModelHavingNumericFieldsAndCoupleNestedModels]": 0.22575044631958008,
  "model:<class 'VerySmallDataModel'>": 0.19563722610473633,
  "json-or-python:<unknown>": 0.1840825080871582,
  "chain:<unknown>": 0.14197635650634766,
  "lax-or-strict:<unknown>": 0.11188220977783203,
  "model-fields:AnotherVerySmallDataModel": 0.10825538635253906,
  "model-fields:ModelHavingLotOfDifferentFields": 0.0886530876159668,
  "model-fields:Model2HavingLotOfDifferentFields": 0.08829522132873535,
  "model-fields:SimpleResponseModel": 0.07635879516601562,
  "dict:<unknown>": 0.07629108428955078,
  "model-fields:ModelHavingVaryingFields": 0.07033467292785645,
  "model-fields:SmallResponseModel": 0.0696713924407959,
  "literal:<unknown>": 0.0684511661529541,
  "model-fields:SmallErrorResponseModel": 0.0679781436920166,
  "model-fields:AnotherSmallErrorResponseModel": 0.05967593193054199,
  "definitions:<unknown>": 0.059203147888183594,

  // Lots of stuff but nothing really interesting...

  "model-fields:ModelHavingNestedModelsWithNestedModels": 0.013972759246826172,
  "model-fields:AnotherModelHavingNestedModelsWithNestedModels": 0.013957023620605469,
  "model:<class 'SimpleModelWithIntFields'>": 0.01356363296508789,
  "model-fields:SimpleResponseModel[DataModelWithLotOfFields]": 0.013484001159667969,
  "model-fields:SomeModelHavingNestedModelsWithNestedModel": 0.013458251953125,
  "any:<unknown>": 0.013149023056030273,
  "model-fields:AnotherSimpleResponseModel[ModelWIthLotOfFields]": 0.012750387191772461,
  "model-fields:SimpleModel": 0.012310504913330078,
  "dataclass:<class 'ErrorResponseModel'>": 0.011614799499511719,
  "model:<class 'AnotherErrorResponseModel'>": 0.011425018310546875,
  "model-fields:OneResponseModel": 0.011368989944458008,
  "model-fields:SimpleModel": 0.011211872100830078,
  "model-fields:AnotherSimpleModel": 0.011202573776245117,
}

@dmontagu (Contributor) commented Jul 20, 2023

While I doubt it will be "easy" to improve this, I do think there's likely to be a lot of potential for improvement because performance of model/TypeAdapter instantiation hasn't really been a concern up to this point (though clearly we want to improve your situation).

@MarkusSintonen I've pushed a branch to pydantic called lazy-type-adapter which makes type adapters instantiate the core schemas and validators/serializers lazily. Is there any chance you could try installing pydantic from that branch and see if it improves your app startup time? I don't think that branch is currently viable for merging since it won't (yet) trigger build-time errors in all cases where it should, BUT, I think the extent to which that branch impacts startup performance will significantly help me understand where there is potential for improvements.

Note also that that branch may not actually improve the total startup time for accessing all endpoints; I just think it may delay per-model/endpoint initialization until the first request that hits that endpoint, which might result in a significant improvement to startup time (i.e., the time it takes to get a single response from an arbitrary endpoint). I'd be interested to understand if that is not a sufficient improvement for you (i.e., you need to keep total initialization time low even if you can get a response from some endpoint very fast), but either way it will be useful to understand the impact.

I think Adrian and I also have some ideas about how we can do better caching in the schema generation process that should reduce the amount of work being done by the TypeAdapter; that's not quite ready, but improving this is a priority for us.

@MarkusSintonen (Contributor, Author)

I've pushed a branch to pydantic called lazy-type-adapter which makes type adapters instantiate the core schemas and validators/serializers lazily. Is there any chance you could try installing pydantic from that branch and see if it improves your app startup time?

I tried that and it indeed helps a lot! Startup time dropped from ~20s to ~6s, very promising! (I didn't need to patch FastAPI to cache TypeAdapters anymore.) Here's the new cProfile: cprofile.txt

@dmontagu (Contributor)

@MarkusSintonen okay great, good to know. Like I said, I don't quite think we can take that route, but good to hear it's a step in the right direction.

One thing @adriangb and I discovered — FastAPI is actually creating another type adapter each time you include a route in another route. As a result, there's a lot of potentially-unnecessary overhead if you are using nested API routers; even if you just add endpoints to routers and then add the routers to the app, that's probably a ~2x overhead that could be removed if we rework FastAPI to reuse the type adapter when possible. I'll look into this, but are you using much nesting of routers? I can see in your cProfile that you created ~1500 instances of APIRouter (which resulted in ~10k type adapters being created), but that seems to me like you probably don't actually have that many distinct endpoints, in which case there might be some low-hanging fruit there too.

@MarkusSintonen (Contributor, Author)

FastAPI is actually creating another type adapter each time you include a route in another route. As a result, there's a lot of potentially-unnecessary overhead

@dmontagu yes, I also observed that, as you can see in the message above. When I make FastAPI cache the FieldInfo -> TypeAdapter mapping, it reduces the startup time by about 50%, so it takes about 10s to start up (still 2x compared to Pydantic V1).

@MarkusSintonen (Contributor, Author) commented Jul 21, 2023

It would be easier to open an improvement PR to FastAPI to reuse TypeAdapters if Pydantic made FieldInfo properly hashable/immutable. Then FastAPI could keep a cache such as a WeakKeyDictionary[FieldInfo, TypeAdapter].
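If FieldInfo were hashable by value, that cache could be as simple as the following sketch. `HashableFieldInfo` and `get_adapter` are hypothetical stand-ins, not Pydantic or FastAPI APIs, and a plain `object()` stands in for the expensive TypeAdapter construction:

```python
import gc
import weakref


class HashableFieldInfo:
    """Hypothetical stand-in for a value-hashable, immutable FieldInfo."""

    def __init__(self, annotation, **attrs):
        self.annotation = annotation
        self.attrs = tuple(sorted(attrs.items()))

    def __hash__(self):
        return hash((self.annotation, self.attrs))

    def __eq__(self, other):
        return (isinstance(other, HashableFieldInfo)
                and (self.annotation, self.attrs) == (other.annotation, other.attrs))


# Entries vanish automatically once the framework drops the FieldInfo,
# so the cache cannot leak adapters for discarded routes.
_adapters = weakref.WeakKeyDictionary()


def get_adapter(field_info):
    try:
        return _adapters[field_info]
    except KeyError:
        adapter = object()  # stand-in for the expensive TypeAdapter(...) call
        _adapters[field_info] = adapter
        return adapter
```

The weak-key design is what makes immutability matter: if a FieldInfo mutated after being used as a key, its hash would no longer match its bucket.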

@adriangb (Member)

The issue with that is that FieldInfo can contain things like arbitrary JSON. You demonstrated above that it can be made hashable, but I'm not sure that sort of trick will extend to all of the data in FieldInfo.

@adriangb (Member)

To summarize so far:

  • Things seem to be more expensive now than they were in v1. I wouldn't be surprised if that's just the reality of the matter, but David did mention that some things in v1 were being done lazily, so maybe we just moved work from one place to another. I usually like things in my web apps to not be lazy because I'd rather have a 10s start than random 500ms latency for users.
  • FastAPI re-creates every route (and every TypeAdapter) when mounting. This seems like an entirely avoidable problem with some refactoring in FastAPI. What I did in Xpresso was to defer work like this until the first of (1) ASGI lifespan or (2) the first request. This meant that for most applications running Uvicorn in production, the work happened before the first request (so no random latency for users), but no work happened at import time, and the lifespan isn't required (which keeps tests and development painless). A combination of this and not re-loading the world just for mounting ASGI apps would make a huge difference for FastAPI. @tiangolo and I have talked about using Starlette's Mount in the past; I believe the blocker was some breaking changes in behavior, but maybe now, with Starlette 1.0 around the corner, it is time for FastAPI to bite the bullet on that sort of thing.
  • Reducing the number of TypeAdapters per route. The 1:1 mapping of TypeAdapter to endpoint is nice, but in theory they could all be squashed into a single type / TypeAdapter and processed at the same time. For various reasons this would be faster.
  • Improving the performance of the work Pydantic does internally, i.e. the cost of creating a TypeAdapter. I think we'd mainly have to do this via caching. This is non-trivial because e.g. the CoreSchema is just a dict of dicts/lists, so if we store some cache metadata at the topmost level, there's no way to do cache invalidation if something edits an internal spot.

@MarkusSintonen (Contributor, Author) commented Jul 21, 2023

I usually like things in my web apps to not be lazy because I'd rather have a 10s start than random 500ms latency for users.

Yeah, I mostly agree. Currently it's just a large jump backwards when developing (large applications) with Pydantic V2/FastAPI, as getting the tests running takes (without any fixes/hacks) 5x longer. This is only locally, so I'm not at all sure how long startup would take in production after deploys. It could now be close to a minute. (It's also unknown how much memory it would actually consume there; I haven't gotten that far yet.)

FastAPI re-creates every route (and every TypeAdapter) when mounting

It's quite clear that FastAPI is also doing things very suboptimally with TypeAdapters, since the simple patch reduces the time quite a lot (though it's still quite a bit slower than V1). I'm not sure how feasible the weak-dict-like patch is, as FieldInfos are not easily hashable (and I'm not sure whether something in FastAPI could mutate FieldInfos on the fly...).

Improving the performance of the stuff Pydantic does internally. Basically improving the performance of creating a TypeAdapter. I think we'd mainly have to do this via caching.

Hopefully there will also be fixes to the logic of how TypeAdapter creates the CoreSchemas. Maybe there is some unneeded repeated deep traversal of the dicts. Probably some of the work could be performed while constructing the CoreSchema instead of repeatedly re-traversing the whole object tree. Or some traversal could move to the Rust side :D

@samuelcolvin (Member)

@MarkusSintonen any chance you'd be willing to jump on a call with some of the Pydantic team to tell us a bit about your application and how you're using Pydantic? If so, drop me an email at samuel@pydantic.dev.

@adriangb (Member)

We've reached out to @tiangolo to see what we can do on the FastAPI side.

In the meantime I've been trying to optimize what we do on the Pydantic side. @MarkusSintonen could you try this branch and see if it makes any difference? I'm not sure that it will but there's a chance.

@MarkusSintonen (Contributor, Author) commented Jul 21, 2023

could you try this branch and see if it makes any difference

Sure, I'll try later today! All in all, I think if we get FastAPI to reuse TypeAdapters, it's already a great improvement; a 10s startup in tests is livable. The overall improvement from Pydantic 2 is so great! Thank you all!

chance you'd be willing to jump on a call with some of the Pydantic

Sure, I'll drop you a message later today.

@adriangb (Member)

10sec startup in tests is livable

Would 0s (give or take, if we did lazy startup) in tests and 10s in production be even better?

@MarkusSintonen (Contributor, Author)

10sec startup in tests is livable

Would 0s (give or take, if we did lazy startup) in tests and 10s in production be even better?

Even better :D (probably some level of breaking change for FastAPI)

@adriangb (Member)

I don't think it needs to be!

@MarkusSintonen (Contributor, Author)

In the meantime I've been trying to optimize what we do on the Pydantic side. MarkusSintonen could you try this branch and see if it makes any difference? I'm not sure that it will but there's a chance.

@adriangb Hmm, interestingly it's slower. Pydantic 2.0.3 takes ~20s and this branch takes ~30s (without FastAPI patches). Here is a (partial) cProfile: cprofile.txt

@MarkusSintonen (Contributor, Author)

@adriangb I wrote a test for you that tries to emulate the performance issue:
test_perf.py.txt
It gives similar cProfile results, so you can try it out in your investigations. It's kind of similar to how generics get used with FastAPI's API models in our application. (Everything else is just emulated and not really anything that's actually used.)

@adriangb (Member)

Awesome, that is super helpful!

@MarkusSintonen (Contributor, Author) commented Jul 28, 2023

@samuelcolvin / @adriangb FYI, we are testing Pydantic V2 on our EKS cluster. It has a very noticeable impact on overall memory usage, which increases by over 2x. I guess it might be related to the creation of core schemas and FastAPI initializing them all at startup?

[Screenshot: memory usage graph, 2023-07-28 10:44]

It doesn't seem like a memory leak, as we are not seeing it creep upwards. So it is probably caused by the initialization of all the internal structures.

@HenriBlacksmith commented Aug 15, 2023

I did not dig into the profiling in my case, but just to confirm that @MarkusSintonen is not alone: I attempted a migration to Pydantic v2 on the application I am working on, and test durations exploded from 15 minutes to 1h30.

The slowest test cases went from around 15s to 20s, and the fastest tests went from around 1ms to 700ms.

I cannot be specific about the application, but it is using FastAPI and Pydantic with many models, including some generic ones (for paginated API responses, for instance).

For now this is a blocker to move to Pydantic v2.

Versions before (ok):

fastapi==0.101.1
pydantic[dotenv,email]==1.10.12

Versions after (nok):

fastapi==0.101.1
pydantic[email]==2.1.1
pydantic-core==2.4.0
pydantic-settings==2.0.2

EDIT

I found a way to circumvent the issue and speed up my tests by removing FastAPI's cold startup time, using an (almost) session-scoped fixture for the FastAPI application.

Only the tests that need to modify the application configuration (feature flags, for instance) still have to go through the startup.
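A sketch of that workaround's core idea, with a plain cached factory standing in for a pytest `@pytest.fixture(scope="session")`; `get_test_app` and `config_key` are hypothetical names, and the returned dict stands in for `FastAPI()` plus router registration:

```python
import functools


@functools.lru_cache(maxsize=None)
def get_test_app(config_key: str = "default"):
    """Build the expensive app once per unique configuration and reuse it,
    mimicking a session-scoped pytest fixture. Tests that need different
    feature flags pass a different config_key and pay the startup cost once."""
    # Stand-in for FastAPI() + router registration + TypeAdapter creation.
    return {"config": config_key, "routes": []}
```

Keying the cache on the configuration is what lets the few configuration-modifying tests still get a fresh app while everything else shares one instance.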

@dmontagu (Contributor)

Yeah, @HenriBlacksmith, I think the issue is that there is a lot more happening during app creation than there used to be; it makes sense to me that using a scoped fixture for the FastAPI app makes things MUCH faster for you.

That said, we are still trying to find ways to speed things up by reducing the amount of work being done eagerly to prepare core schemas etc.

@adriangb (Member)

We've actually fully resolved this now. On the benchmark from this issue, startup time has gone from about 11s on my laptop down to 2.5s, so back to v1 levels of performance. There were no compromises or observable changes (aside from possibly some changes to the generated CoreSchemas), just simplified code and some added caches.

It can be even faster if you set the PYDANTIC_SKIP_VALIDATING_CORE_SCHEMAS env var introduced in #7565. Most schemas are valid and do not need validation, and if you are starting your FastAPI app in tests, it is already being validated there; it doesn't need to be re-validated multiple times. So you could set that env var to 'true' by default (in conftest.py, in your deployment env vars) and override it to 'false' in a single test responsible for validating your schemas. That way even your tests will run faster.
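As a sketch, the default-on setup might look like this in a conftest.py (assumption: the variable should be set before pydantic is imported for it to take effect reliably):

```python
# conftest.py (sketch)
import os

# Skip core-schema validation by default so FastAPI app startup is fast in
# the test suite. Assumption: this must run before pydantic is imported
# anywhere, so it belongs at the very top of conftest.py.
os.environ.setdefault("PYDANTIC_SKIP_VALIDATING_CORE_SCHEMAS", "true")
```

A single dedicated test (or CI job) can then run with the variable set to 'false' so the schemas still get validated somewhere.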

@MarkusSintonen I'd appreciate it if you could test the next release (or main) and let us know whether we can close this issue.

@davidhewitt removed the unconfirmed label on Oct 9, 2023
@edgarhuichen

Well, I still experience problems with startup time in a container: it takes more than 5 seconds for the app to return a simple request. The app is deployed in Azure Container Apps.

@adriangb (Member)

Was it faster than 5s on v1? Do you have anything we can run to test this performance?

@edgarhuichen

Was it faster than 5s on v1? Do you have anything we can run to test this performance?

I did not test performance on v1. A bit more context: the ACA is by default scaled to 0 instances. When I ping the probe endpoint, it sometimes takes very long for the app to respond. This is also causing scaling issues because of the use case I have. One thing worth mentioning: I have created an app user to run the container; I am not sure if this degrades performance.

@adriangb (Member)

I'm not sure about your deployment topology, but as per the benchmarks above, you'd have to have some very complex setup with hundreds of generic models or something like that for it to take 5s. Or do you have a simple example we can reproduce with?

@edgarhuichen

I'm not sure about your deployment topology but as per the benchmarks above you'd have to have some very complex setup with hundreds of generic models or something like that for it to take 5s. Or do you have a simple example we can reproduce with?

Hi Adrian, I just tested starting up the app locally and it takes approximately 3 seconds each time, so it's OK. I guess the issue is on the Azure side. Thanks for your attention; I will contact the Azure team for help.

@michaelgmiller1

@dmontagu @adriangb any ideas on whether there are any big performance wins in store for V2? We are trying to upgrade from V1 and are seeing 3-4x slowdowns on app startup. We have lots of nested and recursive models, and even with 2.6.1 we are still seeing slowness.

@MarkusSintonen (Contributor, Author)

Hi @adriangb / @samuelcolvin!

Sorry I haven't replied sooner; I was on parental leave 👶

I tested Pydantic version 2.6.3 against our code base with the PYDANTIC_SKIP_VALIDATING_CORE_SCHEMAS=true env var and defer_build=True on all our main models. But it does not seem to make a difference, maybe just a marginal one of a few seconds 😔: ~36s without those params vs ~34s with them.

I checked the profiler again and it still looks pretty much the same: a huge amount of time is spent in core schema building, like 90% of the test startup time. Maybe FastAPI doesn't work at all with defer_build=True? That should lazily build the core schema when validation is actually used, but maybe it doesn't work with TypeAdapter + Annotated types (as FastAPI uses them).
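For reference, the defer_build switch being tested looks like this on a model (a sketch assuming pydantic >= 2.1; `Item` is a hypothetical model, and whether the deferral survives FastAPI's `TypeAdapter(Annotated[...])` path is exactly the open question above):

```python
from pydantic import BaseModel, ConfigDict


class Item(BaseModel):
    # Postpone core-schema/validator construction from class-definition
    # time to the first validation call.
    model_config = ConfigDict(defer_build=True)

    id: int
    name: str
```

With this set, defining the class is cheap; the schema-building cost is only paid when the model is first validated (or when something forces a rebuild, which may be what the FastAPI path does).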

Some more info about the setup:

  • We are using a session fixture for our FastAPI instance, so the slowness is only at test startup. After that the tests run quickly, but of course it's now cumbersome to write/use the API tests.
  • In production the startup slowness from core schema initialization is also very visible: it takes 40-50s to get the service started. Therefore, in our HPA Kubernetes setup, scale-up is quite delayed and spiky load can cause issues.

@samuelcolvin (Member)

Hi @MarkusSintonen, I'm sorry you're still having trouble.

I see a few ways forward (not necessarily mutually exclusive):

  • With the new PyO3 API, which should improve the performance of accessing and creating Python objects from Rust a lot, we can maybe move some of the walk logic into Rust; there's already a PR for this
  • Rework walk and make it faster without changing the high-level logic
  • Reimplement walk from scratch so it doesn't walk core schemas so many times
  • Cache core schemas, maybe even in a file
  • See if there are fixes possible in FastAPI so it doesn't create schemas so many times; I'm not optimistic this will get accepted

@MarkusSintonen (Contributor, Author) commented Mar 3, 2024

@samuelcolvin

Some more info: I verified whether the cProfile in the original issue description actually reflects the time spent in the different parts of core schema generation. It seems cProfile causes a huge measurement error here, due to how highly recursive the core schema "walking" is. Instead, I manually instrumented GenerateSchema.generate_schema, GenerateSchema.clean_schema and _core_utils.walk_core_schema to see how much time is actually spent inside them.

It actually seems it's GenerateSchema.generate_schema that takes most of the time in our case: at initialization, 74% of the time is spent there! Only 12% of the time is spent in _core_utils.walk_core_schema.

Total time to startup 36.80s

GenerateSchema.generate_schema 27.30s
_core_utils.walk_core_schema   4.39s
GenerateSchema.clean_schema    4.78s  (this calls _core_utils.walk_core_schema)

Then I cProfiled just the heaviest GenerateSchema.generate_schema which gave following results for functions internal and cumulative times:
cprofile_time.txt
cprofile_cumulative.txt

The profiler doesn't show any obvious hot spot taking most of the time; generate_schema just does so much work overall that it adds up with a lot of models.

I hope this helps!

(The tests above used defer_build and PYDANTIC_SKIP_VALIDATING_CORE_SCHEMAS, but they don't seem to make much of a difference.)

@MarkusSintonen
Copy link
Contributor Author

MarkusSintonen commented Mar 3, 2024

Another idea: would it be somehow possible to "precompile" the core schemas? That would remove the overhead of regenerating them on every startup, since models don't really change that often (unless someone uses dynamic models, in which case precompiling wouldn't work). It would require exposing some kind of "export" of core schemas, and an "import" of those back into the Pydantic models. Users could then decide how the exported schemas get distributed and kept up to date. Maybe the "import" could validate that the core schema matches the model it is imported into (although validating too much might make things slow again). Looking at the contents of core schemas, they contain custom validator function callbacks and some other callbacks too; those would need to be handled at "import" time so that the actual callbacks get re-attached to the core schema dicts.

@samuelcolvin
Copy link
Member

It actually seems its the GenerateSchema.generate_schema that takes most of the time in our case. At initialization 74% of time is spent there! Only 12% of time is spent in _core_utils.walk_core_schema.

That's really interesting, and saves me a lot of time.

With that we might be able to look into the generate_schema performance and see what we can do.

would it be somehow possible to "precompile" the core schemas?

Yes, we've thought about that - it's roughly what I meant (although I wasn't clear at all) by "cache core schemas, maybe even in a file".

I think this could work - the hard bit might be working out which of the pre-built schemas TypeAdapter or BaseModel should use. Presumably we need them to automatically look up the stored schemas and use them, so no changes to FastAPI are required (we can talk to @tiangolo about supporting this in FastAPI, but it'll be quicker to implement if that's not required).

If the schemas are always pydantic models in the root of a module, I guess the key can be as simple as {module_name}:{model_name} - I guess we start by supporting the simple case and extend from there.
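A minimal sketch of such a file cache, keyed roughly as suggested plus a hash of the model source so stale entries get rebuilt. All names are hypothetical, and real core schemas contain validator callables that don't pickle and would need re-binding on load:

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

def load_or_build_schema(cache_dir: Path, module_name: str, model_name: str,
                         source: str, build):
    """Return a cached core schema keyed by {module_name}:{model_name} plus a
    hash of the model's source. 'build' is the expensive step this thread is
    about (core schema generation); it only runs on a cache miss."""
    digest = hashlib.sha256(source.encode()).hexdigest()[:16]
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{module_name}.{model_name}.{digest}.pkl"
    if path.exists():
        return pickle.loads(path.read_bytes())
    schema = build()
    path.write_bytes(pickle.dumps(schema))
    return schema

builds = []
def build_entity_schema():
    builds.append(1)
    return {"type": "model", "ref": "app.models.Entity"}

with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    s1 = load_or_build_schema(cache, "app.models", "Entity",
                              "class Entity: ...", build_entity_schema)
    s2 = load_or_build_schema(cache, "app.models", "Entity",
                              "class Entity: ...", build_entity_schema)

assert s1 == s2
assert len(builds) == 1  # second call hit the cache
```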

How are your schemas defined?

@MarkusSintonen
Copy link
Contributor Author

If the schemas are always pydantic models in the root of a module, I guess the key can be as simple as {module_name}:{model_name} - I guess we start by supporting the simple case and extend from there.
How are your schemas defined?

Our Pydantic models are mainly defined in two places: our internal Python package containing "entity" data models (lots of these, with varying complexity), and our API models (FastAPI) in the service repo. The API models are generic, where the generic parameter is one of the entity models from the internal package.

So in theory we could "precompile" the schemas in both places, but I would say precompiling the API's parametrized generic models would suffice. Then we would have precompiled core schemas for each variant of the API models (so GetEntities[Entity1], GetEntities[Entity2], etc. would all have their own precompiled core schemas). That would remove the huge overhead of regenerating them on the fly every time the service starts, and in tests.

There might again be some difficulties with FastAPI, as it does its own dynamic-model magic, so it would need to be checked how this actually fits with FastAPI. If you have a prototype at some point, I can give it a go!

@MarkusSintonen
Copy link
Contributor Author

You can see here how our service startup time keeps going up over time (as additional data models and API models are added). It is now already over a minute, of which core schema building (in TypeAdapters) takes about a minute:
Screenshot 2024-03-06 at 08:42:50

With Pydantic V1 the startup took about 10 seconds so the issue is getting worse with V2.

@michaelgmiller1
Copy link

Is there a path forward to increase performance here? We still cannot upgrade to Pydantic v2 because the startup performance is an order of magnitude slower than v1.

Labels
bug V2 Bug related to Pydantic V2

8 participants