
Very slow (FastAPI) application startup after using V2 due to _core_utils.py:walk #6768

Open · 1 task done
Labels: bug V2 (Bug related to Pydantic V2)
@MarkusSintonen (Contributor) commented Jul 20, 2023

Initial Checks

  • I confirm that I'm using Pydantic V2

Description

Our FastAPI application's startup performance degraded from about 5s to over 20s after moving to V2. This makes e.g. our test runs very slow (we haven't measured how long it takes in production to get servers up and running). The issue seems to be related to the walking of core schemas, as the cProfile results below (from our app init) show.

The issue seems to come from the poor performance of initializing TypeAdapters in FastAPI. It might also be related to how FastAPI uses the new TypeAdapter interface. But there might be an issue in the walking of core schemas as well. What do you think?
Full cProfile print output: cprofile.txt

See latest profiles here: #6768 (comment)

Below are outdated profiles, see link above.

(SLOW) Pydantic v2.0.3 & FastAPI v0.100.0
ncalls            tottime  percall cumtime   percall filename:lineno(function)
3854141/395376    3.575    0.000   18.577    0.000   xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:193(_walk)
3850208/395376    3.395    0.000   20.191    0.000   xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:190(walk)
3003272           2.298    0.000    2.644    0.000   xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:106(get_ref)
9812              1.773    0.000   31.855    0.003   xxx/python3.11/site-packages/pydantic/type_adapter.py:142(__init__)
1352590/179311    1.712    0.000    6.842    0.000   xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:434(flatten_refs)
309655/256580     1.587    0.000   12.204    0.000   xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:309(handle_model_fields_schema)
1145758/38773     1.536    0.000    6.745    0.000   xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:411(collect_refs)
5766050           1.451    0.000    1.451    0.000   {method 'copy' of 'dict' objects}
3229655/682916    1.382    0.000   16.172    0.000   xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:200(_handle_other_schemas)
9718277           1.131    0.000    1.131    0.000   {method 'get' of 'dict' objects}
83003             0.831    0.000    3.959    0.000   xxx/python3.11/site-packages/pydantic/_internal/_generate_schema.py:1288(_get_prepare_pydantic_annotations_for_known_type)


(FAST) Pydantic v1.10.10 & FastAPI v0.100.0
ncalls            tottime  percall  cumtime  percall filename:lineno(function)
871294/78525      2.790    0.000    3.005    0.000   {built-in method _abc._abc_subclasscheck}
545241/8813       0.775    0.000    1.942    0.000   xxx/python3.11/copy.py:128(deepcopy)
10618             0.490    0.000    1.593    0.000   xxx/python3.11/site-packages/fastapi/utils.py:63(create_response_field)
19486/917         0.377    0.000    1.906    0.002   xxx/python3.11/copy.py:227(_deepcopy_dict)
3202/2127         0.321    0.000    3.182    0.001   xxx/python3.11/site-packages/pydantic/generics.py:75(__class_getitem__)
40290/28472       0.313    0.000    1.657    0.000   xxx/python3.11/inspect.py:2428(_signature_from_callable)
28472             0.285    0.000    0.979    0.000   xxx/python3.11/inspect.py:2333(_signature_from_function)
871294/78525      0.240    0.000    3.033    0.000   <frozen abc>:121(__subclasscheck__)
1422963/1412127   0.231    0.000    0.244    0.000   {built-in method builtins.isinstance}
52781             0.213    0.000    0.265    0.000   xxx/python3.11/inspect.py:2972(__init__)
14998             0.205    0.000    0.206    0.000   {method '__reduce_ex__' of 'object' objects}


(FAST) Pydantic v1.10.10 & FastAPI v0.99.1
ncalls            tottime  percall  cumtime  percall filename:lineno(function)
858955/76183      1.997    0.000    2.190    0.000   {built-in method _abc._abc_subclasscheck}
538585/2157       0.825    0.000    1.788    0.001   xxx/python3.11/copy.py:128(deepcopy)
23230             0.509    0.000    0.521    0.000   xxx/python3.11/site-packages/fastapi/dependencies/models.py:16(__init__)
52781             0.425    0.000    0.476    0.000   xxx/python3.11/inspect.py:2972(__init__)
10618             0.419    0.000    1.480    0.000   xxx/python3.11/site-packages/fastapi/utils.py:75(create_response_field)
19486/917         0.367    0.000    1.776    0.002   xxx/python3.11/copy.py:227(_deepcopy_dict)
3202/2127         0.312    0.000    3.025    0.001   xxx/python3.11/site-packages/pydantic/generics.py:75(__class_getitem__)
40290/28472       0.279    0.000    1.778    0.000   xxx/python3.11/inspect.py:2428(_signature_from_callable)
28472             0.265    0.000    1.148    0.000   xxx/python3.11/inspect.py:2333(_signature_from_function)
858955/76183      0.219    0.000    2.216    0.000   <frozen abc>:121(__subclasscheck__)
1360643/1349807   0.212    0.000    0.224    0.000   {built-in method builtins.isinstance}

Example Code

# This TypeAdapter usage is very slow with a large number of API models
# (e.g. API models that are generic over inner data models)
# https://github.com/tiangolo/fastapi/blob/f7e3559bd5997f831fb9b02bef9c767a50facbc3/fastapi/_compat.py#L101
self._type_adapter: TypeAdapter[Any] = TypeAdapter(
    Annotated[self.field_info.annotation, self.field_info]
)
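For readers unfamiliar with the pattern, here is a minimal stand-alone sketch of the same construction (the names `field_info` and `adapter` are illustrative, not from our app; this assumes pydantic v2 is installed):

```python
from typing import Annotated, Any

from pydantic import TypeAdapter
from pydantic.fields import FieldInfo

# FastAPI builds one TypeAdapter per route parameter / response field.
# Each construction triggers full core-schema generation and walking,
# which is where the startup time goes when there are many models.
field_info = FieldInfo(annotation=int)
adapter: TypeAdapter[Any] = TypeAdapter(Annotated[int, field_info])
```

Every such construction repeats the schema-generation work, so the cost scales with the number of route fields rather than the number of distinct model types.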

Python, Pydantic & OS Version

pydantic version: 2.0.3
        pydantic-core version: 2.3.0 release build profile
                 install path: /Users/markussintonen/Library/Caches/pypoetry/virtualenvs/userdata-IiJ3qKwC-py3.11/lib/python3.11/site-packages/pydantic
               python version: 3.11.4 (main, Jul 17 2023, 14:44:40) [Clang 14.0.3 (clang-1403.0.22.14.1)]
                     platform: macOS-13.4.1-x86_64-i386-64bit
     optional deps. installed: ['typing-extensions']

Selected Assignee: @samuelcolvin

@MarkusSintonen added the bug V2 and unconfirmed labels on Jul 20, 2023
@MarkusSintonen (Contributor, Author) commented Jul 20, 2023

When I patch FastAPI so it memoizes the FieldInfo -> TypeAdapter mapping, the initialization time drops from ~20s to ~10s, so it seems FastAPI is doing some unnecessary work there. But it is still spending a lot of time in walk.

ncalls            tottime  percall  cumtime  percall  filename:lineno(function)
1682626/144308    1.895    0.000    8.208    0.000    xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:193(_walk)
1678693/144308    1.617    0.000    8.903    0.000    xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:190(walk)
139877/116568     0.831    0.000    5.860    0.000    xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:309(handle_model_fields_schema)
566346/69584      0.695    0.000    3.330    0.000    xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:434(flatten_refs)
71062             0.582    0.000    0.606    0.000    xxx/python3.11/enum.py:685(__call__)
2458408           0.563    0.000    0.563    0.000    {method 'copy' of 'dict' objects}
482385/10384      0.542    0.000    2.407    0.000    xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:411(collect_refs)
1425888/284906    0.523    0.000    7.021    0.000    xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:200(_handle_other_schemas)
4186137           0.476    0.000    0.476    0.000    {method 'get' of 'dict' objects}
1261789           0.439    0.000    0.581    0.000    xxx/python3.11/site-packages/pydantic/_internal/_core_utils.py:106(get_ref)
1013              0.407    0.000    8.590    0.008    xxx/python3.11/site-packages/pydantic/_internal/_model_construction.py:393(complete_model_class)

Here is the patch I did for memoization:

# Ugly hacks follow to make FieldInfo hashable and to monkey-patch FastAPI
import typing
from typing import Annotated

from pydantic import TypeAdapter
from pydantic.fields import FieldInfo


def hashable(obj):
    """Recursively convert dicts/iterables into hashable tuples."""
    if isinstance(obj, typing.Hashable):
        return obj
    elif isinstance(obj, dict):
        return tuple(sorted(((k, hashable(v)) for k, v in obj.items()), key=lambda t: t[0]))
    elif isinstance(obj, typing.Iterable):
        return tuple(hashable(item) for item in obj)
    else:
        raise TypeError(type(obj))


class FieldInfoKey:
    def __init__(self, field_info: FieldInfo):
        self.field_info = field_info
        self.field_info_hashable = hashable(field_info._attributes_set)

    def __hash__(self):
        return hash(self.field_info_hashable)

    def __eq__(self, other):
        if not isinstance(other, FieldInfoKey):
            return False
        return self.field_info._attributes_set == other.field_info._attributes_set


@weak_cache  # weak-reference memoization helper (definition not shown)
def type_adapter_cache(f: FieldInfoKey) -> TypeAdapter:
    return TypeAdapter(Annotated[f.field_info.annotation, f.field_info])


def ugly_init_patch(self):
    self._field_info_key = FieldInfoKey(self.field_info)
    self._type_adapter = type_adapter_cache(self._field_info_key)


from fastapi._compat import ModelField
ModelField.__post_init__ = ugly_init_patch

Maybe FastAPI's _compat.ModelField could also create TypeAdapters lazily, but that might just hide issues in walk...? (Lazy initialization would also cause issues in FastAPI's route-definition validation: a badly defined route would only break on first use.)

@samuelcolvin (Member)

Could be related to #6748

@samuelcolvin (Member)

Thanks so much for reporting, we're looking into this.

@adriangb (Member)

Can you give us a sample of your models? For example, the model from your slowest endpoint with the field names and model names mangled or something like that.

@adriangb (Member)

I do think FastAPI or Pydantic should cache TypeAdapters, to save not only this walking but also the creation of the SchemaValidator and such. There are memory benefits to that as well.

@MarkusSintonen (Contributor, Author) commented Jul 20, 2023

Can you give us a sample of your models? For example, the model from your slowest endpoint with the field names and model names mangled or something like that.

It seems there isn't any specific model that is slow. There are just so many unique request/response models from generic parametrization over the different data models we have, and each model takes a bunch of time to walk. I took a sample of where walk actually spends its time, as below. (Note that this patches FastAPI so it caches TypeAdapters, so we only do the work for unique model combinations created by FastAPI routes.)

import json
from time import time

from fastapi._compat import ModelField
from pydantic._internal._core_utils import _WalkCoreSchema  # pydantic-internal, location as of v2.0.x


def patch_walk():
    orig = _WalkCoreSchema.walk
    durations = {}

    def patch(self, schema, *args, **kwargs):
        key = schema["type"] + ":" + str(schema.get("cls") or schema.get("model_name") or "<unknown>")
        start = time()
        res = orig(self, schema, *args, **kwargs)
        durations[key] = durations.get(key, 0) + (time() - start)

        return res

    _WalkCoreSchema.walk = patch
    return durations


def init_app_stuff():
    ModelField.__post_init__ = ugly_caching_init_patch  # cache slow TypeAdapters in FastAPI (patch from the earlier comment)
    durations = patch_walk()

    # do FastAPI app init...

    print(f"ENTRIES COUNT: {len(durations)}")
    print(f"TOTAL DURATION: {sum(durations.values())}")
    top_kvs = sorted(durations.items(), key=lambda t: t[1], reverse=True)[:100]
    print("TOP:")
    print(json.dumps(dict(top_kvs), indent=2))

Yielding:

ENTRIES COUNT: 2639
TOTAL DURATION: 15.628021955490112
TOP:
{
  "definition-ref:<unknown>": 2.6756765842437744,
  "default:<unknown>": 1.8995928764343262,
  "nullable:<unknown>": 1.1915950775146484,
  "function-before:<unknown>": 1.0592408180236816,
  "list:<unknown>": 0.6754055023193359,
  "function-after:<unknown>": 0.41553235054016113,
  "str:<unknown>": 0.34151172637939453,
  "model-fields:DataModelHavingNestedDictFields": 0.3339402675628662,
  "int:<unknown>": 0.304196834564209,
  "model-fields:VerySmallDataModel": 0.28247642517089844,
  "model-fields:DataModelHavingStringAndEnumFields": 0.26870179176330566,
  "model:<class 'DataModelHavingStringAndEnumFields'>": 0.24478888511657715,
  "model-fields:ResponseModel[DataModelHavingNumericFieldsAndCoupleNestedModels]": 0.22575044631958008,
  "model:<class 'VerySmallDataModel'>": 0.19563722610473633,
  "json-or-python:<unknown>": 0.1840825080871582,
  "chain:<unknown>": 0.14197635650634766,
  "lax-or-strict:<unknown>": 0.11188220977783203,
  "model-fields:AnotherVerySmallDataModel": 0.10825538635253906,
  "model-fields:ModelHavingLotOfDifferentFields": 0.0886530876159668,
  "model-fields:Model2HavingLotOfDifferentFields": 0.08829522132873535,
  "model-fields:SimpleResponseModel": 0.07635879516601562,
  "dict:<unknown>": 0.07629108428955078,
  "model-fields:ModelHavingVaryingFields": 0.07033467292785645,
  "model-fields:SmallResponseModel": 0.0696713924407959,
  "literal:<unknown>": 0.0684511661529541,
  "model-fields:SmallErrorResponseModel": 0.0679781436920166,
  "model-fields:AnotherSmallErrorResponseModel": 0.05967593193054199,
  "definitions:<unknown>": 0.059203147888183594,

  // Lots of stuff but nothing really interesting...

  "model-fields:ModelHavingNestedModelsWithNestedModels": 0.013972759246826172,
  "model-fields:AnotherModelHavingNestedModelsWithNestedModels": 0.013957023620605469,
  "model:<class 'SimpleModelWithIntFields'>": 0.01356363296508789,
  "model-fields:SimpleResponseModel[DataModelWithLotOfFields]": 0.013484001159667969,
  "model-fields:SomeModelHavingNestedModelsWithNestedModel": 0.013458251953125,
  "any:<unknown>": 0.013149023056030273,
  "model-fields:AnotherSimpleResponseModel[ModelWIthLotOfFields]": 0.012750387191772461,
  "model-fields:SimpleModel": 0.012310504913330078,
  "dataclass:<class 'ErrorResponseModel'>": 0.011614799499511719,
  "model:<class 'AnotherErrorResponseModel'>": 0.011425018310546875,
  "model-fields:OneResponseModel": 0.011368989944458008,
  "model-fields:SimpleModel": 0.011211872100830078,
  "model-fields:AnotherSimpleModel": 0.011202573776245117,
}

@dmontagu (Contributor) commented Jul 20, 2023

While I doubt it will be "easy" to improve this, I do think there's likely to be a lot of potential for improvement because performance of model/TypeAdapter instantiation hasn't really been a concern up to this point (though clearly we want to improve your situation).

@MarkusSintonen I've pushed a branch to pydantic called lazy-type-adapter which makes type adapters instantiate the core schemas and validators/serializers lazily. Is there any chance you could try installing pydantic from that branch and see if it improves your app startup time? I don't think that branch is currently viable for merging since it won't (yet) trigger build-time errors in all cases where it should, BUT, I think the extent to which that branch impacts startup performance will significantly help me understand where there is potential for improvements.

Note also that that branch may not actually improve the total startup time for accessing all endpoints; I just think it may delay per-model/endpoint initialization until the first request that hits that endpoint, which might result in a significant improvement to startup time (i.e., the time it takes to get a single response from an arbitrary endpoint). I'd be interested to understand if that is not a sufficient improvement for you (i.e., you need to keep total initialization time low even if you can get a response from some endpoint very fast), but either way it will be useful to understand the impact.

I think Adrian and I also have some ideas about how we can do better caching in the schema generation process that should reduce the amount of work being done by the TypeAdapter; that's not quite ready, but improving this is a priority for us.

@MarkusSintonen (Contributor, Author)

I've pushed a branch to pydantic called lazy-type-adapter which makes type adapters instantiate the core schemas and validators/serializers lazily. Is there any chance you could try installing pydantic from that branch and see if it improves your app startup time?

I tried that and it indeed helps a lot! Startup time dropped from ~20s to ~6s, very promising! (I didn't need to patch FastAPI to cache TypeAdapters anymore.) Here's the new cProfile: cprofile.txt

@dmontagu (Contributor)

@MarkusSintonen okay great, good to know. Like I said, I don't quite think we can take that route, but good to hear it's a step in the right direction.

One thing @adriangb and I discovered — FastAPI is actually creating another type adapter each time you include a route in another route. As a result, there's a lot of potentially-unnecessary overhead if you are using nested API routers; even if you just add endpoints to routers and then add the routers to the app, that's probably a ~2x overhead that could be removed if we rework FastAPI to reuse the type adapter when possible. I'll look into this, but are you using much nesting of routers? I can see in your cProfile that you created ~1500 instances of APIRouter (which resulted in ~10k type adapters being created), but that seems to me like you probably don't actually have that many distinct endpoints, in which case there might be some low-hanging fruit there too.

@MarkusSintonen (Contributor, Author)

FastAPI is actually creating another type adapter each time you include a route in another route. As a result, there's a lot of potentially-unnecessary overhead

@dmontagu yes, I also observed that, as you can see in the message above. When I make FastAPI cache the FieldInfo -> TypeAdapter mapping, it reduces the startup time by about 50%, so it takes about 10s to start up (still 2x compared to Pydantic V1).

@MarkusSintonen (Contributor, Author) commented Jul 21, 2023

It would be easier to open an improvement PR to FastAPI to reuse TypeAdapters if Pydantic made FieldInfo properly hashable/immutable. Then FastAPI could keep a cache such as a WeakKeyDictionary[FieldInfo, TypeAdapter].
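If FieldInfo were hashable by value, that cache could be as simple as the following sketch. `HashableFieldInfo` and `get_adapter` are hypothetical stand-ins, not Pydantic or FastAPI APIs, and a plain `object()` stands in for the expensive TypeAdapter construction:

```python
import gc
import weakref


class HashableFieldInfo:
    """Hypothetical stand-in for a value-hashable, immutable FieldInfo."""

    def __init__(self, annotation, **attrs):
        self.annotation = annotation
        self.attrs = tuple(sorted(attrs.items()))

    def __hash__(self):
        return hash((self.annotation, self.attrs))

    def __eq__(self, other):
        return (isinstance(other, HashableFieldInfo)
                and (self.annotation, self.attrs) == (other.annotation, other.attrs))


# Entries vanish automatically once the framework drops the FieldInfo,
# so the cache cannot leak adapters for discarded routes.
_adapters = weakref.WeakKeyDictionary()


def get_adapter(field_info):
    try:
        return _adapters[field_info]
    except KeyError:
        adapter = object()  # stand-in for the expensive TypeAdapter(...) call
        _adapters[field_info] = adapter
        return adapter
```

The weak-key design is what makes immutability matter: if a FieldInfo mutated after being used as a key, its hash would no longer match its bucket.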

@adriangb (Member)

The issue with that is that FieldInfo can contain things like arbitrary JSON. You demonstrated above that it can be made hashable, but I'm not sure that sort of trick will extend to all of the data in FieldInfo.

@adriangb (Member)

To summarize so far:

  • Things seem to be more expensive now than they were in v1. I wouldn't be surprised if that's just the reality of the matter, but David did mention that some things in v1 were being done lazily, so maybe we just moved work from one place to another. I usually like things in my web apps to not be lazy because I'd rather have a 10s start than random 500ms latency for users.
  • FastAPI re-creates every route (and every TypeAdapter) when mounting. This seems like an entirely avoidable problem with some refactoring in FastAPI. What I did in Xpresso was to defer work like this until the first of (1) ASGI lifespan or (2) the first request. This meant that for most applications running Uvicorn in production, the work happened before the first request (so no random latency for users), but no work happened at import time, and the lifespan isn't required (which keeps tests and development painless). A combination of this and not re-loading the world just for mounting ASGI apps would make a huge difference for FastAPI. @tiangolo and I have talked about using Starlette's Mount in the past; I believe the blocker was some breaking changes in behavior, but maybe now, with Starlette 1.0 around the corner, it is time for FastAPI to bite the bullet on that sort of thing.
  • Reducing the number of TypeAdapters per route. The 1:1 mapping of TypeAdapter to endpoint is nice, but in theory they could all be squashed into a single type / TypeAdapter and processed at the same time. For various reasons this would be faster.
  • Improving the performance of the work Pydantic does internally, i.e. the cost of creating a TypeAdapter. I think we'd mainly have to do this via caching. This is non-trivial because e.g. the CoreSchema is just a dict of dicts/lists, so if we store some cache metadata at the topmost level, there's no way to do cache invalidation if something edits an internal spot.

@MarkusSintonen (Contributor, Author) commented Jul 21, 2023

I usually like things in my web apps to not be lazy because I'd rather have a 10s start than random 500ms latency for users.

Yeah, I mostly agree. Currently it's just a large jump backwards when developing (large applications) with Pydantic V2/FastAPI, as getting the tests running takes (without any fixes/hacks) 5x longer. This is only locally, so I'm not at all sure how long startup would take in production after deploys. It could now be close to a minute. (It's also unknown how much memory it would actually consume there; I haven't gotten that far yet.)

FastAPI re-creates every route (and every TypeAdapter) when mounting

It's quite clear that FastAPI is also doing things very suboptimally with TypeAdapters, since the simple patch reduces the time quite a lot (though it's still quite a bit slower than V1). I'm not sure how feasible the weak-dict-like patch is, as FieldInfos are not easily hashable (and I'm not sure whether something in FastAPI could mutate FieldInfos on the fly...).

Improving the performance of the stuff Pydantic does internally. Basically improving the performance of creating a TypeAdapter. I think we'd mainly have to do this via caching.

Hopefully there will also be fixes to the logic of how TypeAdapter creates the CoreSchemas. Maybe there is some unneeded repeated deep traversal of the dicts. Probably some of the work could be performed while constructing the CoreSchema instead of repeatedly re-traversing the whole object tree. Or some traversal could move to the Rust side :D

@samuelcolvin (Member)

@MarkusSintonen any chance you'd be willing to jump on a call with some of the Pydantic team to tell us a bit about your application and how you're using Pydantic? If so, drop me an email at samuel@pydantic.dev.

@adriangb (Member)

We've reached out to @tiangolo to see what we can do on the FastAPI side.

In the meantime I've been trying to optimize what we do on the Pydantic side. @MarkusSintonen could you try this branch and see if it makes any difference? I'm not sure that it will but there's a chance.

@MarkusSintonen (Contributor, Author) commented Jul 21, 2023

could you try this branch and see if it makes any difference

Sure, I'll try later today! All in all, I think if we get FastAPI to reuse TypeAdapters, it's already a great improvement; a 10s startup in tests is livable. The overall improvement from Pydantic 2 is so great! Thank you all!

chance you'd be willing to jump on a call with some of the Pydantic

Sure, I'll drop you a message later today.

@adriangb (Member)

10sec startup in tests is livable

Would 0s (give or take, if we did lazy startup) in tests and 10s in production be even better?

@MarkusSintonen (Contributor, Author)

10sec startup in tests is livable

Would 0s (give or take, if we did lazy startup) in tests and 10s in production be even better?

Even better :D (probably some level of breaking change for FastAPI)

@adriangb (Member)

I don't think it needs to be!

@MarkusSintonen (Contributor, Author)

In the meantime I've been trying to optimize what we do on the Pydantic side. MarkusSintonen could you try this branch and see if it makes any difference? I'm not sure that it will but there's a chance.

@adriangb Hmm, interestingly it's slower. Pydantic 2.0.3 takes ~20s and this branch takes ~30s (without FastAPI patches). Here is a (partial) cProfile: cprofile.txt

@MarkusSintonen (Contributor, Author)

@adriangb I wrote a test for you that tries to emulate the performance issue:
test_perf.py.txt
It gives similar cProfile results, so you can try it out in your investigations. It's kind of similar to how generics get used with FastAPI's API models in our application. (Everything else is just emulated and not really anything that's actually used.)

@adriangb (Member)

Awesome, that is super helpful!

@MarkusSintonen (Contributor, Author) commented Jul 28, 2023

@samuelcolvin / @adriangb FYI, we are testing Pydantic V2 on our EKS cluster. It has a very noticeable impact on overall memory usage, which increases by over 2x. I guess it might be related to the creation of core schemas and FastAPI initializing them all at startup?

[Screenshot: memory usage graph, 2023-07-28 10:44]

It doesn't seem like a memory leak, as we are not seeing it creep upwards. So it is probably caused by the initialization of all the internal structures.

@HenriBlacksmith commented Aug 15, 2023

I did not dig into the profiling in my case, but just to confirm that @MarkusSintonen is not alone: I attempted a migration to Pydantic v2 on the application I am working on, and test durations exploded from 15 minutes to 1h30.

The slowest test cases went from around 15s to 20s, and the fastest tests went from around 1ms to 700ms.

I cannot be specific about the application, but it is using FastAPI and Pydantic with many models, including some generic ones (for paginated API responses, for instance).

For now this is a blocker to move to Pydantic v2.

Versions before (ok):

fastapi==0.101.1
pydantic[dotenv,email]==1.10.12

Versions after (nok):

fastapi==0.101.1
pydantic[email]==2.1.1
pydantic-core==2.4.0
pydantic-settings==2.0.2

EDIT

I found a way to circumvent the issue and speed up my tests by removing FastAPI's cold startup time, using an (almost) session-scoped fixture for the FastAPI application.

Only the tests that need to modify the application configuration (feature flags, for instance) still have to go through the startup.
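A sketch of that workaround's core idea, with a plain cached factory standing in for a pytest `@pytest.fixture(scope="session")`; `get_test_app` and `config_key` are hypothetical names, and the returned dict stands in for `FastAPI()` plus router registration:

```python
import functools


@functools.lru_cache(maxsize=None)
def get_test_app(config_key: str = "default"):
    """Build the expensive app once per unique configuration and reuse it,
    mimicking a session-scoped pytest fixture. Tests that need different
    feature flags pass a different config_key and pay the startup cost once."""
    # Stand-in for FastAPI() + router registration + TypeAdapter creation.
    return {"config": config_key, "routes": []}
```

Keying the cache on the configuration is what lets the few configuration-modifying tests still get a fresh app while everything else shares one instance.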

@dmontagu (Contributor)

Yeah, @HenriBlacksmith, I think the issue is that there is a lot more happening during app creation than there used to be; it makes sense to me that using a scoped fixture for the FastAPI app makes things MUCH faster for you.

That said, we are still trying to find ways to speed things up by reducing the amount of work being done eagerly to prepare core schemas etc.

@adriangb (Member)

We've actually fully resolved this now. On the benchmark from this issue, startup time has gone from about 11s on my laptop down to 2.5s, so back to v1 levels of performance. There were no compromises or observable changes (aside from possibly some changes to the generated CoreSchemas), just simplified code and some added caches.

It can be even faster if you set the PYDANTIC_SKIP_VALIDATING_CORE_SCHEMAS env var introduced in #7565. Most schemas are valid and do not need validation, and if you are starting your FastAPI app in tests, it is already being validated there; it doesn't need to be re-validated multiple times. So you could set that env var to 'true' by default (in conftest.py, in your deployment env vars) and override it to 'false' in a single test responsible for validating your schemas. That way even your tests will run faster.
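As a sketch, the default-on setup might look like this in a conftest.py (assumption: the variable should be set before pydantic is imported for it to take effect reliably):

```python
# conftest.py (sketch)
import os

# Skip core-schema validation by default so FastAPI app startup is fast in
# the test suite. Assumption: this must run before pydantic is imported
# anywhere, so it belongs at the very top of conftest.py.
os.environ.setdefault("PYDANTIC_SKIP_VALIDATING_CORE_SCHEMAS", "true")
```

A single dedicated test (or CI job) can then run with the variable set to 'false' so the schemas still get validated somewhere.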

@MarkusSintonen I'd appreciate it if you could test the next release (or main) and let us know whether we can close this issue.

@davidhewitt removed the unconfirmed label on Oct 9, 2023
@edgarhuichen

Well, I still experience problems with startup time in a container: it takes more than 5 seconds for the app to return a simple request. The app is deployed in Azure Container Apps.

@adriangb (Member)

Was it faster than 5s on v1? Do you have anything we can run to test this performance?

@edgarhuichen

Was it faster than 5s on v1? Do you have anything we can run to test this performance?

I did not test performance on v1. A bit more context: the ACA is by default scaled to 0 instances. When I ping the probe endpoint, it sometimes takes very long for the app to respond. This is also causing scaling issues because of the use case I have. One thing worth mentioning: I have created an app user to run the container; I am not sure if this degrades performance.

@adriangb (Member)

I'm not sure about your deployment topology, but as per the benchmarks above, you'd have to have some very complex setup with hundreds of generic models or something like that for it to take 5s. Or do you have a simple example we can reproduce with?

@edgarhuichen

I'm not sure about your deployment topology but as per the benchmarks above you'd have to have some very complex setup with hundreds of generic models or something like that for it to take 5s. Or do you have a simple example we can reproduce with?

Hi Adrian, I just tested starting up the app locally and it takes approximately 3 seconds each time, so it's OK. I guess the issue is on the Azure side. Thanks for your attention; I will contact the Azure team for help.

@michaelgmiller1

@dmontagu @adriangb any ideas on whether there are any big performance wins in store for V2? We are trying to upgrade from V1 and are seeing 3-4x slowdowns on app startup. We have lots of nested and recursive models, and even with 2.6.1 we are still seeing slowness.

@MarkusSintonen (Contributor, Author)

Hi @adriangb / @samuelcolvin!

Sorry I haven't replied sooner; I was on parental leave 👶

I tested Pydantic version 2.6.3 against our code base with the PYDANTIC_SKIP_VALIDATING_CORE_SCHEMAS=true env var and defer_build=True on all our main models. But it does not seem to make a difference, maybe just a marginal one of a few seconds 😔: ~36s without those params vs ~34s with them.

I checked the profiler again and it still looks pretty much the same: a huge amount of time is spent in core schema building, like 90% of the test startup time. Maybe FastAPI doesn't work at all with defer_build=True? That should lazily build the core schema when validation is actually used, but maybe it doesn't work with TypeAdapter + Annotated types (as FastAPI uses them).
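For reference, the defer_build switch being tested looks like this on a model (a sketch assuming pydantic >= 2.1; `Item` is a hypothetical model, and whether the deferral survives FastAPI's `TypeAdapter(Annotated[...])` path is exactly the open question above):

```python
from pydantic import BaseModel, ConfigDict


class Item(BaseModel):
    # Postpone core-schema/validator construction from class-definition
    # time to the first validation call.
    model_config = ConfigDict(defer_build=True)

    id: int
    name: str
```

With this set, defining the class is cheap; the schema-building cost is only paid when the model is first validated (or when something forces a rebuild, which may be what the FastAPI path does).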

Some more info about the setup:

  • We are using a session fixture for our FastAPI instance, so the slowness is only at test startup. After that the tests run quickly, but of course it's now cumbersome to write/use the API tests.
  • In production the startup slowness from core schema initialization is also very visible: it takes 40-50s to get the service started. Therefore, in our HPA Kubernetes setup, scale-up is quite delayed and spiky load can cause issues.

@samuelcolvin (Member)

Hi @MarkusSintonen, I'm sorry you're still having trouble.

I see a few ways forward (not necessarily mutually exclusive):

  • With the new PyO3 API, which should improve the performance of accessing and creating Python objects from Rust a lot, we can maybe move some of the walk logic into Rust; there's already a PR for this
  • Rework walk and make it faster without changing the high-level logic
  • Reimplement walk from scratch so it doesn't walk core schemas so many times
  • Cache core schemas, maybe even in a file
  • See if there are fixes possible in FastAPI so it doesn't create schemas so many times; I'm not optimistic this will get accepted

@MarkusSintonen (Contributor, Author) commented Mar 3, 2024

@samuelcolvin

Some more info: I verified whether the cProfile in the original issue description actually reflects the time spent in the different parts of core schema generation. It seems cProfile causes a huge measurement error here, due to how highly recursive the core schema "walking" is. Instead, I manually instrumented GenerateSchema.generate_schema, GenerateSchema.clean_schema and _core_utils.walk_core_schema to see how much time is actually spent inside them.

It actually seems it's GenerateSchema.generate_schema that takes most of the time in our case: at initialization, 74% of the time is spent there! Only 12% of the time is spent in _core_utils.walk_core_schema.

Total time to startup 36.80s

GenerateSchema.generate_schema 27.30s
_core_utils.walk_core_schema   4.39s
GenerateSchema.clean_schema    4.78s  (this calls _core_utils.walk_core_schema)

Then I cProfiled just the heaviest GenerateSchema.generate_schema which gave following results for functions internal and cumulative times:
cprofile_time.txt
cprofile_cumulative.txt

The profiler doesn't show any obvious hot spot taking most of the time; generate_schema just does so much work overall that it adds up with a lot of models.

I hope this helps!

(The tests above used defer_build and PYDANTIC_SKIP_VALIDATING_CORE_SCHEMAS, but they don't seem to make much of a difference.)

@MarkusSintonen
Copy link
Contributor Author

MarkusSintonen commented Mar 3, 2024

Another idea: would it be somehow possible to "precompile" the core schemas? That would remove the overhead of regenerating them on every startup, since models don't really change that often (unless someone uses dynamic models, in which case precompiling wouldn't work). It would require exposing some kind of "export" of core schemas, and an "import" of those back into the Pydantic models. Users could then decide how the exported schemas get distributed and kept up to date. Maybe the "import" could validate that the core schema matches the model it is imported into (although validating too much might make things slow again). Looking at the contents of core schemas, they contain custom validator function callbacks and some other callbacks too; those would need to be handled at "import" time so that the actual callbacks get re-attached to the core schema dicts.

@samuelcolvin
Copy link
Member

It actually seems its the GenerateSchema.generate_schema that takes most of the time in our case. At initialization 74% of time is spent there! Only 12% of time is spent in _core_utils.walk_core_schema.

That's really interesting, and saves me a lot of time.

With that we might be able to look into the generate_schema performance and see what we can do.

would it be somehow possible to "precompile" the core schemas?

Yes, we've thought about that - it's roughly what I meant (although I wasn't clear at all) by "cache core schemas, maybe even in a file".

I think this could work - the hard bit might be working out which of the pre-built schemas TypeAdapter or BaseModel should use. Presumably we need them to automatically look up the stored schemas and use them, so no changes to FastAPI are required (we can talk to @tiangolo about supporting this in FastAPI, but it'll be quicker to implement if that's not required).

If the schemas are always pydantic models in the root of a module, I guess the key can be as simple as {module_name}:{model_name} - I guess we start by supporting the simple case and extend from there.
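A minimal sketch of such a file cache, keyed roughly as suggested plus a hash of the model source so stale entries get rebuilt. All names are hypothetical, and real core schemas contain validator callables that don't pickle and would need re-binding on load:

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

def load_or_build_schema(cache_dir: Path, module_name: str, model_name: str,
                         source: str, build):
    """Return a cached core schema keyed by {module_name}:{model_name} plus a
    hash of the model's source. 'build' is the expensive step this thread is
    about (core schema generation); it only runs on a cache miss."""
    digest = hashlib.sha256(source.encode()).hexdigest()[:16]
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{module_name}.{model_name}.{digest}.pkl"
    if path.exists():
        return pickle.loads(path.read_bytes())
    schema = build()
    path.write_bytes(pickle.dumps(schema))
    return schema

builds = []
def build_entity_schema():
    builds.append(1)
    return {"type": "model", "ref": "app.models.Entity"}

with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    s1 = load_or_build_schema(cache, "app.models", "Entity",
                              "class Entity: ...", build_entity_schema)
    s2 = load_or_build_schema(cache, "app.models", "Entity",
                              "class Entity: ...", build_entity_schema)

assert s1 == s2
assert len(builds) == 1  # second call hit the cache
```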

How are your schemas defined?

@MarkusSintonen
Copy link
Contributor Author

If the schemas are always pydantic models in the root of a module, I guess the key can be as simple as {module_name}:{model_name} - I guess we start by supporting the simple case and extend from there.
How are your schemas defined?

Our Pydantic models are mainly defined in two places: our internal Python package containing "entity" data models (lots of these, with varying complexity), and our API models (FastAPI) in the service repo. The API models are generic, where the generic parameter is one of the entity models from the internal package.

So in theory we could "precompile" the schemas in both places, but I would say precompiling the API's parametrized generic models would suffice. Then we would have precompiled core schemas for each variant of the API models (so GetEntities[Entity1], GetEntities[Entity2], etc. would all have their own precompiled core schemas). That would remove the huge overhead of regenerating them on the fly every time the service starts, and in tests.

There might again be some difficulties with FastAPI, as it does its own dynamic-model magic, so it would need to be checked how this actually fits with FastAPI. If you have a prototype at some point, I can give it a go!

@MarkusSintonen
Copy link
Contributor Author

You can see here how our service startup time keeps going up over time (as additional data models and API models are added). It is now already over a minute, of which core schema building (in TypeAdapters) takes about a minute:
Screenshot 2024-03-06 at 08:42:50

With Pydantic V1 the startup took about 10 seconds so the issue is getting worse with V2.

@michaelgmiller1
Copy link

Is there a path forward to increase performance here? We still cannot upgrade to Pydantic v2 because the startup performance is an order of magnitude slower than v1.

Labels
bug V2 Bug related to Pydantic V2

8 participants