Very slow (FastAPI) application startup after using V2 due to _core_utils.py:walk
#6768
When I try to patch FastAPI so it memoizes the `TypeAdapter` instances it creates, startup time improves considerably. Here is the patch I did for memoization:

```python
# Ugly hacks follow to make FieldInfo hashable and monkey-patch FastAPI
import typing
from typing import Annotated

from pydantic import TypeAdapter
from pydantic.fields import FieldInfo


def hashable(obj):
    # Recursively convert dicts/iterables into hashable tuples.
    if isinstance(obj, typing.Hashable):
        return obj
    elif isinstance(obj, dict):
        return tuple(sorted(((k, hashable(v)) for k, v in obj.items()), key=lambda t: t[0]))
    elif isinstance(obj, typing.Iterable):
        return tuple(hashable(item) for item in obj)
    else:
        raise TypeError(type(obj))


class FieldInfoKey:
    def __init__(self, field_info: FieldInfo):
        self.field_info = field_info
        self.field_info_hashable = hashable(field_info._attributes_set)

    def __hash__(self):
        return hash(self.field_info_hashable)

    def __eq__(self, other):
        if not isinstance(other, FieldInfoKey):
            return False
        return self.field_info._attributes_set == other.field_info._attributes_set


@weak_cache
def type_adapter_cache(f: FieldInfoKey) -> TypeAdapter:
    return TypeAdapter(Annotated[f.field_info.annotation, f.field_info])


def ugly_init_patch(self):
    self._field_info_key = FieldInfoKey(self.field_info)
    self._type_adapter = type_adapter_cache(self._field_info_key)


from fastapi._compat import ModelField

ModelField.__post_init__ = ugly_init_patch
```

Maybe FastAPI's `ModelField` could do this kind of caching itself?
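The `weak_cache` decorator used above isn't shown in the snippet. A minimal sketch of what such a helper could look like (my assumption, not the author's actual code): memoize on a single hashable, weak-referenceable key object so cache entries vanish when the key is garbage collected.

```python
import weakref
from typing import Callable, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def weak_cache(fn: Callable[[K], V]) -> Callable[[K], V]:
    """Memoize fn on its single (hashable, weak-referenceable) argument.

    Entries disappear automatically once the key object is garbage collected,
    so the cache can't keep key objects alive forever.
    """
    cache: "weakref.WeakKeyDictionary[K, V]" = weakref.WeakKeyDictionary()

    def wrapper(key: K) -> V:
        try:
            return cache[key]
        except KeyError:
            result = fn(key)
            cache[key] = result
            return result

    return wrapper


# Usage: the second call with an equal, live key is served from the cache.
calls = 0


class Key:
    def __init__(self, value: int) -> None:
        self.value = value

    def __hash__(self) -> int:
        return hash(self.value)

    def __eq__(self, other) -> bool:
        return isinstance(other, Key) and self.value == other.value


@weak_cache
def build(key: Key) -> str:
    global calls
    calls += 1
    return f"adapter-{key.value}"


k = Key(1)
assert build(k) == "adapter-1"
assert build(k) == "adapter-1"  # cached: build ran only once
assert calls == 1
```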
Could be related to #6748.

Thanks so much for reporting, we're looking into this.

Can you give us a sample of your models? For example, the model from your slowest endpoint, with the field names and model names mangled or something like that.

I do think FastAPI or Pydantic should cache the `TypeAdapter`s.
It seems there isn't any specific model that would be slow. There are just so many unique request/response models from generic parametrization over the different data models we have, and each model takes a bunch of time to build. Here is the instrumentation I used:

```python
import json
from time import time

from fastapi._compat import ModelField
from pydantic._internal._core_utils import _WalkCoreSchema


def patch_walk():
    orig = _WalkCoreSchema.walk
    durations = {}

    def patch(self, schema, *args, **kwargs):
        key = schema["type"] + "_" + str(schema.get("cls") or schema.get("model_name") or "<unknown>")
        start = time()
        res = orig(self, schema, *args, **kwargs)
        durations[key] = durations.get(key, 0) + (time() - start)
        return res

    _WalkCoreSchema.walk = patch
    return durations


def init_app_stuff():
    ModelField.__post_init__ = ugly_init_patch  # Cache slow TypeAdapters in FastAPI
    durations = patch_walk()
    # do FastAPI app init...
    print(f"ENTRIES COUNT: {len(durations)}")
    print(f"TOTAL DURATION: {sum(durations.values())}")
    top_kvs = sorted(durations.items(), key=lambda t: t[1], reverse=True)[:100]
    print("TOP:")
    print(json.dumps(dict(top_kvs), indent=2))
```

Yielding:
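The wrap-and-accumulate pattern used in `patch_walk` applies to any callable. A self-contained sketch (the `Walker` class is a toy stand-in, not Pydantic's `_WalkCoreSchema`); note that for highly recursive functions this double-counts nested calls, which is part of the measurement-error discussion later in this thread:

```python
import time
from collections import defaultdict


def instrument(obj, name, durations):
    """Wrap obj.name so each call's wall time is accumulated into durations[name]."""
    orig = getattr(obj, name)

    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return orig(*args, **kwargs)
        finally:
            durations[name] += time.perf_counter() - start

    setattr(obj, name, wrapper)


# Usage on a toy class standing in for the schema walker:
class Walker:
    def walk(self, n: int) -> int:
        return sum(range(n))


durations = defaultdict(float)
instrument(Walker, "walk", durations)

w = Walker()
assert w.walk(10) == 45
assert w.walk(10) == 45
assert durations["walk"] > 0.0  # total time accumulated across both calls
```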
While I doubt it will be "easy" to improve this, I do think there's a lot of potential for improvement, because performance of model/TypeAdapter instantiation hasn't really been a concern up to this point (though clearly we want to improve your situation).

@MarkusSintonen I've pushed a branch to pydantic you could try. Note that that branch may not actually improve total startup time for accessing all endpoints; I just think it may delay per-model/endpoint initialization until the first request that hits that endpoint, which I'm thinking might result in a significant improvement to startup time (i.e., the time it takes to get a single response from an arbitrary endpoint). I'd be interested to understand if that is not a sufficient improvement for you (i.e., you need to keep total initialization time low even if you can get a response from some endpoint very fast), but either way it will be useful to understand the impact.

Adrian and I also have some ideas about how we can do better caching in the schema generation process that should reduce the amount of work being done by the TypeAdapter; that's not quite ready, but improving this is a priority for us.
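The deferred-initialization idea described above can be sketched with a stdlib-only lazy wrapper (the names here are hypothetical, not Pydantic API): the expensive build happens on first access instead of at startup.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class Lazy(Generic[T]):
    """Defer an expensive build (e.g. a validator/TypeAdapter) until first use."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._value = None
        self._built = False

    def get(self) -> T:
        if not self._built:  # build exactly once, on first access
            self._value = self._factory()
            self._built = True
        return self._value


# Usage sketch: the counter shows the factory runs only on first access.
calls = 0


def expensive_build() -> str:
    global calls
    calls += 1
    return "validator"


adapter = Lazy(expensive_build)
assert calls == 0                    # nothing built at "startup"
assert adapter.get() == "validator"  # first use triggers the build
assert adapter.get() == "validator"  # reused thereafter
assert calls == 1
```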
I tried that and it indeed helps a lot! Startup time dropped from ~20s to ~6s, very promising! (Didn't need to patch FastAPI anymore to cache TypeAdapters.) Here's the new cProfile: cprofile.txt
@MarkusSintonen okay great, good to know. Like I said, I don't quite think we can take that route, but good to hear it's a step in the right direction.

One thing @adriangb and I discovered: FastAPI is actually creating another type adapter each time you include a route in another route. As a result, there's a lot of potentially-unnecessary overhead if you are using nested API routers; even if you just add endpoints to routers and then add the routers to the app, that's probably a ~2x overhead that could be removed if we rework FastAPI to reuse the type adapter when possible. I'll look into this, but are you using much nesting of routers? I can see in your cProfile that you created ~1500 instances of APIRouter (which resulted in ~10k type adapters being created), but it seems to me like you probably don't actually have that many distinct endpoints, in which case there might be some low-hanging fruit there too.
@dmontagu yes, I also observed that, as you can see in the message above. So when I make FastAPI cache the FieldInfo-to-TypeAdapter mapping, it reduces the startup time by about 50%, to roughly 10s (still 2x compared to Pydantic V1).
It would be easier to open an improvement PR to FastAPI to reuse TypeAdapters if Pydantic would make `FieldInfo` hashable.

The issue with that is that FieldInfo can contain stuff like arbitrary JSON, which, as you demonstrate above, can be made hashable, but I'm not sure that sort of trick will extend to all of the data in FieldInfo.
To summarize so far:
Yeah, I mostly agree. Currently it's just a large jump backwards when developing (large applications) with Pydantic V2/FastAPI, as getting the tests running takes (without any fixes/hacks) 5x longer. This is only locally, so I'm not sure at all how long it would take in production to start up after deploys; it could now be close to a minute. (It's also unknown how much memory it would actually consume there, haven't gotten that far yet.)

It's quite clear that FastAPI is also doing things very suboptimally now with TypeAdapters, as the simple patch reduces the time quite a lot (but it's still quite a bit slower than V1). I'm not sure how feasible the weak-dict-like patch is, as FieldInfos are not easily hashable (not sure either whether something in FastAPI could mutate FieldInfos on the fly...).

Hopefully there will also be fixes to the logic of how TypeAdapter creates the CoreSchemas. Maybe there is some unneeded repeated deep traversal of the dicts. Probably some of the work could be performed while constructing the CoreSchema instead of repeatedly retraversing the whole object tree. Or moving some traversal to the Rust side :D
@MarkusSintonen any chance you'd be willing to jump on a call with some of the Pydantic team to tell us a bit about your application and how you're using Pydantic? If so, drop me an email at samuel@pydantic.dev.

We've reached out to @tiangolo to see what we can do on the FastAPI side. In the meantime, I've been trying to optimize what we do on the Pydantic side. @MarkusSintonen, could you try this branch and see if it makes any difference? I'm not sure that it will, but there's a chance.
Sure, I'll try later today! All in all, I think if we get FastAPI to reuse TypeAdapters, it's already a great improvement; a 10-second startup in tests is livable. The overall improvement from Pydantic 2 is so great! Thank you all!

Sure, I'll drop you a message later today.
Would 0s (give or take, if we did lazy startup) in tests and 10s in production be even better?

Even better :D (probably some level of breaking change for FastAPI)

I don't think it needs to be!
@adriangb Hmm, interestingly it's slower. So Pydantic 2.0.3 takes ~20s and this branch takes ~30s (without FastAPI patches). Here is a (partial) cProfile: cprofile.txt

@adriangb I wrote a test for you that tries to emulate the performance issue:

Awesome, that is super helpful!
@samuelcolvin / @adriangb FYI, we are testing Pydantic V2 on our EKS cluster. It has a very noticeable impact on the overall memory usage, which increases by over 2x. I guess it might be related to the creation of core schemas and FastAPI initializing them all at startup? It doesn't seem like a memory leak, as we are not seeing it creep upwards, so it's probably caused by initialization of all the internal structures.
I did not dig into the profiling in my case, but just to confirm that @MarkusSintonen is not alone: I attempted a migration to Pydantic v2 on the application I am working on, and test durations exploded from 15 minutes to 1h30. The slowest test cases moved from around 15s to 20s, and the fastest tests went from around 1ms to 700ms. I cannot be specific about the application, but it is using FastAPI and Pydantic with many models, including some generic ones (for paginated API responses, for instance). For now this is a blocker to move to Pydantic v2.

Versions before (ok):

Versions after (nok):

EDIT: I found a way to circumvent the issue and accelerate my tests by removing the cold startup time of FastAPI, using an (almost) session-scoped fixture for the FastAPI application. Only the tests which need to modify the application configuration (feature flags, for instance) still have to go through the startup.
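The fixture trick boils down to building the app once per test session and sharing it. A stdlib-only sketch of the same memoization (the `create_app` factory here is a hypothetical stand-in for a real FastAPI app factory, with a counter to make the build-once behavior observable):

```python
from functools import lru_cache

build_count = 0


def create_app() -> dict:
    """Hypothetical stand-in for an expensive FastAPI app factory."""
    global build_count
    build_count += 1
    return {"routes": ["/health", "/items"]}  # placeholder "app"


@lru_cache(maxsize=1)
def get_test_app() -> dict:
    # Every test calls this, but the app is only built on the first call —
    # which is what a session-scoped pytest fixture achieves.
    return create_app()


app_a = get_test_app()
app_b = get_test_app()
assert app_a is app_b    # the same instance is reused across "tests"
assert build_count == 1  # built exactly once
```

Tests that must mutate app configuration would bypass `get_test_app` (or clear the cache) and pay the full startup cost, matching the caveat above about feature-flag tests.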
Yeah, @HenriBlacksmith, I think the issue is that there is a lot more happening during app creation than there used to be; it makes sense to me that using a session-scoped fixture for the FastAPI app would make things MUCH faster for you. That said, we are still trying to find ways to speed things up by reducing the amount of work being done eagerly to prepare core schemas etc.
We've actually fully resolved this now. On the benchmark from this issue, startup time has gone from about 11s on my laptop down to 2.5s, so back to v1 levels of performance. There were no compromises / observable changes made (aside from possibly some changes to the generated schemas).

It can be even faster if you set the `defer_build` config setting.

@MarkusSintonen I'd appreciate it if you can test the next release (or `main`).
Well, I still experience problems with startup time in a container. It takes more than 5 seconds for the app to return a simple request. The app is deployed in Azure Container Apps.

Was it faster than 5s on v1? Do you have anything we can run to test this performance?

I did not test performance on v1. A bit more context: the ACA is by default scaled to 0 instances. When I ping the probe endpoint, it sometimes takes very long for the app to respond. This is also causing scaling issues because of the use case I have. One thing worth mentioning: I have created an app user to run the container; I'm not sure if this degrades the performance.

I'm not sure about your deployment topology, but as per the benchmarks above you'd have to have some very complex setup with hundreds of generic models or something like that for it to take 5s. Or do you have a simple example we can reproduce with?

Hi Adrian, I just tested starting up the app locally and it takes approximately 3 seconds each time, so it's OK. I guess the issue is on the Azure side. Thanks for your attention; I will contact the Azure team for help.
Hi @adriangb / @samuelcolvin! Sorry I haven't replied sooner, I was on parental leave 👶

I tested the 2.6.3 version of Pydantic against our code base with `defer_build`. I checked the profiler again and it still looks pretty much the same: a huge amount of time is spent in core schema building, like 90% of the test startup time. Maybe FastAPI doesn't work at all with `defer_build`?

Some more info about the setup: we are using a session fixture for our FastAPI instance, so the slowness is only at test startup. After that, tests run quickly, but of course it's cumbersome now to write/use the API tests.
Hi @MarkusSintonen, I'm sorry you're still having trouble. I see a few ways forward (not necessarily mutually exclusive):
Some more info: I verified whether the cProfile in the original issue description actually reflects the time spent in the different parts of core schema generation. It seems cProfile in this case causes a huge measurement error, due to how highly recursive the core schema "walking" is. I instead manually instrumented the schema-generation internals, and it actually seems it's the schema building itself that dominates.

Then I cProfiled just the heaviest part. The profiler doesn't show any obvious spot taking most of the time; it's just doing so much work overall that it starts to add up with a lot of models. I hope this helps!
Another idea: would it be somehow possible to "precompile" the core schemas? That would remove all the overhead of regenerating them on every startup, as models don't really change that often (unless someone uses some kind of dynamic model, in which case precompiling wouldn't work). It would require exposing some kind of "exporting" of core schemas and then "importing" those into the Pydantic models. Then users could decide how exactly they get distributed and kept up to date. Maybe there could be some validation on "import" to check that the core schema matches the model it's imported into (but again, validating too much might make it slow again...). Looking at the contents of core schemas, there are custom validator function callbacks and some other callbacks in there. Those would need to be handled at "import" time so the actual callbacks are added to the core schema dicts, etc.
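As an illustrative sketch only (not a Pydantic API; the key scheme and file layout are made up), a file-backed cache along these lines could skip rebuilding a JSON-serializable schema when its key hasn't changed:

```python
import hashlib
import json
import pathlib
import tempfile


def load_or_build(cache_dir: str, key_text: str, build) -> dict:
    """Return a cached schema for key_text, building and storing it on a miss."""
    key = hashlib.sha256(key_text.encode()).hexdigest()
    path = pathlib.Path(cache_dir) / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())  # cache hit: no rebuild
    schema = build()
    path.write_text(json.dumps(schema))      # cache miss: build and store
    return schema


# Usage: the second call loads from disk instead of calling build() again.
builds = 0


def build_schema() -> dict:
    global builds
    builds += 1
    return {"type": "model", "fields": ["id", "name"]}


with tempfile.TemporaryDirectory() as d:
    s1 = load_or_build(d, "models.User v1", build_schema)
    s2 = load_or_build(d, "models.User v1", build_schema)
    assert s1 == s2
    assert builds == 1  # built only once; the second call came from disk
```

Real core schemas are not plain JSON (they contain validator callbacks, as noted above), so an actual implementation would need a resolution step to re-attach callbacks on load.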
That's really interesting, and saves me a lot of time. With that we might be able to look into the heaviest parts directly.

Yes, we've thought about that; it's roughly what I meant (although I wasn't clear at all) by "cache core schemas, maybe even in a file". I think this could work; the hard bit might be working out which of the pre-built schemas to use. If the schemas are always Pydantic models in the root of a module, I guess the key can be as simple as the module path plus the model name.

How are your schemas defined?
Our Pydantic models are mainly defined in two places: our internal Python package holding "entity" data models (lots of these, with varying complexity), and our API models (FastAPI) in the service repo. The API models are generic, where the generic parameter is one of the models coming from our internal package. So in theory we could "precompile" the schemas in both places, but I would say precompiling the API's parametrized generic models would suffice. Then we would have precompiled core schemas for each variant of the API models. There might again be some difficulties with FastAPI, as it does its own dynamic model magic... It would need to be checked how this actually fits with FastAPI. If you at some point have a prototype, I can give it a go!
Is there a path forward to increase performance here? We still cannot upgrade to Pydantic v2 because the startup performance is an order of magnitude slower than v1.
Initial Checks
Description
Our FastAPI application startup performance degraded from about 5s to over 20s after moving to V2. This makes e.g. our test runs very slow (haven't tried how long it takes in prod to get servers up and running).
Issue seems to be related to `walk`ing of core schemas, as the below `cProfile` results show (from our app init). Issue seems to be coming from poor performance of initializing `TypeAdapter`s in FastAPI. The issue might also be related to how FastAPI uses the new `TypeAdapter` interface. But there might be some issue also in `walk`ing of core schemas, what do you think?

Full cProfile print output: cprofile.txt

See latest profiles here: #6768 (comment)
Below are outdated profiles, see link above.
Example Code
Python, Pydantic & OS Version
Selected Assignee: @samuelcolvin