
Allocations overhead? #843

Closed
mselee opened this issue Jul 30, 2023 · 4 comments · Fixed by #992
mselee commented Jul 30, 2023

First of all, thanks for the fantastic library.

While benchmarking pydantic v2 on some complex models with a lot of fields/nesting (unfortunately I can't share the models) I noticed abnormal performance due to allocations/deallocations:

[flamegraph: before the patch]

@model_validator(mode='wrap') is used in these models but is dwarfed by the Clone and Drop calls. The (de-)allocations represent ~75% of the total run time.

I tried replacing the Box<CombinedValidator> with an Arc<RwLock<CombinedValidator>> for:

  • FunctionWrapValidator.validator
  • FunctionBeforeValidator.validator
  • FunctionAfterValidator.validator

After rebuilding with the patch, the (de-)allocations calls dropped:

[flamegraph: after the patch]

Benchmark (1000 iterations)

| Container | Time | Allocations | Memory (peak) |
| --- | --- | --- | --- |
| Box<CombinedValidator> | ~13 sec | 202000 | 66.4 KiB |
| Arc<RwLock<CombinedValidator>> | ~3 sec | 2000 | 1.6 KiB |

But I'm not sure if this is actually sound (ref):

> I would suggest using Arc where you are using Box, but unfortunately Validator::set_ref takes &mut self. This prevents calling it on Arc. While we could use Arc<RwLock> and acquire locks as needed, it would still likely be incorrect, since currently all cloned validators are entirely independent, while with the Arc approach modifying the validator will also modify everything that references it.

There's no set_ref in the code base anymore, but I guess the same concern still applies to validator.complete()?
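The concern quoted above is about aliasing semantics, and it can be demonstrated in isolation. A minimal sketch (plain integers standing in for validators) showing why `Arc<RwLock<_>>` changes behavior whenever anything still mutates after construction:

```rust
use std::sync::{Arc, RwLock};

fn main() {
    // Box clones are independent: mutating the copy leaves the original untouched.
    let original: Box<i32> = Box::new(1);
    let mut copy = original.clone();
    *copy = 2;
    assert_eq!(*original, 1); // original unaffected

    // Arc<RwLock<_>> clones alias one value: a write through either handle is
    // visible through all of them. This is the soundness concern: any code path
    // that still mutates a validator after construction would silently mutate
    // every clone sharing the allocation.
    let shared = Arc::new(RwLock::new(1));
    let alias = Arc::clone(&shared);
    *alias.write().unwrap() = 2;
    assert_eq!(*shared.read().unwrap(), 2); // changed through the other handle
}
```

So the patch is only safe if validators are truly immutable once built, which is exactly the question about `validator.complete()`.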

Assignee: @samuelcolvin

samuelcolvin (Member) commented:

Thanks so much for creating the issue, really interesting.

We were discussing Box just the other day and assuming its overhead was tiny.

@davidhewitt what do you think?

samuelcolvin (Member) commented:

What was your benchmark of? Validation or validator construction?

mselee (Author) commented Jul 30, 2023:

It is something similar to this:

from typing import Generic, List, TypeVar

from pydantic import BaseModel, ConfigDict

T = TypeVar("T")

class ComplexPydanticModel(BaseModel):
    # I also tried `defer_build=True` or removing some of `validate_xxx`, but it didn't make a noticeable difference
    model_config = ConfigDict(from_attributes=True, validate_default=True, validate_return=True)

    # fields/validators are omitted

class Response(BaseModel, Generic[T]):
    data: T


BenchmarkModel = Response[List[ComplexPydanticModel]]
data = {"data": data}  # `data` is a list of complex ORM objects (not shown)

for _ in range(1000):
    BenchmarkModel.model_validate(data)

data is a list of complex ORM objects, and from my testing the list size doesn't matter. What matters is the number of iterations in the loop: at 100 iterations the runtime is barely affected, but the overhead becomes noticeable as you increase it:

| Iterations | Time |
| --- | --- |
| 1000 | ~13 sec |
| 500 | ~6 sec |
| 100 | ~3 sec |

davidhewitt (Contributor) commented:

Thanks for the report and the proposed patch. Inspired by it, I'm going a bit further and removing many of the cases where we clone validators at all (probably using just Rc<CombinedValidator>, since we no longer need mutability).

I've got a patch which is almost there but I'm aiming to merge #867 first to make the patch simpler.
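If validators really are immutable after construction, `Rc` sidesteps both the deep clones and the lock overhead of `Arc<RwLock<_>>`. A minimal sketch of that idea, again with a hypothetical stand-in type rather than pydantic-core's real `CombinedValidator`:

```rust
use std::rc::Rc;

// Simplified stand-in for pydantic-core's validator type.
struct CombinedValidator {
    name: String,
}

fn main() {
    // Rc is a non-atomic reference count: cloning is a single integer bump,
    // with no deep copy and no lock. The trade-off is losing `&mut` access
    // once the value is shared, which is fine if validators are immutable
    // after they are built.
    let validator = Rc::new(CombinedValidator { name: "model".into() });
    let clones: Vec<Rc<CombinedValidator>> =
        (0..1000).map(|_| Rc::clone(&validator)).collect();

    // 1000 clones, one allocation, one shared validator.
    assert_eq!(Rc::strong_count(&validator), 1001);
    assert!(clones.iter().all(|c| Rc::ptr_eq(c, &validator)));
    assert_eq!(validator.name, "model");
}
```

`Rc` is not `Send`/`Sync`, so this fits a per-interpreter, single-threaded validator tree; a multi-threaded design would need `Arc` instead.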
