Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve schema gen for nested dataclasses #9114

Merged
merged 11 commits into from Mar 27, 2024
Merged

Conversation

sydney-runkle
Copy link
Member

@sydney-runkle sydney-runkle commented Mar 26, 2024

When we have the schema available for a dataclass (or other type with __pydantic_core_schema) don't rebuild! Greatly improves schema generation time + greatly minimizes the number of recursive schema building calls.

import timeit
from typing import Literal, Annotated
from pydantic import Discriminator, TypeAdapter, BaseModel
from pydantic.dataclasses import dataclass


@dataclass(frozen=True, kw_only=True)
class Cat:
    type: Literal["cat"] = "cat"


@dataclass(frozen=True, kw_only=True)
class Dog:
    type: Literal["dog"] = "dog"


@dataclass(frozen=True, kw_only=True)
class NestedDataClass:
    animal: Annotated[Cat | Dog, Discriminator("type")]


class NestedModel(BaseModel):
    animal: Annotated[Cat | Dog, Discriminator("type")]


def construct_schema():
    @dataclass(frozen=True, kw_only=True)
    class Root:
        data_class: NestedDataClass
        model: NestedModel

        animal: Annotated[Cat | Dog, Discriminator("type")]

# not dividing by 1000 bc we want to view time in ms
print(f"Average schema build time (ish): {timeit.timeit(construct_schema, number=1000)} ms")
# Main: Average schema build time (ish): 1.632956000015838 ms
# This branch: Average schema build time (ish): 1.1692142079991754 ms

Selected Reviewer: @hramezani

@sydney-runkle sydney-runkle added relnotes-performance Used for performance improvements. relnotes-fix Used for bugfixes. labels Mar 26, 2024
Copy link

codspeed-hq bot commented Mar 26, 2024

CodSpeed Performance Report

Merging #9114 will improve performances by 53.81%

Comparing make_schema_gen_better (ee3ebfb) with main (548feec)

Summary

⚡ 1 improvements
✅ 12 untouched benchmarks

Benchmarks breakdown

Benchmark main make_schema_gen_better Change
test_construct_schema 36.8 ms 23.9 ms +53.81%

@sydney-runkle sydney-runkle removed the relnotes-fix Used for bugfixes. label Mar 26, 2024
@sydney-runkle
Copy link
Member Author

Please review

@adriangb
Copy link
Member

This is very nice!

Do we have any benchmarks? That'd be a nice to have so we can track this. Assuming you wrote some to come up with this change, can we commit them along with other benchmarks (if they don't already exist)?

There's several scenarios where I imagine we do need to rebuild:

# unsubstituted typevars
class Inner(BaseModel, Generic[T]):
    x: T

class Outer(BaseModel):
    inner: Inner[int]  # need to rebuild innner

# cyclical forward references
class InnerFwd(BaseModel):
    outer: OuterFwd

class OuterFwd(BaseModel):
    inner: InnerFwd

And probably others I'm not thinking of. Do we have tests showing that these use cases work correctly?

Copy link

cloudflare-pages bot commented Mar 27, 2024

Deploying pydantic-docs with  Cloudflare Pages  Cloudflare Pages

Latest commit: ee3ebfb
Status: ✅  Deploy successful!
Preview URL: https://bdfaf89c.pydantic-docs2.pages.dev
Branch Preview URL: https://make-schema-gen-better.pydantic-docs2.pages.dev

View logs

@sydney-runkle
Copy link
Member Author

@adriangb,

There's several scenarios where I imagine we do need to rebuild

Good point - I've added tests for both of these cases both to test_main.py and test_dataclasses.py. Interestingly, the generic handling is different for dataclasses vs models (re concrete subclasses), but I think that's something we can address separately (and loop in @dmontagu).

Do we have any benchmarks? That'd be a nice to have so we can track this.

I've added a benchmark for the basic case with a nested model and dataclass. I think this should work, though I might have to add this in a separate PR so that we get a comparison in main before we merge this...

@sydney-runkle
Copy link
Member Author

Ah, added this to a separate PR: #9121

@adriangb adriangb enabled auto-merge (squash) March 27, 2024 17:28
@adriangb adriangb merged commit b30f44d into main Mar 27, 2024
54 checks passed
@adriangb adriangb deleted the make_schema_gen_better branch March 27, 2024 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready for review relnotes-performance Used for performance improvements.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants