Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very slow evaluation of recursive types #7149

Closed
1 task done
mciucu opened this issue Aug 16, 2023 · 5 comments
Closed
1 task done

Very slow evaluation of recursive types #7149

mciucu opened this issue Aug 16, 2023 · 5 comments
Assignees
Labels
bug V2 Bug related to Pydantic V2

Comments

@mciucu
Copy link
Contributor

mciucu commented Aug 16, 2023

Initial Checks

  • I confirm that I'm using Pydantic V2

Description

I've tracked it down to the inner function count_refs in _core_utils.py, that seems to be doing at least quadratic work than what would be required. With ~10 classes that could be recursively included inside one another in a tree I'm getting ~1 minute execution on calling model_rebuild() for all.
The change that I think would be safe and efficient would be change lines 464-467 with:

        if ref_counts[ref] >= 2:
            # If this model is involved in a recursion this should be detected
            # on its second encounter, otherwise we can stop recursing.
            if current_recursion_ref_count[ref] != 0:
                involved_in_recursion[ref] = True
            return s

This should be safe, given that if a model X can contain itself the second time when entering the count_refs function should be directly or indirectly from an inclusion where model X is already in the call stack.
Call time for model_rebuild() is <1s now.
I can make a pull request if that's ok.

Example Code

from __future__ import annotations

import time
from typing import Union, Optional, Literal

from pydantic import BaseModel


class BaseC(BaseModel):
    subexpr: CAny


class C0(BaseC):
    type: Literal["C0"]


class C1(BaseC):
    type: Literal["C1"]


class C2(BaseC):
    type: Literal["C2"]


class C3(BaseC):
    type: Literal["C3"]


class C4(BaseC):
    type: Literal["C4"]


class C5(BaseC):
    type: Literal["C5"]


class C6(BaseC):
    type: Literal["C6"]


class C7(BaseC):
    type: Literal["C7"]
    r: Optional[CExpr]


class C8(BaseC):
    type: Literal["C8"]
    r: Optional[CExpr]


class C9(BaseC):
    type: Literal["C9"]
    r: Optional[CExpr]


class C10(BaseC):
    type: Literal["C10"]
    r: Optional[CExpr]


class CExpr(BaseModel):
    type: Literal["or", "and"]
    subexpr: list[CAny]


CAny = Union[C0, C1, C2, C3, C4, C5, C6, C7, C8, C9, C10]

start_time = time.time()
C0.model_rebuild()
print("Duration:", time.time() - start_time)

Python, Pydantic & OS Version

pydantic version: 2.1.1
        pydantic-core version: 2.4.0
          pydantic-core build: profile=release pgo=true mimalloc=true
                 install path: /blink/api/.venv/lib/python3.11/site-packages/pydantic
               python version: 3.11.4 (main, Jun  9 2023, 07:59:55) [GCC 12.3.0]
                     platform: Linux-6.2.0-27-generic-x86_64-with-glibc2.37
     optional deps. installed: ['typing-extensions']

Selected Assignee: @davidhewitt

@mciucu mciucu added bug V2 Bug related to Pydantic V2 unconfirmed Bug not yet confirmed as valid/applicable labels Aug 16, 2023
@dmontagu
Copy link
Contributor

dmontagu commented Aug 16, 2023

Great work investigating the issue!

I can confirm that with the proposed change all tests pass, but @adriangb this is your code and I figure you might be quicker to understand the implications?

@mciucu I think it's a good idea to open a PR for this either way, please feel free, very much appreciated.

@dmontagu dmontagu removed the unconfirmed Bug not yet confirmed as valid/applicable label Aug 16, 2023
@adriangb
Copy link
Member

This generally looks good. I added some pretty in depth tests precisely with the idea that I'd TDD that code at the time and someone would come along and improve it later :). Let's take a careful look at the tests and make sure they cover all of the cases, in particular around those LOC, and add more if needed. If that works I'm very positive for this change.

@mciucu
Copy link
Contributor Author

mciucu commented Aug 17, 2023

I've created a pull request with the change.

I think maybe a test based on the structure I had in that example code would help catch any performance regressions in the future, since that snipped takes about 5 minutes to run for me locally (so it seemed like it was O(N!)). I can add here anything if it would be helpful, was just not sure how benchmark tests are implemented in pydantic.

@michaelgmiller1
Copy link

This is preventing us from migrating to v2, as our models take more than 10 minutes (!!!!) to rebuild. Would be great to identify a fix, and cut a release so we can bump versions and enjoy the performance of Rust! :)

@adriangb
Copy link
Member

The pull request was merged, this will come out in the next release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug V2 Bug related to Pydantic V2
Projects
None yet
Development

No branches or pull requests

5 participants