Recursive model with discriminated union has exponential time and space complexity for validation #8709

palle-k · 2024-02-02T14:09:26Z

Initial Checks

I confirm that I'm using Pydantic V2

Description

When validating the recursive model given in the example, Pydantic has exponential time and space complexity, meaning that validation time and memory usage may grow indefinitely and rapidly depending on model depth, even when using a discriminator field.

Given the presence of a discriminator field, this should not be the case.

A model like this can potentially be abused in a denial-of-service attack, if the data passed to the validator can be controlled by the user.

Example Code

from __future__ import annotations
from typing import Literal, Annotated
from pydantic import Field, TypeAdapter, BaseModel

class NestedState(BaseModel):
    state_type: Literal["nested"]
    substate: AnyState

# If this type is left out, the model behaves normally again
class LoopState(BaseModel):
    state_type: Literal["loop"]
    substate: AnyState

class LeafState(BaseModel):
    state_type: Literal["leaf"]

AnyState = Annotated[NestedState | LoopState | LeafState, Field(..., discriminator="state_type")]

def build_nested_state(n):
    if n <= 0:
        return {"state_type": "leaf"}
    else:
        return {"state_type": "loop", "substate": {"state_type": "nested", "substate": build_nested_state(n-1)}}
        
adapter = TypeAdapter(AnyState)

# the next statement takes around 0.8s
adapter.validate_python(build_nested_state(9))

# the next statement takes around 3.5s
adapter.validate_python(build_nested_state(10))

# the next statement takes around 12.5s
adapter.validate_python(build_nested_state(11))

Python, Pydantic & OS Version

pydantic version: 2.6.0
        pydantic-core version: 2.16.1
          pydantic-core build: profile=release pgo=true
                 install path: /<redacted>/venv-3.11/lib/python3.11/site-packages/pydantic
               python version: 3.11.7 (main, Dec  4 2023, 18:10:11) [Clang 15.0.0 (clang-1500.1.0.2.5)]
                     platform: macOS-14.3-arm64-arm-64bit
             related packages: fastapi-0.105.0 typing_extensions-4.8.0 pydantic-settings-2.1.0
                       commit: unknown

The text was updated successfully, but these errors were encountered:

sydney-runkle · 2024-02-06T15:14:52Z

@palle-k,

Thanks for reporting this. @davidhewitt, could you please take a look at this performance issue when you have a chance? Thank you!

davidhewitt · 2024-02-08T12:28:08Z

It looks like there's a potential schema-building bug here.

The core schema looks like this:

{'definitions': [{'cls': <class '__main__.NestedState'>,
                  'config': {'title': 'NestedState'},
                  'custom_init': False,
                  'metadata': {'pydantic.internal.needs_apply_discriminated_union': False,
                               'pydantic_js_annotation_functions': [],
                               'pydantic_js_functions': [functools.partial(<function modify_model_json_schema at 0x7fba5440fce0>, cls=<class '__main__.NestedState'>),
                                                         <bound method BaseModel.__get_pydantic_json_schema__ of <class '__main__.NestedState'>>]},
                  'ref': '__main__.NestedState:94515929304032',
                  'root_model': False,
                  'schema': {'computed_fields': [],
                             'fields': {'state_type': {'metadata': {'pydantic_js_annotation_functions': [<function get_json_schema_update_func.<locals>.json_schema_update_func at 0x7fba54486fc0>],
                                                                    'pydantic_js_functions': []},
                                                       'schema': {'expected': ['nested'],
                                                                  'metadata': {'pydantic.internal.needs_apply_discriminated_union': False},
                                                                  'type': 'literal'},
                                                       'type': 'model-field'},
                                        'substate': {'metadata': {'pydantic_js_annotation_functions': [<function get_json_schema_update_func.<locals>.json_schema_update_func at 0x7fba54487100>],
                                                                  'pydantic_js_functions': []},
                                                     'schema': {'default': Ellipsis,
                                                                'schema': {'choices': [{'metadata': {'pydantic.internal.needs_apply_discriminated_union': False},
                                                                                        'schema_ref': '__main__.NestedState:94515929304032',
                                                                                        'type': 'definition-ref'},
                                                                                       {'metadata': {'pydantic.internal.needs_apply_discriminated_union': False},
                                                                                        'schema_ref': '__main__.LoopState:94515929369184',
                                                                                        'type': 'definition-ref'},
                                                                                       {'metadata': {'pydantic.internal.needs_apply_discriminated_union': False},
                                                                                        'schema_ref': '__main__.LeafState:94515929326848',
                                                                                        'type': 'definition-ref'}],
                                                                           'metadata': {'pydantic.internal.needs_apply_discriminated_union': True,
                                                                                        'pydantic.internal.union_discriminator': 'state_type'},
                                                                           'type': 'union'},
                                                                'type': 'default'},
                                                     'type': 'model-field'}},
                             'model_name': 'NestedState',
                             'type': 'model-fields'},
                  'type': 'model'},
                 {'cls': <class '__main__.LoopState'>,
                  'config': {'title': 'LoopState'},
                  'custom_init': False,
                  'metadata': {'pydantic.internal.needs_apply_discriminated_union': False,
                               'pydantic_js_annotation_functions': [],
                               'pydantic_js_functions': [functools.partial(<function modify_model_json_schema at 0x7fba5440fce0>, cls=<class '__main__.LoopState'>),
                                                         <bound method BaseModel.__get_pydantic_json_schema__ of <class '__main__.LoopState'>>]},
                  'ref': '__main__.LoopState:94515929369184',
                  'root_model': False,
                  'schema': {'computed_fields': [],
                             'fields': {'state_type': {'metadata': {'pydantic_js_annotation_functions': [<function get_json_schema_update_func.<locals>.json_schema_update_func at 0x7fba544871a0>],
                                                                    'pydantic_js_functions': []},
                                                       'schema': {'expected': ['loop'],
                                                                  'metadata': {'pydantic.internal.needs_apply_discriminated_union': False},
                                                                  'type': 'literal'},
                                                       'type': 'model-field'},
                                        'substate': {'metadata': {'pydantic_js_annotation_functions': [<function get_json_schema_update_func.<locals>.json_schema_update_func at 0x7fba54487380>],
                                                                  'pydantic_js_functions': []},
                                                     'schema': {'default': Ellipsis,
                                                                'schema': {'choices': [{'metadata': {'pydantic.internal.needs_apply_discriminated_union': False},
                                                                                        'schema_ref': '__main__.NestedState:94515929304032',
                                                                                        'type': 'definition-ref'},
                                                                                       {'metadata': {'pydantic.internal.needs_apply_discriminated_union': False},
                                                                                        'schema_ref': '__main__.LoopState:94515929369184',
                                                                                        'type': 'definition-ref'},
                                                                                       {'metadata': {'pydantic.internal.needs_apply_discriminated_union': False},
                                                                                        'schema_ref': '__main__.LeafState:94515929326848',
                                                                                        'type': 'definition-ref'}],
                                                                           'metadata': {'pydantic.internal.needs_apply_discriminated_union': True,
                                                                                        'pydantic.internal.union_discriminator': 'state_type'},
                                                                           'type': 'union'},
                                                                'type': 'default'},
                                                     'type': 'model-field'}},
                             'model_name': 'LoopState',
                             'type': 'model-fields'},
                  'type': 'model'},
                 {'cls': <class '__main__.LeafState'>,
                  'config': {'title': 'LeafState'},
                  'custom_init': False,
                  'metadata': {'pydantic.internal.needs_apply_discriminated_union': False,
                               'pydantic_js_annotation_functions': [],
                               'pydantic_js_functions': [functools.partial(<function modify_model_json_schema at 0x7fba5440fce0>, cls=<class '__main__.LeafState'>),
                                                         <bound method BaseModel.__get_pydantic_json_schema__ of <class '__main__.LeafState'>>]},
                  'ref': '__main__.LeafState:94515929326848',
                  'root_model': False,
                  'schema': {'computed_fields': [],
                             'fields': {'state_type': {'metadata': {'pydantic_js_annotation_functions': [<function get_json_schema_update_func.<locals>.json_schema_update_func at 0x7fba54484fe0>],
                                                                    'pydantic_js_functions': []},
                                                       'schema': {'expected': ['leaf'],
                                                                  'metadata': {'pydantic.internal.needs_apply_discriminated_union': False},
                                                                  'type': 'literal'},
                                                       'type': 'model-field'}},
                             'model_name': 'LeafState',
                             'type': 'model-fields'},
                  'type': 'model'}],
 'schema': {'choices': {'leaf': {'schema_ref': '__main__.LeafState:94515929326848',
                                 'type': 'definition-ref'},
                        'loop': {'schema_ref': '__main__.LoopState:94515929369184',
                                 'type': 'definition-ref'},
                        'nested': {'schema_ref': '__main__.NestedState:94515929304032',
                                   'type': 'definition-ref'}},
            'discriminator': 'state_type',
            'from_attributes': True,
            'metadata': {'pydantic.internal.needs_apply_discriminated_union': True,
                         'pydantic.internal.union_discriminator': 'state_type'},
            'strict': False,
            'type': 'tagged-union'},
 'type': 'definitions'}

In particular the interesting bit to me is that we have 3 definitions instead of 4; the AnyState union is inlined in each model. But in the inlined model copies it is 'type': 'union' instead of 'type': 'tagged-union', which would explain where there's linear search rather than tagged dispatch at each level.

Hence the exponential blowup.

cc @adriangb as the schema building expert ;)

adriangb · 2024-02-08T12:59:39Z

The schema simplifies to:

{
    "type": "definitions",
    "schema": {
        "type": "tagged-union",
        "choices": {
            "nested": {
                "type": "definition-ref",
                "schema_ref": "__main__.NestedState:4587657680"
            },
            "loop": {
                "type": "definition-ref",
                "schema_ref": "__main__.LoopState:4587670848"
            },
            "leaf": {
                "type": "definition-ref",
                "schema_ref": "__main__.LeafState:5124597648"
            }
        },
        "discriminator": "state_type"
    },
    "definitions": [
        {
            "type": "model",
            "cls": "__main__.LeafState",
            "schema": {
                "type": "model-fields",
                "fields": {
                    "state_type": {
                        "type": "model-field",
                        "schema": {
                            "type": "literal",
                            "expected": [
                                "leaf"
                            ]
                        }
                    }
                },
                "model_name": "LeafState"
            },
            "ref": "__main__.LeafState:5124597648"
        },
        {
            "type": "model",
            "cls": "__main__.LoopState",
            "schema": {
                "type": "model-fields",
                "fields": {
                    "state_type": {
                        "type": "model-field",
                        "schema": {
                            "type": "literal",
                            "expected": [
                                "loop"
                            ]
                        }
                    },
                    "substate": {
                        "type": "model-field",
                        "schema": {
                            "type": "default",
                            "schema": {
                                "type": "union",
                                "choices": [
                                    {
                                        "type": "definition-ref",
                                        "schema_ref": "__main__.NestedState:4587657680"
                                    },
                                    {
                                        "type": "definition-ref",
                                        "schema_ref": "__main__.LoopState:4587670848"
                                    },
                                    {
                                        "type": "definition-ref",
                                        "schema_ref": "__main__.LeafState:5124597648"
                                    }
                                ]
                            }
                        }
                    }
                }
            },
            "ref": "__main__.LoopState:4587670848"
        },
        {
            "type": "model",
            "cls": "__main__.NestedState",
            "schema": {
                "type": "model-fields",
                "fields": {
                    "state_type": {
                        "type": "model-field",
                        "schema": {
                            "type": "literal",
                            "expected": [
                                "nested"
                            ]
                        }
                    },
                    "substate": {
                        "type": "model-field",
                        "schema": {
                            "type": "default",
                            "schema": {
                                "type": "union",
                                "choices": [
                                    {
                                        "type": "definition-ref",
                                        "schema_ref": "__main__.NestedState:4587657680"
                                    },
                                    {
                                        "type": "definition-ref",
                                        "schema_ref": "__main__.LoopState:4587670848"
                                    },
                                    {
                                        "type": "definition-ref",
                                        "schema_ref": "__main__.LeafState:5124597648"
                                    }
                                ]
                            }
                        }
                    }
                }
            },
            "ref": "__main__.NestedState:4587657680"
        }
    ]
}

The issue is that the tagged union is not carried along so it's loss after the first union and thus performance degrades.
So yes it's a schema generation bug.

@sydney-runkle is that enough to go off of? I'm guessing we're not doing the apply_discrimnators stuff correctly.

sydney-runkle · 2024-02-08T14:37:01Z

Hmm, this looks similar to this issue: #8688, which we fixed here #8702 for the 2.6.1 release. @palle-k, could you please confirm whether or not this issue has been resolved with 2.6.1?

davidhewitt · 2024-02-08T15:23:47Z

Afraid I'm reproducing this on main 😢

palle-k · 2024-02-08T16:15:45Z

@sydney-runkle unfortunately still reproducible with 2.6.1, as David already noticed.

sydney-runkle · 2024-02-08T16:18:08Z

@davidhewitt, ah rats, ok 🐀.

rossmacarthur · 2024-02-13T16:21:00Z

+1 this seems pretty critical 😬, is there any workaround?

rossmacarthur · 2024-02-13T16:25:13Z

This is a regression because I don't see this issue with

pydantic==2.4.2
pydantic_core==2.10.1

Unfortunately I can't downgrade because my dependencies have updated to >2.5.0

sydney-runkle · 2024-02-13T17:19:03Z

@rossmacarthur,

Thanks for the +1. I've added to the list of issues we'll attempt to fix for 2.6.2. Hopefully we can get that release out next week. I'll keep you updated as we progress. I don't currently know of a workaround for this performance issue.

davidhewitt · 2024-02-13T17:52:11Z

Here's a somewhat clumsy workaround:

from __future__ import annotations
from typing import Any, Literal, Annotated
from pydantic import AfterValidator, Field, TypeAdapter, BaseModel
import time

def validate_any_state(value: Any):
    return _ADAPTER.validate_python(value)

AnyState = Annotated[Any, AfterValidator(validate_any_state)]

class NestedState(BaseModel):
    state_type: Literal["nested"]
    substate: AnyState

# If this type is left out, the model behaves normally again
class LoopState(BaseModel):
    state_type: Literal["loop"]
    substate: AnyState

class LeafState(BaseModel):
    state_type: Literal["leaf"]

_ADAPTER = TypeAdapter(Annotated[NestedState | LoopState | LeafState, Field(..., discriminator="state_type")])

def build_nested_state(n):
    if n <= 0:
        return {"state_type": "leaf"}
    else:
        return {"state_type": "loop", "substate": {"state_type": "nested", "substate": build_nested_state(n-1)}}
        
for i in range(13):
    start = time.time()
    _ADAPTER.validate_python(build_nested_state(i))
    print(f"Time for {i}: {time.time() - start}")

It works by cutting the recursive model apart a bit crudely with the AfterValidator. This has downsides; it's opaque to pydantic-core so will affect the way that things like model_config are propagated and usually an AfterValidator is not as efficient as builtin design, if we can support it. In this case while we've got the performance bug the AfterValidator performs significantly better 😂

Disclaimer: I haven't checked if doing this creates other unexpected effects. YMMV.

rossmacarthur · 2024-02-14T06:39:07Z

I git bisected pydantic and pydantic-core and the bug was introduced in these commits

pydantic: bump pydantic-core to 2.14.0: fb8a594
pydantic-core: replace ultra_strict with new union implementation: pydantic/pydantic-core@f409e00

davidhewitt · 2024-02-14T09:38:33Z

Thanks for checking that 👍

Given that those commits changed the way a non-tagged union is evaluated, that's not a huge surprise. I suspect that the truth is the schema building bug illustrated above existed before those commits, but the newer smart union implementation (which tries to be more accurate in which type gets produced) exacerbated the existing bug.

sydney-runkle · 2024-02-15T14:14:33Z

Ah I see. I don't think we necessarily need to include this fix in 2.6.2 then, though the sooner we get a fix the better. I'll add this to the 2.7 milestone.

rossmacarthur · 2024-02-15T14:33:54Z

If anyone else needs a workaround until this fixed I'm currently using a Discriminator with a custom function and annotating each type with Tag. The built schema always has a tagged-union in this case:

def get_state_type(v: Any) -> Hashable:
    if isinstance(v, dict):
        return v.get("state_type")
    return getattr(v, "state_type", None)

AnyState = Annotated[
    Annotated[NestedState, Tag("nested")]
    | Annotated[LoopState, Tag("loop")]
    | Annotated[LeafState, Tag("leaf")],
    Discriminator(get_state_type),
]

palle-k · 2024-02-15T14:54:26Z

@sydney-runkle by the way, as I've mentioned previously, this issue can lead to a denial of service attack (and I did find apps which are vulnerable to this issue). Should we publish a security advisory somewhere?

tarapt · 2024-02-25T11:46:44Z

@palle-k https://github.com/pydantic/pydantic/security/policy

shivenmian · 2024-02-25T12:00:18Z

Facing this issue as well at both the 01 and open-interpreter projects on my M2 Macbook Pro. Pydantic version 2.6.2, pydantic-core version 2.16.3.

Things were working fine for me yesterday, but since today morning, my FastAPI server does not start due to this Pydantic schema issue. This is also coming up in my OpenAI dependency, for which I've opened an issue here.

I did a memray memory profile to investigate further. Here is the report from open-interpreter, and here is the report from the 01 project. In both, the inflated memory seems to be due to this issue, since both FastAPI and OpenAI use Pydantic.

Could you give a timeline for this issue to be fixed? If this is not expected soon, could you give a patch I can apply in the meantime?

cc @KillianLucas

sydney-runkle · 2024-02-27T15:07:50Z

Hi all,

Given the significance of this bug, we're working on a fix now, and will plan to do a 2.6.3 release with said fix, once we have it! We'll keep this thread updated as we progress.

sydney-runkle · 2024-02-27T18:31:43Z

Hey all 😄,

After some extended debugging, we found a fix, and will release 2.6.3 today with the changes. Please ping me if you're experiencing any other issues. Happy coding! 🚀

palle-k added bug V2 Bug related to Pydantic V2 pending Awaiting a response / confirmation labels Feb 2, 2024

palle-k changed the title ~~Recursive model with discriminated union requires exponential time and space complexity for validation~~ Recursive model with discriminated union has exponential time and space complexity for validation Feb 5, 2024

sydney-runkle assigned davidhewitt Feb 6, 2024

sydney-runkle mentioned this issue Feb 13, 2024

Known 2.6 Issues - WIP Fixes for v2.6.2 #8741

Closed

2 tasks

sydney-runkle added this to the v2.7.0 milestone Feb 15, 2024

shivenmian mentioned this issue Feb 25, 2024

memory leak, consumes entire system after just days of usage openai/openai-python#1181

Closed

1 task

sydney-runkle mentioned this issue Feb 27, 2024

Fix discriminated union schema gen bug #8904

Merged

sydney-runkle closed this as completed in #8904 Feb 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Recursive model with discriminated union has exponential time and space complexity for validation #8709

Recursive model with discriminated union has exponential time and space complexity for validation #8709

palle-k commented Feb 2, 2024 •

edited

sydney-runkle commented Feb 6, 2024

davidhewitt commented Feb 8, 2024

adriangb commented Feb 8, 2024

sydney-runkle commented Feb 8, 2024

davidhewitt commented Feb 8, 2024

palle-k commented Feb 8, 2024

sydney-runkle commented Feb 8, 2024

rossmacarthur commented Feb 13, 2024

rossmacarthur commented Feb 13, 2024 •

edited

sydney-runkle commented Feb 13, 2024

davidhewitt commented Feb 13, 2024

rossmacarthur commented Feb 14, 2024

davidhewitt commented Feb 14, 2024

sydney-runkle commented Feb 15, 2024

rossmacarthur commented Feb 15, 2024

palle-k commented Feb 15, 2024

tarapt commented Feb 25, 2024

shivenmian commented Feb 25, 2024 •

edited

sydney-runkle commented Feb 27, 2024

sydney-runkle commented Feb 27, 2024

Recursive model with discriminated union has exponential time and space complexity for validation #8709

Recursive model with discriminated union has exponential time and space complexity for validation #8709

Comments

palle-k commented Feb 2, 2024 • edited

Initial Checks

Description

Example Code

Python, Pydantic & OS Version

sydney-runkle commented Feb 6, 2024

davidhewitt commented Feb 8, 2024

adriangb commented Feb 8, 2024

sydney-runkle commented Feb 8, 2024

davidhewitt commented Feb 8, 2024

palle-k commented Feb 8, 2024

sydney-runkle commented Feb 8, 2024

rossmacarthur commented Feb 13, 2024

rossmacarthur commented Feb 13, 2024 • edited

sydney-runkle commented Feb 13, 2024

davidhewitt commented Feb 13, 2024

rossmacarthur commented Feb 14, 2024

davidhewitt commented Feb 14, 2024

sydney-runkle commented Feb 15, 2024

rossmacarthur commented Feb 15, 2024

palle-k commented Feb 15, 2024

tarapt commented Feb 25, 2024

shivenmian commented Feb 25, 2024 • edited

sydney-runkle commented Feb 27, 2024

sydney-runkle commented Feb 27, 2024

palle-k commented Feb 2, 2024 •

edited

rossmacarthur commented Feb 13, 2024 •

edited

shivenmian commented Feb 25, 2024 •

edited