#8379: allow for alias enforcement during serialization by setting `serialize_by_alias` on `ConfigDict` #8997

alexanderankin · 2024-03-13T03:21:44Z

Change Summary

this allows for usage of field aliases as aliases - there is a discussion in #8379

Related issue number

#8379 (it may or may not be considered fully addressed after this PR)

Checklist

The pull request title is a good summary of the changes - it will be used in the changelog
Unit tests for the changes exist
Tests pass on CI
Documentation reflects the changes where applicable - admittedly its minimal but at this stage ill propose that its sufficient
My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @Kludex

codspeed-hq · 2024-03-13T03:24:58Z

CodSpeed Performance Report

Merging #8997 will not alter performance

_{Comparing alexanderankin:8379_improve_default_serialization_alias (e9c6939) with main (18d39fe)}

Summary

✅ 10 untouched benchmarks

alexanderankin · 2024-03-13T03:34:19Z

please review

sydney-runkle · 2024-03-17T15:07:32Z

@alexanderankin,

Thanks for this PR. I do think that we should offer a config setting for this behavior, but I have a few follow up questions / thoughts. Specifically, they relate to the fact that the populate_by_name config setting doesn't apply to nested models, but the runtime by_alias flag used for serialization does. See the following example for some context:

from pydantic import BaseModel, ConfigDict, Field, ValidationError

class Model(BaseModel):
    a: int = Field(..., alias='A')

class OuterModel(BaseModel):
    x: Model = Field(..., alias='X')
    y: int = Field(..., alias='Y')

# we validate using aliases by default (that is, the populate_by_name config setting is False, by default)
# looking forward, we could consider this to be equivalent to validate_by_alias = True, by default
m = OuterModel(X={'A': 123}, Y=456)
print(repr(m))
#> OuterModel(x=Model(a=123), y=456)

# when we serialize, the by_alias flag is False by default, so the field names are used
print(m.model_dump())
#> {'x': {'a': 123}, 'y': 456}

# if we want to use the aliases when serializing, we can use the by_alias flag, which notably applies to nested models
# as seen by the fact that all of the keys are capitalized in the result
print(m.model_dump(by_alias=True))
#> {'X': {'A': 123}, 'Y': 456}

class OuterModel2(BaseModel):
    x: Model = Field(..., alias='X')
    y: int = Field(..., alias='Y')

    model_config = ConfigDict(populate_by_name=True)

# here, we validate using the field names, but we still use the aliases for the nested model, 
# because the populate_by_name config setting doesn't apply to nested models
m = OuterModel2(x={'A': 123}, y=456)
print(repr(m))
#> OuterModel2(x=Model(a=123), y=456)

# if we don't, use field aliases for inner models we get a validation error, 
# as the populate_by_name config setting doesn't apply to nested models
try:
    m = OuterModel2(x={'a': 123}, y=456)
except ValidationError as e:
    print(e)
    """
    1 validation error for OuterModel2
    x.A
    Field required [type=missing, input_value={'a': 123}, input_type=dict]
        For further information visit https://errors.pydantic.dev/2.7/v/missing
    """

# with the new flag serialize_by_alias, we effectively patch the by_alias flag to be equal to the serialize_by_alias flag
# but this leads to inconsistent behavior in regards to nested models, compared to the populate_by_name (validation equivalent) flag

So, given this context, here are some thoughts:

Looking forward to V3, I think we want to shoot for a) consistently named config settings in regards to validation and serialization by alias. Specifically, I'd lean towards using validate_by_alias = True and serialize_by_alias = True. This would represent a change in the default behavior for serialization.

As a side note, we've discussed the possibilities of enabling changing default settings on a project by project basis. It's my current understanding that this comes with too many issues to be worth implementing, but a feasible workaround is to do something like what is mentioned here: #8830 (comment)

Unfortunately, the behavior associated with config settings vs runtime flags on serialization methods is different when it comes to nested models. I think users would find it quite confusing if the following code samples resulted in different behavior. Specifically, note the differences in the serialization of the nested model's a field.

from pydantic import BaseModel, ConfigDict, Field, ValidationError

class Model(BaseModel):
    a: int = Field(..., alias='A')

class OuterModel(BaseModel):
    x: Model = Field(..., alias='X')
    y: int = Field(..., alias='Y')

    model_config = ConfigDict(serialize_by_alias=True)

m = OuterModel(X={'A': 123}, Y=456)
print(m.model_dump())
#> {'X': {'a': 123}, 'Y': 456}

class OuterModel2(BaseModel):
    x: Model = Field(..., alias='X')
    y: int = Field(..., alias='Y')

m = OuterModel2(X={'A': 123}, Y=456)
print(m.model_dump(by_alias=True))
#> {'X': {'A': 123}, 'Y': 456}

My previous point begs the question - should we even keep the runtime flag in V3? I can see an argument for yes, as many users are currently relying on this serialization by alias logic. Then should we add a by_alias runtime flag for model_validate that applies to nested models to keep things more consistent? It seems like a headache to offer both config and runtime settings that behave differently, but in nuanced and confusing ways.

All of that being said, I do think that we need to move towards consistency for these alias focused settings. But I think we need to think more carefully about how we implement this config flag to avoid getting stuck maintaining another inconsistent API that users have issues with.

alexanderankin · 2024-03-17T15:19:01Z

I've added a commit documenting the nested behavior - which I agree could be better - but at least it is consistent with the recursive by_alias (thanks to the implementation) - i actually think a per model granularity would be better but - that requires getting into the rust codebase, and if ive read your comment correctly, the issue identified was inconsistency with by_alias? (still re-reading a couple of times on my end).

alexanderankin · 2024-03-17T18:05:34Z

for now my todo list is to either identify or create another per-model serialization setting

sydney-runkle · 2024-03-17T18:44:39Z

@alexanderankin,

I'll discuss with the team on Monday re the appropriate route forward. I think you've got a great start and are thinking about the right things!

sydney-runkle · 2024-03-17T18:50:21Z

A more consolidated summary:

Flags for validation / serialization by alias are inconsistent in terms of behavior with nested models

populate_by_name config setting doesn't trickle down to nested models
by_alias serialization parameter does trickle down to nested models

Default setting for validation vs serialization by alias is not consistent (they should both use aliases by default in V3)

populate_by_name is False by default (better phrased as validate_by_alias is True)
by_alias is False by default (better phrased as serialize_by_alias is False)

I would at least expect config settings to behave the same way in terms of application to nested models, and I would expect the same for parameters to validation / serialization method. However, the fact that these are different at the moment is a bit confusing, and may get more confusing if we choose to add a serialization config setting and/or a validation by_alias parameter.

alexanderankin · 2024-03-17T22:38:07Z

In the spirit of preservation of behavior, I would implement my change to be consistent with populate by name - fine grained is a feature

…

On Sun, Mar 17, 2024, 2:50 PM Sydney Runkle ***@***.***> wrote: A more consolidated summary: 1. Flags for validation / serialization by alias are inconsistent in terms of behavior with nested models - populate_by_name config setting doesn't trickle down to nested models - by_alias serialization parameter does trickle down to nested models 1. Default setting for validation vs serialization by alias is not consistent (they should both use aliases by default in V3) - populate_by_name is False by default (better phrased as validate_by_alias is True) - by_alias is False by default (better phrased as serialize_by_aliasis False`) I would at least expect config settings to behave the same way in terms of application to nested models, and I would expect the same for parameters to validation / serialization method. However, the fact that these are different at the moment is a bit confusing, and may get more confusing if we choose to add a serialization config setting and/or a validation by_alias parameter. — Reply to this email directly, view it on GitHub <#8997 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACECGJBZHBTD7AZLQMRWUT3YYXQYHAVCNFSM6AAAAABETLVZMWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMBSGU3TMNJZHA> . You are receiving this because you were mentioned.Message ID: ***@***.***>

samuelcolvin · 2024-03-18T17:20:35Z

I haven't looked at the implementation, but I have read the discussion.

We need either:

a config setting serialize_by_alias which only applies to that model - consistent with other bits of config, but perhaps not that useful
a new way to define behaviour where we can put stuff which either:
- doesn't apply the same way as config
- or you want to apply globally

We've use both "config" and "settings" elsewhere, so I would propose pydantic.global_defaults which is a defined something like:

class GlobalDefaults(TypedDict):
    serialize_by_default: bool
    # what else do we want here?

global_defaults = GlobalDefaults(serialize_by_default=True)

sydney-runkle · 2024-03-25T21:20:35Z

Closing this for now - the global defaults work should be in a different PR, and the current implementation of the serialize_by_alias flag doesn't apply only to the model in question.

sydney-runkle · 2024-03-25T21:21:13Z

I still think #8379 specifies an important change to be made, just not in the way suggested in this PR.

alexanderankin added 2 commits March 12, 2024 23:07

pydantic#8379: allow for alias enforcement during serialization

68ccaf3

found another code path while writing the test

a178c76

fix naming and couple more places

5a45b2f

pydantic-hooky bot added the ready for review label Mar 13, 2024

pydantic-hooky bot assigned Kludex Mar 13, 2024

alexanderankin changed the title ~~#8379: allow for alias enforcement during serialization by setting serialize_by_name on ConfigDict~~ #8379: allow for alias enforcement during serialization by setting serialize_by_alias on ConfigDict Mar 13, 2024

Kludex assigned sydney-runkle and unassigned Kludex Mar 13, 2024

sydney-runkle added the relnotes-feature label Mar 17, 2024

document the nested behavior

e9c6939

samuelcolvin changed the title ~~#8379: allow for alias enforcement during serialization by setting serialize_by_alias on ConfigDict~~ #8379: allow for alias enforcement during serialization by setting serialize_by_alias on ConfigDict Mar 18, 2024

sydney-runkle closed this Mar 25, 2024

sydney-runkle mentioned this pull request Mar 26, 2024

(WIP) high priority / high impact work 🚀 #9102

Open

21 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

#8379: allow for alias enforcement during serialization by setting `serialize_by_alias` on `ConfigDict` #8997

#8379: allow for alias enforcement during serialization by setting `serialize_by_alias` on `ConfigDict` #8997

alexanderankin commented Mar 13, 2024 •

edited

codspeed-hq bot commented Mar 13, 2024 •

edited

alexanderankin commented Mar 13, 2024

sydney-runkle commented Mar 17, 2024

alexanderankin commented Mar 17, 2024

alexanderankin commented Mar 17, 2024

sydney-runkle commented Mar 17, 2024

sydney-runkle commented Mar 17, 2024 •

edited by samuelcolvin

alexanderankin commented Mar 17, 2024 via email

samuelcolvin commented Mar 18, 2024

sydney-runkle commented Mar 25, 2024

sydney-runkle commented Mar 25, 2024

#8379: allow for alias enforcement during serialization by setting serialize_by_alias on ConfigDict #8997

#8379: allow for alias enforcement during serialization by setting serialize_by_alias on ConfigDict #8997

Conversation

alexanderankin commented Mar 13, 2024 • edited

Change Summary

Related issue number

Checklist

codspeed-hq bot commented Mar 13, 2024 • edited

CodSpeed Performance Report

Merging #8997 will not alter performance

Summary

alexanderankin commented Mar 13, 2024

sydney-runkle commented Mar 17, 2024

alexanderankin commented Mar 17, 2024

alexanderankin commented Mar 17, 2024

sydney-runkle commented Mar 17, 2024

sydney-runkle commented Mar 17, 2024 • edited by samuelcolvin

alexanderankin commented Mar 17, 2024 via email

samuelcolvin commented Mar 18, 2024

sydney-runkle commented Mar 25, 2024

sydney-runkle commented Mar 25, 2024

#8379: allow for alias enforcement during serialization by setting `serialize_by_alias` on `ConfigDict` #8997

#8379: allow for alias enforcement during serialization by setting `serialize_by_alias` on `ConfigDict` #8997

alexanderankin commented Mar 13, 2024 •

edited

codspeed-hq bot commented Mar 13, 2024 •

edited

sydney-runkle commented Mar 17, 2024 •

edited by samuelcolvin