Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#8379: allow for alias enforcement during serialization by setting serialize_by_alias on ConfigDict #8997

Conversation

alexanderankin
Copy link

@alexanderankin alexanderankin commented Mar 13, 2024

Change Summary

this allows for usage of field aliases as aliases - there is a discussion in #8379

Related issue number

#8379 (it may or may not be considered fully addressed after this PR)

Checklist

  • The pull request title is a good summary of the changes - it will be used in the changelog
  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable - admittedly its minimal but at this stage ill propose that its sufficient
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @Kludex

Copy link

codspeed-hq bot commented Mar 13, 2024

CodSpeed Performance Report

Merging #8997 will not alter performance

Comparing alexanderankin:8379_improve_default_serialization_alias (e9c6939) with main (18d39fe)

Summary

✅ 10 untouched benchmarks

@alexanderankin
Copy link
Author

please review

@alexanderankin alexanderankin changed the title #8379: allow for alias enforcement during serialization by setting serialize_by_name on ConfigDict #8379: allow for alias enforcement during serialization by setting serialize_by_alias on ConfigDict Mar 13, 2024
@Kludex Kludex assigned sydney-runkle and unassigned Kludex Mar 13, 2024
@sydney-runkle
Copy link
Member

@alexanderankin,

Thanks for this PR. I do think that we should offer a config setting for this behavior, but I have a few follow up questions / thoughts. Specifically, they relate to the fact that the populate_by_name config setting doesn't apply to nested models, but the runtime by_alias flag used for serialization does. See the following example for some context:

from pydantic import BaseModel, ConfigDict, Field, ValidationError

class Model(BaseModel):
    a: int = Field(..., alias='A')

class OuterModel(BaseModel):
    x: Model = Field(..., alias='X')
    y: int = Field(..., alias='Y')

# we validate using aliases by default (that is, the populate_by_name config setting is False, by default)
# looking forward, we could consider this to be equivalent to validate_by_alias = True, by default
m = OuterModel(X={'A': 123}, Y=456)
print(repr(m))
#> OuterModel(x=Model(a=123), y=456)

# when we serialize, the by_alias flag is False by default, so the field names are used
print(m.model_dump())
#> {'x': {'a': 123}, 'y': 456}

# if we want to use the aliases when serializing, we can use the by_alias flag, which notably applies to nested models
# as seen by the fact that all of the keys are capitalized in the result
print(m.model_dump(by_alias=True))
#> {'X': {'A': 123}, 'Y': 456}

class OuterModel2(BaseModel):
    x: Model = Field(..., alias='X')
    y: int = Field(..., alias='Y')

    model_config = ConfigDict(populate_by_name=True)

# here, we validate using the field names, but we still use the aliases for the nested model, 
# because the populate_by_name config setting doesn't apply to nested models
m = OuterModel2(x={'A': 123}, y=456)
print(repr(m))
#> OuterModel2(x=Model(a=123), y=456)

# if we don't, use field aliases for inner models we get a validation error, 
# as the populate_by_name config setting doesn't apply to nested models
try:
    m = OuterModel2(x={'a': 123}, y=456)
except ValidationError as e:
    print(e)
    """
    1 validation error for OuterModel2
    x.A
    Field required [type=missing, input_value={'a': 123}, input_type=dict]
        For further information visit https://errors.pydantic.dev/2.7/v/missing
    """

# with the new flag serialize_by_alias, we effectively patch the by_alias flag to be equal to the serialize_by_alias flag
# but this leads to inconsistent behavior in regards to nested models, compared to the populate_by_name (validation equivalent) flag

So, given this context, here are some thoughts:

  1. Looking forward to V3, I think we want to shoot for a) consistently named config settings in regards to validation and serialization by alias. Specifically, I'd lean towards using validate_by_alias = True and serialize_by_alias = True. This would represent a change in the default behavior for serialization.

As a side note, we've discussed the possibilities of enabling changing default settings on a project by project basis. It's my current understanding that this comes with too many issues to be worth implementing, but a feasible workaround is to do something like what is mentioned here: #8830 (comment)

  1. Unfortunately, the behavior associated with config settings vs runtime flags on serialization methods is different when it comes to nested models. I think users would find it quite confusing if the following code samples resulted in different behavior. Specifically, note the differences in the serialization of the nested model's a field.
from pydantic import BaseModel, ConfigDict, Field, ValidationError

class Model(BaseModel):
    a: int = Field(..., alias='A')

class OuterModel(BaseModel):
    x: Model = Field(..., alias='X')
    y: int = Field(..., alias='Y')

    model_config = ConfigDict(serialize_by_alias=True)

m = OuterModel(X={'A': 123}, Y=456)
print(m.model_dump())
#> {'X': {'a': 123}, 'Y': 456}

class OuterModel2(BaseModel):
    x: Model = Field(..., alias='X')
    y: int = Field(..., alias='Y')

m = OuterModel2(X={'A': 123}, Y=456)
print(m.model_dump(by_alias=True))
#> {'X': {'A': 123}, 'Y': 456}
  1. My previous point begs the question - should we even keep the runtime flag in V3? I can see an argument for yes, as many users are currently relying on this serialization by alias logic. Then should we add a by_alias runtime flag for model_validate that applies to nested models to keep things more consistent? It seems like a headache to offer both config and runtime settings that behave differently, but in nuanced and confusing ways.

All of that being said, I do think that we need to move towards consistency for these alias focused settings. But I think we need to think more carefully about how we implement this config flag to avoid getting stuck maintaining another inconsistent API that users have issues with.

@alexanderankin
Copy link
Author

I've added a commit documenting the nested behavior - which I agree could be better - but at least it is consistent with the recursive by_alias (thanks to the implementation) - i actually think a per model granularity would be better but - that requires getting into the rust codebase, and if ive read your comment correctly, the issue identified was inconsistency with by_alias? (still re-reading a couple of times on my end).

@alexanderankin
Copy link
Author

for now my todo list is to either identify or create another per-model serialization setting

@sydney-runkle
Copy link
Member

@alexanderankin,

I'll discuss with the team on Monday re the appropriate route forward. I think you've got a great start and are thinking about the right things!

@sydney-runkle
Copy link
Member

sydney-runkle commented Mar 17, 2024

A more consolidated summary:

  1. Flags for validation / serialization by alias are inconsistent in terms of behavior with nested models
  • populate_by_name config setting doesn't trickle down to nested models
  • by_alias serialization parameter does trickle down to nested models
  1. Default setting for validation vs serialization by alias is not consistent (they should both use aliases by default in V3)
  • populate_by_name is False by default (better phrased as validate_by_alias is True)
  • by_alias is False by default (better phrased as serialize_by_alias is False)

I would at least expect config settings to behave the same way in terms of application to nested models, and I would expect the same for parameters to validation / serialization method. However, the fact that these are different at the moment is a bit confusing, and may get more confusing if we choose to add a serialization config setting and/or a validation by_alias parameter.

@alexanderankin
Copy link
Author

alexanderankin commented Mar 17, 2024 via email

@samuelcolvin samuelcolvin changed the title #8379: allow for alias enforcement during serialization by setting serialize_by_alias on ConfigDict #8379: allow for alias enforcement during serialization by setting serialize_by_alias on ConfigDict Mar 18, 2024
@samuelcolvin
Copy link
Member

I haven't looked at the implementation, but I have read the discussion.

We need either:

  • a config setting serialize_by_alias which only applies to that model - consistent with other bits of config, but perhaps not that useful
  • a new way to define behaviour where we can put stuff which either:
    • doesn't apply the same way as config
    • or you want to apply globally

We've use both "config" and "settings" elsewhere, so I would propose pydantic.global_defaults which is a defined something like:

class GlobalDefaults(TypedDict):
    serialize_by_default: bool
    # what else do we want here?

global_defaults = GlobalDefaults(serialize_by_default=True)

@sydney-runkle
Copy link
Member

Closing this for now - the global defaults work should be in a different PR, and the current implementation of the serialize_by_alias flag doesn't apply only to the model in question.

@sydney-runkle
Copy link
Member

I still think #8379 specifies an important change to be made, just not in the way suggested in this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants