You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In any format which encodes an enum as its index rather than it's name, if an enum is used within data for an untagged enum, the deserialization fails.
UntaggedEnum::A(InnerEnum::B) will succeed a round-trip in formats which encode enum variants as string names, but it will fail in formats which use the variant index.
The generated __FieldVisitor accepts u64 for the variant index, str for the variant name, or bytes for the variant name. But when deserializing, it requests an identifier.
The second part is that ContentRefDeserializer and ContentDeserializer only pass through string/byte values when deserialize_identifier is requested. This method rejects integers for deserialize_identifier.
When these two facts are combined, it makes it impossible for InnerEnum inside UntaggedEnum to be deserialized in any format which uses variant index rather than name.
Background for Context:
When a #[serde(untagged)] enum deserializes data, it first deserializes it into a Content, then passes ContentRefDeserializer and/or ContentDeserialize to each variant in turn, exiting as soon as one succeeds.
I see two ways to fix this - we could either change #[derive(Deserialize)] on enums to use deserialize_any instead of deserialize_identifier for the variant identifier, or we could change ContentRefDeserializer and ContentDeserializer to accept integers inside deserialize_identifier.
I think the second would be strictly less of a breaking change, since it only means things that previously caused an error no longer do. Changing deserialize_identifier to deserialize_any in #[derive(Deserialize)] would remove some information provided to non-self-describing formats.
I'm not an expert on how serde tries to be compatible though. If a maintainer has an opinion on which one would be best, I'd be willing to submit a PR to fix this - though it seems like it'll be a small fix either way.
They are affected by this because the default serialization serializes enum variants as integers, rather than strings. If I've correctly identified the issue, it should apply to all formats which do this, though.
If there's any info I can provide, feel free to ask. Cheers.
The text was updated successfully, but these errors were encountered:
In any format which encodes an enum as its index rather than it's name, if an enum is used within data for an untagged enum, the deserialization fails.
Example of what I mean:
UntaggedEnum::A(InnerEnum::B)
will succeed a round-trip in formats which encode enum variants as string names, but it will fail in formats which use the variant index.This was introduced in e1db820#diff-751ea35094a96d89c2d04ff6a61e703aR2139 and serde v1.39.0.
The actual problem is two-fold.
First, when
InnerEnum
deserializes it's enum variant, it usesdeserializer.deserialize_identifier
:The generated
__FieldVisitor
acceptsu64
for the variant index,str
for the variant name, orbytes
for the variant name. But when deserializing, it requests an identifier.The second part is that
ContentRefDeserializer
andContentDeserializer
only pass through string/byte values whendeserialize_identifier
is requested. This method rejects integers fordeserialize_identifier
.When these two facts are combined, it makes it impossible for
InnerEnum
insideUntaggedEnum
to be deserialized in any format which uses variant index rather than name.Background for
Context
:When a
#[serde(untagged)]
enum deserializes data, it first deserializes it into aContent
, then passesContentRefDeserializer
and/orContentDeserialize
to each variant in turn, exiting as soon as one succeeds.I see two ways to fix this - we could either change
#[derive(Deserialize)]
on enums to usedeserialize_any
instead ofdeserialize_identifier
for the variant identifier, or we could changeContentRefDeserializer
andContentDeserializer
to accept integers insidedeserialize_identifier
.I think the second would be strictly less of a breaking change, since it only means things that previously caused an error no longer do. Changing
deserialize_identifier
todeserialize_any
in#[derive(Deserialize)]
would remove some information provided to non-self-describing formats.I'm not an expert on how
serde
tries to be compatible though. If a maintainer has an opinion on which one would be best, I'd be willing to submit a PR to fix this - though it seems like it'll be a small fix either way.I discovered this investigating a failed test in msgpack-rust: 3Hren/msgpack-rust#178.
They are affected by this because the default serialization serializes enum variants as integers, rather than strings. If I've correctly identified the issue, it should apply to all formats which do this, though.
If there's any info I can provide, feel free to ask. Cheers.
The text was updated successfully, but these errors were encountered: