Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enums inside untagged enums cannot decode in formats which use variant index rather than name #1437

Closed
daboross opened this issue Dec 5, 2018 · 0 comments

Comments

@daboross
Copy link
Contributor

daboross commented Dec 5, 2018

In any format which encodes an enum as its index rather than it's name, if an enum is used within data for an untagged enum, the deserialization fails.

Example of what I mean:

#[derive(Serialize, Deserialize, Debug, PartialEq)]
#[serde(untagged)]
enum UntaggedEnum {
    A(InnerEnum),
}

#[derive(Serialize, Deserialize, Debug, PartialEq)]
enum InnerEnum {
    B,
    C,
}

UntaggedEnum::A(InnerEnum::B) will succeed a round-trip in formats which encode enum variants as string names, but it will fail in formats which use the variant index.

This was introduced in e1db820#diff-751ea35094a96d89c2d04ff6a61e703aR2139 and serde v1.39.0.


The actual problem is two-fold.

First, when InnerEnum deserializes it's enum variant, it uses deserializer.deserialize_identifier:

impl<'de> _serde::Deserialize<'de> for __Field {                                      
    #[inline]                                                                         
    fn deserialize<__D>(__deserializer: __D) -> _serde::export::Result<Self, __D::Error>
    where                                                                               
        __D: _serde::Deserializer<'de>,                                                 
    {                                                                                   
        _serde::Deserializer::deserialize_identifier(__deserializer, __FieldVisitor)    
    }                                                                                   
} 

The generated __FieldVisitor accepts u64 for the variant index, str for the variant name, or bytes for the variant name. But when deserializing, it requests an identifier.

The second part is that ContentRefDeserializer and ContentDeserializer only pass through string/byte values when deserialize_identifier is requested. This method rejects integers for deserialize_identifier.

When these two facts are combined, it makes it impossible for InnerEnum inside UntaggedEnum to be deserialized in any format which uses variant index rather than name.

Background for Context:

When a #[serde(untagged)] enum deserializes data, it first deserializes it into a Content, then passes ContentRefDeserializer and/or ContentDeserialize to each variant in turn, exiting as soon as one succeeds.


I see two ways to fix this - we could either change #[derive(Deserialize)] on enums to use deserialize_any instead of deserialize_identifier for the variant identifier, or we could change ContentRefDeserializer and ContentDeserializer to accept integers inside deserialize_identifier.

I think the second would be strictly less of a breaking change, since it only means things that previously caused an error no longer do. Changing deserialize_identifier to deserialize_any in #[derive(Deserialize)] would remove some information provided to non-self-describing formats.

I'm not an expert on how serde tries to be compatible though. If a maintainer has an opinion on which one would be best, I'd be willing to submit a PR to fix this - though it seems like it'll be a small fix either way.


I discovered this investigating a failed test in msgpack-rust: 3Hren/msgpack-rust#178.

They are affected by this because the default serialization serializes enum variants as integers, rather than strings. If I've correctly identified the issue, it should apply to all formats which do this, though.

If there's any info I can provide, feel free to ask. Cheers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants