Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting protobuf preserving unknown fields #2655

Open
xiaozhikang0916 opened this issue Apr 26, 2024 · 5 comments
Open

Supporting protobuf preserving unknown fields #2655

xiaozhikang0916 opened this issue Apr 26, 2024 · 5 comments
Labels

Comments

@xiaozhikang0916
Copy link

I would like to support preserving unknown fields for ProtoBuf, here is an issue for discussion beforehand.

Motivation

Basically it is required in protobuf spec, so it needs to be implemented to make kotlinx.serialization a valid protobuf parser.

Basic design

A new helper class will be introduced

interface UnknownFieldsHolder: Map<ProtoId, ProtoField>

Users can add to class property according to their needs:

@Serializable
data class ForwardCompatibleData(
    @ProtoNumber(1) val name: String,
    // No @ProtoNumber needed
    // Maybe introducing a new annotation?
    val unknownFields: UnknownFieldsHolder,
)

Behaviours

  • The ProtoBuf format should keeps all undeclare fields in the UnknownFieldsHolder when deserializing;
  • All data in the UnknownFieldsHolder should be written back to wire as is during serializing;
  • If new fields are added in oneof field, the new value will be stored in UnknownFieldsHolder, leaving the @ProtoOneOf property to be null;
  • Other formats should not deseralize or serialize this property.

Further more

Enumeration

Enums in protobuf can also have unknown values in old version of code, but it is much more easier to store it in sealed interface / objects styles:

sealed interface IPhoneType {
  val value: Int
}
data object WorkPhone: IPhoneType {
  override val value: Int
    get() = 1
}
data object HomePhone: IPhoneType {
  override val value: Int
    get() = 2
}
data class UnknownPhoneType(override val value: Int): IPhoneType

Other formats

Do other formats need similar functionality?

@pdvrieze
Copy link
Contributor

@xiaozhikang0916 XML has this option. Both for tags and attributes (but they need to be stored separately). What you need is:

  1. A way of recording arbitrary Protobuf values (equivalent to JsonElement or xml Node)
  2. A container for unknown elements (it makes a lot of sense to require this to be explicitly annotated)

@xiaozhikang0916
Copy link
Author

A way of recording arbitrary Protobuf values

It is required for this feature, but I will keep it an internal interface, because I believe using protobuf in active code should be restricted in the scope of serializable class, and map-like structure ( with integer key of proto id ) is not dev-friendly.

@pdvrieze
Copy link
Contributor

@xiaozhikang0916 You probably want some way to specify which numbers could be omitted (deprecated fields), or whether to write these values at all (changes in known attributes may change the validity of the opaque unsupported attributes).

@xiaozhikang0916
Copy link
Author

@pdvrieze I got your point, but IMO an option to omit, or ignore any fields ( deprecated or not ) is also a breach of the protobuf spec.

If a deprecated field is not intented to be used by developers, just omit it in the declaration or keep the property internal/private. In anyway, this field should present in the wire date, and the wire data is not accessible to the user.

@pdvrieze
Copy link
Contributor

@xiaozhikang0916 What I'm considering is the case where the data is read, processed and serialized. In some cases it is semantically valid to retain the (extra) data, in others it is not. It may be possible to make a copy of the container without the "unknown field" data to avoid serializing invalid data, but this may be challenging for deeply nested hierarchies.

As a protocol the protobuf specification can not say anything of program semantics. The semantics are per definition program specific, only the developers of the program can decide what the correct way is to handle unknown data. The protobuf specification only states that fields are preserved during parsing and serialization, this is not about reading in data, doing some application specific stuff and then saving it again, rather it is to do with transformation of data through different paths.

As to the deprecated fields, an example there is that you have a deprecated localtime field, as well as a current utcTime field. In the example the localTime is no longer present in the data type. When then the utcTime is updated the localTime is no longer valid (the time changed), but because the application no longer knows about localtime it cannot update this "unknown field". The only correct behaviour in that case is to not emit unknown (but possibly incorrect) fields.

In other cases, say you want to add security headers to a message, it is perfectly valid to just encapsulate an opaque message and preserve all fields, it depends on the context. What I'm suggesting is a way (either using annotations or as a format option) to configure the behaviour.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants