Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

editions: map fields and message_encoding and field_presence features #16549

Closed
jhump opened this issue Apr 17, 2024 · 4 comments
Closed

editions: map fields and message_encoding and field_presence features #16549

jhump opened this issue Apr 17, 2024 · 4 comments
Assignees

Comments

@jhump
Copy link
Contributor

jhump commented Apr 17, 2024

This is almost more of a question than a bug, but it could indicate a bug.

Currently (on main branch and in latest v26.1 release), a file that uses editions is not allowed to set the message_encoding feature on a map field, regardless of the value:

// test-map-delimited.proto
edition = "2023";
message Foo {
  map<string, string> m = 1 [features.message_encoding=LENGTH_PREFIXED];
}
$  protoc test-map-delimited.proto -o /dev/null
test-map-delimited.proto:4:23: Only message fields can specify message encoding.

Under the hood, the descriptor for this field looks like a repeated message field, but the check that reports that error message has an explicit exception for maps.

Despite this limitation, if I set DELIMITED as the file default using a file option, then the type() accessor for the map field reports TYPE_GROUP instead of TYPE_MESSAGE. And if the value of the map is a message type, that synthesized value field in the map entry message also reports TYPE_GROUP instead of TYPE_MESSAGE.

And the actual serialization/deserialization respects the type() accessor, using "start group" and "end group" wire type tags when serializing the map data... or at least protoc --encode ... does this (which could be a bug in the dynamic message implementation if it is not supposed to).

So the first question is: Is it intended that map fields can be serialized using delimited message encoding? If so, why does the compiler prevent one from setting that feature on a map field? If not, I take it the behavior observed with protoc --encode ... is a bug?


The next thing to bring up is related, but involves the has_presence() method of a field descriptor.

The compiler also prevents a user from setting the field_presence option on any repeated field, and that includes maps. The actual map fields, because they are repeated, always report false from has_presence(). However, the synthetic fields of the map entry message report whatever the file default is. So they report true of "has_presence()in "proto2" files as well as in editions file, as long as the default presence has not been overridden to beIMPLICIT`. (The exception being a map value field always reports true, even in proto3 or when the default presence is overridden in editions, simply because it is a map field.)

Also looking at protoc --encode ... and protoc --decode ..., the presence of a map key or value in fact seems to always be false. Regardless of whether you specify a key or value in input, you seem to always get a value. This is easy to see by encoding the following text format and then decoding it: the decode result always shows the keys and values set, even if they were absent in the input.

// working with same Foo message defined in example source above
m{
  key: "abc"
  value: "xyz"
}
m{
  value: "123"
}
m{
  key: "ABC"
}
// decode output adds missing keys and values
m {
  key: ""
  value: "123"
}
m {
  key: "ABC"
  value: ""
}
m {
  key: "abc"
  value: "xyz"
}

This could be a side effect of the dynamic message implementation and how it represents map fields, but I've observed the same behavior in the Go protobuf runtime, not just in the text format but also in the binary format. (I suspect the C++ implementation of the binary format behaves the same way.)

So my second question is: should the key and value fields of a synthetic map entry message always report false for has_presence()? Based on observable behavior, I think they should. Even when the map value is a message, the value field of the map entry behaves more like a repeated field in this regard -- failing to serialize a value does not result in a recipient observing an absent value for a map entry, but rather they observe a present-but-empty message.

@jhump jhump added the untriaged auto added to all issues by default when created. label Apr 17, 2024
@mkruskal-google
Copy link
Member

mkruskal-google commented Apr 18, 2024

Hey Josh, thanks for reporting this!

Our intention in edition 2023 was to basically punt on map fields and preserve whatever the behavior in proto2/proto3 was as best as we could. Since group maps weren't possible in proto2, we intended to ban tag-delimited maps. I think it's a an oversight that they can inherit this feature from the file, but maybe we should just open this up and remove that special-case instead of ignoring it.

For presence I think I agree that this is a proto2 "bug" in has_presence that's been inherited by editions. Map key/value fields really don't have presence, and has_presence should report false. I can see if this is feasible to add to 27.0, although if it's a proto2 breaking change it might have to wait..

copybara-service bot pushed a commit that referenced this issue Apr 19, 2024
Map fields should remain length-prefixed for now, even if DELIMITED is inherited.  Field presence will remain unchanged, but unit-tests are added to make sure proto2/proto3 behaviors stay consistent.

Closes #16549

PiperOrigin-RevId: 626462967
copybara-service bot pushed a commit that referenced this issue Apr 19, 2024
Map fields should remain length-prefixed for now, even if DELIMITED is inherited.  Field presence will remain unchanged, but unit-tests are added to make sure proto2/proto3 behaviors stay consistent.

Closes #16549

PiperOrigin-RevId: 626462967
copybara-service bot pushed a commit that referenced this issue Apr 19, 2024
Map fields should remain length-prefixed for now, even if DELIMITED is inherited.  Field presence will remain unchanged, but unit-tests are added to make sure proto2/proto3 behaviors stay consistent.

Closes #16549

PiperOrigin-RevId: 626462967
@zhangskz zhangskz added 27.x and removed untriaged auto added to all issues by default when created. labels Apr 24, 2024
@jhump
Copy link
Contributor Author

jhump commented Apr 29, 2024

@mkruskal-google, FWIW, I think we found another bug related to inheritance of features from file defaults: #16664

copybara-service bot pushed a commit that referenced this issue May 1, 2024
Map fields should remain length-prefixed for now, even if DELIMITED is inherited.  Field presence will remain unchanged, but unit-tests are added to make sure proto2/proto3 behaviors stay consistent.

Closes #16549

PiperOrigin-RevId: 626462967
@mkruskal-google
Copy link
Member

mkruskal-google commented May 2, 2024

To follow up on this, I think we're going to leave the has_presence behavior alone for 2023. It's awkward and confusing, but it's consistent with proto2/proto3 behavior. For the delimited bug, we'll fix the feature handling so that maps don't inherit this setting. Thanks for reporting these!

@jhump
Copy link
Contributor Author

jhump commented May 2, 2024

For the delimited bug, we'll fix the feature handling so that maps don't inherit this setting.

@mkruskal-google, thanks for the update! We'll follow suit in protobuf-es.

cc @timostamm

mkruskal-google added a commit to mkruskal-google/protobuf that referenced this issue May 2, 2024
Map fields should remain length-prefixed for now, even if DELIMITED is inherited.  Field presence will remain unchanged, but unit-tests are added to make sure proto2/proto3 behaviors stay consistent.

Closes protocolbuffers#16549

PiperOrigin-RevId: 630191163
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants