Regression in 3.12: toString() returns incorrect ascii when there are duplicate keys in a map #7505

thesamet · 2020-05-14T06:29:41Z

Prior to protobuf 3.12, when parsing a map using TextFormat:

If there are duplicate map keys, the last key seen is used.
toString emits the duplicate keys in the original order, so if the ascii representation was parsed again, the same last key will be used, and the original message will be preserved.

In 3.12.0-rc-1 and 3.12.0-rc-2:

The last key seen is used.
toString does not retain the original order. In some cases, the last value printed for a key, is not necessarily the value that is being used in a message. As a result, parsing again the output of toString results in a different message.

In other words, m.toString() is producing a text representation that when parsed does not give the original m back.

What version of protobuf and what language are you using?
Version: 3.12.0-rc-2
Language: Java
What operating system (Linux, Windows, ...) and version?
Ubuntu 19.10

What did you do?

np.proto:

syntax = "proto3";

package mypkg;

message Msg {
  map<int32, int32> m = 1;
}

Main.java:

import mypkg.Np.Msg;

import com.google.protobuf.TextFormat;
import com.google.protobuf.util.JsonFormat;

import java.lang.String;

public class Main {
    public static void main(String[] args) throws Exception {
            String input =
            "m: {\n" +
            "  key: 1\n" +
            "  value: 1\n" +
            "}\n" +
            "m: {\n" +
            "  key: -2147483647\n" +
            "  value: 5\n" +
            "}\n" +
            "m: {\n" +
            "  key: 1\n" +
            "  value: -1\n" +
            "}\n";

            Msg p = TextFormat.parse(input, Msg.class);
            int i1 = p.getMMap().get(1);
            Msg p2 = TextFormat.parse(p.toString(), Msg.class);
            int i2 = p2.getMMap().get(1);
            System.out.println(i1);  // prints -1
            System.out.println(i2);  // prints 1, expected -1
    }
}

What did you expect to see
Either a failure to parse, or:

-1
-1

What did you see instead?

1
-1

The text was updated successfully, but these errors were encountered:

haberman · 2020-05-14T17:51:03Z

What is the output of the p.toString() call here?

haberman · 2020-05-14T18:11:45Z

I was able to reproduce, I see the output:

int32_to_int32_field {
  key: 1
  value: -1
}
int32_to_int32_field {
  key: -2147483647
  value: 5
}
int32_to_int32_field {
  key: 1
  value: 1
}

haberman · 2020-05-14T19:16:49Z

I think the subtraction here is overflowing:
https://github.com/protocolbuffers/protobuf/blob/master/java/core/src/main/java/com/google/protobuf/TextFormat.java#L513

If I change this to use Integer.compare((int) getKey(), (int) b.getKey()) your test passes.

Thanks for the report. I'll make sure this is fixed before 3.12.0 is released.

Fixes: protocolbuffers#7505

Fixes: #7505

haberman · 2020-05-16T00:19:30Z

The fix is released in 3.12.0.

* Added background information about proto3 presence. (#7501) * Fixed bug in map key sorting for Java TextFormat. (#7508) Fixes: #7505 * Update protobuf version * Added a changelog entry about the Java fix. (#7516)

* Added background information about proto3 presence. (#7501) * Fixed bug in map key sorting for Java TextFormat. (#7508) Fixes: protocolbuffers/protobuf#7505 * Update protobuf version * Added a changelog entry about the Java fix. (#7516)

haberman added a commit to haberman/protobuf that referenced this issue May 14, 2020

Fixed bug in map key sorting for Java TextFormat.

5608c3d

Fixes: protocolbuffers#7505

haberman mentioned this issue May 14, 2020

Fixed bug in map key sorting for Java TextFormat. #7508

Merged

haberman closed this as completed May 16, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Regression in 3.12: toString() returns incorrect ascii when there are duplicate keys in a map #7505

Regression in 3.12: toString() returns incorrect ascii when there are duplicate keys in a map #7505

thesamet commented May 14, 2020

haberman commented May 14, 2020

haberman commented May 14, 2020

haberman commented May 14, 2020

haberman commented May 16, 2020

Regression in 3.12: toString() returns incorrect ascii when there are duplicate keys in a map #7505

Regression in 3.12: toString() returns incorrect ascii when there are duplicate keys in a map #7505

Comments

thesamet commented May 14, 2020

haberman commented May 14, 2020

haberman commented May 14, 2020

haberman commented May 14, 2020

haberman commented May 16, 2020