Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve more java field accessor name conflicts #8198

Expand Up @@ -61,8 +61,31 @@
final class DescriptorMessageInfoFactory implements MessageInfoFactory {
private static final String GET_DEFAULT_INSTANCE_METHOD_NAME = "getDefaultInstance";
private static final DescriptorMessageInfoFactory instance = new DescriptorMessageInfoFactory();

/**
* Names that should be avoided (in UpperCamelCase format).
* Using them causes the compiler to generate accessors whose names are
philsttr marked this conversation as resolved.
Show resolved Hide resolved
* colliding with methods defined in base classes.
*
* Keep this list in sync with kForbiddenWordList in
* src/google/protobuf/compiler/java/java_helpers.cc
*/
private static final Set<String> specialFieldNames =
new HashSet<>(Arrays.asList("cached_size", "serialized_size", "class"));
new HashSet<>(Arrays.asList(
// java.lang.Object:
"Class",
// com.google.protobuf.MessageLiteOrBuilder:
"DefaultInstanceForType",
// com.google.protobuf.MessageLite:
"ParserForType",
"SerializedSize",
// com.google.protobuf.MessageOrBuilder:
"AllFields",
"DescriptorForType",
"InitializationErrorString",
"UnknownFields",
// obsolete. kept for backwards compatibility of generated code
"CachedSize"));

// Disallow construction - it's a singleton.
private DescriptorMessageInfoFactory() {}
Expand Down Expand Up @@ -593,21 +616,88 @@ static String getFieldName(FieldDescriptor fd) {
String name = (fd.getType() == FieldDescriptor.Type.GROUP)
? fd.getMessageType().getName()
: fd.getName();
String suffix = specialFieldNames.contains(name) ? "__" : "_";
return snakeCaseToCamelCase(name) + suffix;

// convert to UpperCamelCase for comparison to the specialFieldNames
// (which are in UpperCamelCase)
String upperCamelCaseName = snakeCaseToUpperCamelCase(name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think snakeCaseToUpperCamelCase is only ever used for checking against specialFieldNames. Can we just keep specialFieldNames in lower camel case instead? This would simplify our code and IMO is a bit more intuitive anyways.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specialFieldNames are in UpperCamelCase to handle proto fields named _my_field, since _my_field will be MyField when it is converted to camel case (either upper or lower). It seemed more straightforward to keep the special field names in UpperCamelCase for comparison (since it handles proto fields named both my_field, _my_field, and __my_field), rather than adding additional logic to handle proto fields starting with _.

Note the tests that use ForbiddenWordsLeadingUnderscoreMessage test this case (pun intended ;)).


String suffix = specialFieldNames.contains(upperCamelCaseName)
// For field names that match the specialFieldNames,
// append "__" to prevent field accessor method names from
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why both __ and _? Is there a conflict if both cases uses a single underscore?

Copy link
Contributor Author

@philsttr philsttr Nov 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The single and double underscores referenced here are simply to match the java code that is currently generated by the protoc java compiler. Notice the previous code used the single and double underscore. In other words, my change does not change the existing semantics for single and double underscores.

The protoc java compiler currently (prior to my change) uses a single or double underscore depending on the proto field name.

The protoc java compiler always appends at least one underscore to java field names. This prevents those field names from clashing with java keywords. For example, if the proto field name was int, then the protoc java compiler would generate a java field name of int_.

The protoc java compiler does not include the last trailing underscore in accessor method names for these fields. For example, the getter method for the proto field named int would be getInt().

In addition, the protoc java compiler adds another underscore (for a total of two underscores) to the java field name for names that would result in accessor method names that clash with other method names. For example, if the proto field name was class, the java field name would be class__ (double underscore). And again, the last trailing underscore is not included in the accessor method names. So the getter method for the proto field named class would be getClass_()

This behavior existed prior to my change. My change simply adds more detection of field names that result in accessor method name clashes. Previously, only cached_size, serialized_size, and class were detected. Unfortunately, those are only a subset of the proto field names that would actually cause java assessor method name conflicts. My change adds the complete set of proto field names that cause java accessor method name conflicts.

Again, I'm not changing the semantics of single or double underscores. I am also not going to refactor the protoc compiler to change its current semantics for single and double underscores. I'm simply fixing a bug in the protoc compiler where it was generating non-compilable java code due to missing detection of field names that cause conflicts.

// clashing with other methods.
? "__"
// For other field names, append "_" to prevent field names
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should field names here be field method accessor names as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the protoc compiler appends a single underscore specifically to the java field names to prevent clashes with java keywords. The protoc compiler does not include this last undescore in accessor method names.

// from clashing with java keywords.
: "_";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't split the ternary ?: operator across these lines. I found this really hard to read. Possibly this method should be split up, one for handling method names and one for handling field names. As is, I'm finding it very hard to grok. I did not initially understand what it was doing.

Copy link
Contributor Author

@philsttr philsttr Nov 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the ternary operator and improved the comments. As stated in my other comment, I'm not going to refactor the semantics of how the protoc compiler uses single and double underscores. I'm also not going to refactor how accessor method names are currently generated from field names. Both would be great improvements, but they are outside of the scope of this bugfix.

For comparison, take a look at the previous implementation of this method...

static String getFieldName(FieldDescriptor fd) {
String name = (fd.getType() == FieldDescriptor.Type.GROUP)
? fd.getMessageType().getName()
: fd.getName();
String suffix = specialFieldNames.contains(name) ? "__" : "_";
return snakeCaseToCamelCase(name) + suffix;
}

I believe the comments in the new implementation have made it more clear, and are sufficient for this bugfix.

return snakeCaseToLowerCamelCase(name) + suffix;
}

private static String getCachedSizeFieldName(FieldDescriptor fd) {
return snakeCaseToCamelCase(fd.getName()) + "MemoizedSerializedSize";
return snakeCaseToLowerCamelCase(fd.getName()) + "MemoizedSerializedSize";
}

/**
* Converts a snake case string into lower camel case.
*
* <p>Some examples:</p>
* <pre>
* snakeCaseToLowerCamelCase("foo_bar") => "fooBar"
* snakeCaseToLowerCamelCase("foo") => "foo"
* </pre>
*
* @param snakeCase the string in snake case to convert
* @return the string converted to camel case, with a lowercase first character
*/
private static String snakeCaseToLowerCamelCase(String snakeCase) {
return snakeCaseToCamelCase(snakeCase, false);
}

/**
* This method must match exactly with the corresponding function in protocol compiler. See:
* https://github.com/google/protobuf/blob/v3.0.0/src/google/protobuf/compiler/java/java_helpers.cc#L153
* Converts a snake case string into upper camel case.
*
* <p>Some examples:</p>
* <pre>
philsttr marked this conversation as resolved.
Show resolved Hide resolved
* snakeCaseToUpperCamelCase("foo_bar") => "FooBar"
* snakeCaseToUpperCamelCase("foo") => "Foo"
* </pre>
*
* @param snakeCase the string in snake case to convert
* @return the string converted to camel case, with an uppercase first character
*/
private static String snakeCaseToUpperCamelCase(String snakeCase) {
return snakeCaseToCamelCase(snakeCase, true);
}

/**
* Converts a snake case string into camel case.
*
* <p>For better readability, prefer calling either
* {@link #snakeCaseToLowerCamelCase(String)} or {@link #snakeCaseToUpperCamelCase(String)}.</p>
*
* <p>Some examples:</p>
* <pre>
philsttr marked this conversation as resolved.
Show resolved Hide resolved
* snakeCaseToCamelCase("foo_bar", false) => "fooBar"
* snakeCaseToCamelCase("foo_bar", true) => "FooBar"
* snakeCaseToCamelCase("foo", false) => "foo"
* snakeCaseToCamelCase("foo", true) => "Foo"
* snakeCaseToCamelCase("Foo", false) => "foo"
* snakeCaseToCamelCase("fooBar", false) => "fooBar"
* </pre>
*
* <p>This implementation of this method must exactly match the corresponding
* function in the protocol compiler. Specifically, the
* {@code UnderscoresToCamelCase} function in
* {@code src/google/protobuf/compiler/java/java_helpers.cc}.</p>
*
* @param snakeCase the string in snake case to convert
* @param capFirst true if the first letter of the returned string should be uppercase.
* false if the first letter of the returned string should be lowercase.
* @return the string converted to camel case, with an uppercase or lowercase first
* character depending on if {@code capFirst} is true or false, respectively
*/
private static String snakeCaseToCamelCase(String snakeCase) {
private static String snakeCaseToCamelCase(String snakeCase, boolean capFirst) {
philsttr marked this conversation as resolved.
Show resolved Hide resolved
StringBuilder sb = new StringBuilder(snakeCase.length() + 1);
boolean capNext = false;
boolean capNext = capFirst;
for (int ctr = 0; ctr < snakeCase.length(); ctr++) {
char next = snakeCase.charAt(ctr);
if (next == '_') {
Expand Down Expand Up @@ -653,7 +743,7 @@ private static Class<?> getTypeForRepeatedMessageField(Class<?> messageType, Fie

/** Constructs the name of the get method for the given field in the proto. */
private static String getterForField(String snakeCase) {
String camelCase = snakeCaseToCamelCase(snakeCase);
String camelCase = snakeCaseToLowerCamelCase(snakeCase);
StringBuilder builder = new StringBuilder("get");
// Capitalize the first character in the field name.
builder.append(Character.toUpperCase(camelCase.charAt(0)));
Expand All @@ -679,7 +769,7 @@ OneofInfo getOneof(Class<?> messageType, OneofDescriptor desc) {
}

private static OneofInfo newInfo(Class<?> messageType, OneofDescriptor desc) {
String camelCase = snakeCaseToCamelCase(desc.getName());
String camelCase = snakeCaseToLowerCamelCase(desc.getName());
String valueFieldName = camelCase + "_";
String caseFieldName = camelCase + "Case_";

Expand Down
Expand Up @@ -43,10 +43,61 @@ option java_generic_services = true; // auto-added
option java_package = "com.google.protobuf";
option java_outer_classname = "TestBadIdentifiersProto";

message TestMessage {
optional string cached_size = 1;
optional string serialized_size = 2;
optional string class = 3;
// Message with field names using underscores that conflict with accessors in the base message class in java.
// See kForbiddenWordList in src/google/protobuf/compiler/java/java_helpers.cc
message ForbiddenWordsUnderscoreMessage {
// java.lang.Object
optional bool class = 1;
// com.google.protobuf.MessageLiteOrBuilder
optional bool default_instance_for_type = 2;
// com.google.protobuf.MessageLite
optional bool parser_for_type = 3;
optional bool serialized_size = 4;
// com.google.protobuf.MessageOrBuilder
optional bool all_fields = 5;
optional bool descriptor_for_type = 6;
optional bool initialization_error_string = 7;
optional bool unknown_fields = 8;
// obsolete. kept for backwards compatibility of generated code
optional bool cached_size = 9;
}

// Message with field names using leading underscores that conflict with accessors in the base message class in java.
// See kForbiddenWordList in src/google/protobuf/compiler/java/java_helpers.cc
message ForbiddenWordsLeadingUnderscoreMessage {
// java.lang.Object
optional bool _class = 1;
// com.google.protobuf.MessageLiteOrBuilder
optional bool _default_instance_for_type = 2;
// com.google.protobuf.MessageLite
optional bool _parser_for_type = 3;
optional bool _serialized_size = 4;
// com.google.protobuf.MessageOrBuilder
optional bool _all_fields = 5;
optional bool _descriptor_for_type = 6;
optional bool _initialization_error_string = 7;
optional bool _unknown_fields = 8;
// obsolete. kept for backwards compatibility of generated code
optional bool _cached_size = 9;
}

// Message with field names in camel case that conflict with accessors in the base message class in java.
// See kForbiddenWordList in src/google/protobuf/compiler/java/java_helpers.cc
message ForbiddenWordsCamelMessage {
// java.lang.Object
optional bool class = 1;
// com.google.protobuf.MessageLiteOrBuilder
optional bool defaultInstanceForType = 2;
// com.google.protobuf.MessageLite
optional bool serializedSize = 3;
optional bool parserForType = 4;
// com.google.protobuf.MessageOrBuilder:
optional bool initializationErrorString = 5;
optional bool descriptorForType = 6;
optional bool allFields = 7;
optional bool unknownFields = 8;
// obsolete. kept for backwards compatibility of generated code
optional bool cachedSize = 9;
}

message Descriptor {
Expand Down Expand Up @@ -84,7 +135,7 @@ message Deprecated {

optional int32 field1 = 1 [deprecated = true];
optional TestEnum field2 = 2 [deprecated = true];
optional TestMessage field3 = 3 [deprecated = true];
optional ForbiddenWordsUnderscoreMessage field3 = 3 [deprecated = true];
}

message Override {
Expand Down Expand Up @@ -117,32 +168,32 @@ message Double {
}

service TestConflictingMethodNames {
rpc Override(TestMessage) returns (TestMessage);
rpc Override(ForbiddenWordsUnderscoreMessage) returns (ForbiddenWordsUnderscoreMessage);
}

message TestConflictingFieldNames {
enum TestEnum {
UNKNOWN = 0;
FOO = 1;
}
message TestMessage {}
message ForbiddenWordsUnderscoreMessage {}
repeated int32 int32_field = 1;
repeated TestEnum enum_field = 2;
repeated string string_field = 3;
repeated bytes bytes_field = 4;
repeated TestMessage message_field = 5;
repeated ForbiddenWordsUnderscoreMessage message_field = 5;

optional int32 int32_field_count = 11;
optional TestEnum enum_field_count = 12;
optional string string_field_count = 13;
optional bytes bytes_field_count = 14;
optional TestMessage message_field_count = 15;
optional ForbiddenWordsUnderscoreMessage message_field_count = 15;

repeated int32 Int32Field = 21; // NO_PROTO3
repeated TestEnum EnumField = 22; // NO_PROTO3
repeated string StringField = 23; // NO_PROTO3
repeated bytes BytesField = 24; // NO_PROTO3
repeated TestMessage MessageField = 25; // NO_PROTO3
repeated ForbiddenWordsUnderscoreMessage MessageField = 25; // NO_PROTO3

// This field conflicts with "int32_field" as they both generate
// the method getInt32FieldList().
Expand Down
23 changes: 17 additions & 6 deletions src/google/protobuf/compiler/java/java_helpers.cc
Expand Up @@ -66,15 +66,26 @@ namespace {

const char* kDefaultPackage = "";

// Names that should be avoided as field names.
// Names that should be avoided (in UpperCamelCase format).
// Using them will cause the compiler to generate accessors whose names are
// colliding with methods defined in base classes.
// Keep this list in sync with specialFieldNames in
// java/core/src/main/java/com/google/protobuf/DescriptorMessageInfoFactory.java
const char* kForbiddenWordList[] = {
// message base class:
"cached_size",
"serialized_size",
// java.lang.Object:
"class",
"Class",
// com.google.protobuf.MessageLiteOrBuilder:
"DefaultInstanceForType",
// com.google.protobuf.MessageLite:
"ParserForType",
"SerializedSize",
// com.google.protobuf.MessageOrBuilder:
"AllFields",
"DescriptorForType",
"InitializationErrorString",
"UnknownFields",
// obsolete. kept for backwards compatibility of generated code
"CachedSize",
};

const std::unordered_set<std::string>* kReservedNames =
Expand All @@ -93,7 +104,7 @@ const std::unordered_set<std::string>* kReservedNames =

bool IsForbidden(const std::string& field_name) {
for (int i = 0; i < GOOGLE_ARRAYSIZE(kForbiddenWordList); ++i) {
if (field_name == kForbiddenWordList[i]) {
if (UnderscoresToCamelCase(field_name, true) == kForbiddenWordList[i]) {
return true;
}
}
Expand Down