Skip to content
This repository has been archived by the owner on Sep 26, 2023. It is now read-only.

fix(httpjson): handle message derived query params #1784

Merged
merged 9 commits into from Sep 13, 2022

Conversation

diegomarquezp
Copy link
Contributor

@diegomarquezp diegomarquezp commented Aug 31, 2022

Fixes #1783
Some message derived types such as Duration, FieldMask or Int32Value
would not be correctly handled by String.valueOf(). Instead, the
toJson() method is used to make it compliant with the protobuf language
guide

Fixes #1783
Some message derived types such as Duration, FieldMask or Int32Value
would not be correctly handled by String.valueOf(). Instead, the
toJson() method is used to make it compliant with the protobuf languague
guide
@diegomarquezp diegomarquezp requested review from a team as code owners August 31, 2022 21:07
@diegomarquezp diegomarquezp requested review from blakeli0 and removed request for blakeli0 August 31, 2022 21:09
public String toQueryParamValue(Object fieldValue) {
if (fieldValue instanceof GeneratedMessageV3) {
return toJson(((GeneratedMessageV3) fieldValue).toBuilder(), false)
.replaceAll("^\"", "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks suspicious to me, any documentation mentioned that we need to do this manually? What would it looks like if we don't replace them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this to get the actual contents of a string value. For example, a FieldMask would serialize into "a.b.c,a.b", but this replace() will get a.b.c,a.b without quotes. I will see if there is a better way to ensure sanitization.

* @param fieldValue a field value to serialize
*/
public String toQueryParamValue(Object fieldValue) {
if (fieldValue instanceof GeneratedMessageV3) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we want to handle this for all GeneratedMessageV3, this basically means we want to handle any complex types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vam-google I saw that Noah came up with a list of wellknownTypes, do you think this is a full list for supported types in query params?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the well known types to the logic. The check for GeneratedMessageV3 in toQueryParamValue will be intersected with serializable message types (e.g. FieldMask) with an extra condition in this if statement - also added a comment line for clarification)

@blakeli0
Copy link
Contributor

blakeli0 commented Sep 2, 2022

FYI, a similar fix is implemented in go.

Some message type objects will be passed as query params. These may have
nested properties that will now be generated as ?&foo.bar=1&foo.baz=2
@diegomarquezp diegomarquezp force-pushed the query-param-serializable branch 5 times, most recently from 6e2dd55 to 0ab76a4 Compare September 6, 2022 17:47
@@ -51,6 +58,29 @@
public class ProtoRestSerializer<RequestT extends Message> {
private final TypeRegistry registry;

// well-known types obtained from
// https://github.com/googleapis/gapic-showcase/blob/fe414784c18878d704b884348d84c68fd6b87466/util/genrest/resttools/populatefield.go#L27
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I proposed the list from gapic-showcase but I don't think we should just use it, showcase is only used for testing so the list may get out of date. We should probably go through the protobuf doc and add any type that have special handling to the list. For now, since Go already implemented it, we can probably follow what they did here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure, I can update the list of well-known types following the docs list, but I'm curious if there is a better way to programatically obtain a list of well-known types so we can update it as a dependency or so.

We have this documentation entry as a source of well-known types but we may have to manually check from time to time to keep our hardcoded list up to date.
However, I noted that the official definitions such as duration.proto would specify a special dotnet package Google.Protobuf.WellKnownTypes, whereas in java it falls into the common com.google.protobuf package, leaving us with no reliable programatic way to obtain a list of well-known types that would get updated as a dependency. As a last (and bad) source, there is a unit test in the same folder of the duration definition that contains a list of well-known types, but it is a unit test, so I don't think we have too many options

for (Map.Entry<FieldDescriptor, Object> fieldEntry :
((GeneratedMessageV3) fieldValue).getAllFields().entrySet()) {
Object value = fieldEntry.getValue();
putQueryParam(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This recursive logic seems complicated and could run into StackOverFlow issue if not careful, also doing all this during runtime could affect the performance for client libraries as well. So now I really think that we should do this traversing all leaf level fields part in the generator instead of gax.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

.replaceAll("^\"", "")
.replaceAll("\"$", "");
}
if (fieldValue instanceof ByteString) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need to handle ByteString separately here?

*/
public String toQueryParamValue(Object fieldValue) {
// This will match with message types that are serializable (e.g. FieldMask)
if (fieldValue instanceof GeneratedMessageV3 && !isNonSerializableMessageValue(fieldValue)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just use Message instead of GeneratedMessagev3 in all places since it's more generic. I also found that we could pass an Enum to query params, where Enum is not of Message type, so we may need to handle enum specifically as well.

// https://github.com/googleapis/gapic-showcase/blob/fe414784c18878d704b884348d84c68fd6b87466/util/genrest/resttools/populatefield.go#L27
private static final Set<Class<GeneratedMessageV3>> jsonSerializableMessages =
new HashSet(
Arrays.asList(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Effective Java 26: Don't use raw types. To do that and make this immutable at the same time, consider:

  private static final Set<Class<? extends GeneratedMessageV3>> JSON_SERIALIZABLE_MESSAGES =
      ImmutableSet.of(
          BoolValue.class,

Note that the collection could be considered a constant since it is now immutable, thus now worthy of upper snake case.

@@ -127,16 +169,34 @@ public void putPathParam(Map<String, String> fields, String fieldName, Object fi
* @param fieldValue a field value
*/
public void putQueryParam(Map<String, List<String>> fields, String fieldName, Object fieldValue) {
ImmutableList.Builder<String> paramValueList = ImmutableList.builder();
ArrayList<String> paramValueList = new ArrayList();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ArrayList is a raw-type (so, please at minimum change to use the diamond operator new ArrayList<>()) and is changing the contract of this public function to modify the fields output collection to contain a mutable data structure rather than an immutable one. Is this intended, or do we want to keep this immutable?

if (isNonSerializableMessageValue(fieldValue)) {
putDecomposedMessageQueryParam(fields, fieldName, fieldValue);
return;
} else {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When working with the pattern:

if (x is Collection)
  foreach i in x
    // do A(i)
else
  // do A(x)

We can often refactor to:

Iterable<?> fieldValues =
    fieldValue instanceof Iterable
        ? (Iterable<?>) fieldValue
        : Collections.singleton(fieldValue);

foreach i in x
  // do A(i)

This eliminates the need to study both implementations of A to determine if they're the same or not, with the added benefit of it being easier to test the single code path rather than both. As the complexity of A increases, this simplification becomes more and more beneficial.

private void putDecomposedMessageQueryParam(
Map<String, List<String>> fields, String fieldName, Object fieldValue) {
for (Map.Entry<FieldDescriptor, Object> fieldEntry :
((GeneratedMessageV3) fieldValue).getAllFields().entrySet()) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cast is only safe due to an instanceof check that occurs elsewhere in the logic. Will the next maintainer know they must call isNonSerializableMessageValue() as a check before this method?

I would highly suggest either performing the check explicitly in this method, or changing the parameter from Object to GeneratedMessageV3, thus forcing the caller of this method to perform the cast.

In general, the type-check (instanceof) and type-assertion (cast) should always be colocated - as two actions are tightly coupled and will need to change together if one is modified.

paramValueList.add(toQueryParamValue(fieldValueItem));
}
}
if (hasProcessedMessage) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than introducing this variable and a secondary exit point for the function, consider instead just performing a !paramValueList.isEmpty() check at the end to decide whether to add to the final fields map or not.

also fixed best practice issues pointed out in last commit's PR
enums passed as root object to putQueryParam will now be serialized as
their numeric value
blakeli0 and others added 2 commits September 9, 2022 16:25
Also added tests for serializing objects that contain Any typed
messages. Note that the type registry must have the tested types
beforehand, so they were added in the test class setup
expectedFields.put(
"object.options.value.value", Arrays.asList("1.000000001s", "2.000000002s", "a.b.c,d.e.f"));
// used by JSON parser to obtain descriptors from this type url
expectedFields.put(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting edge case, I don't think we should include these type info in the query params, we probably don't want to configure an Any field as query params as well. Can you do a little more investigation on this?

// tests with Any type messages require corresponding descriptors in the type registry
requestSerializer =
ProtoRestSerializer.create(
TypeRegistry.newBuilder()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check if these descriptors are added to ProtoRestSerializer on generating client libraries? If not, we should added the well knowntypes to the registry as well.

@sonarcloud
Copy link

sonarcloud bot commented Sep 12, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

97.7% 97.7% Coverage
0.0% 0.0% Duplication

@diegomarquezp
Copy link
Contributor Author

diegomarquezp commented Sep 13, 2022

The final approach was to rely on JSON serialization to produce fields to be included in the query paramenters. This allows well-known types such as Duration and Timestamp to be serialized into a string with compliant format, plus allowing complex message types to be serialized as tree/array objects that would be recursively processed and added as query parameters.

As next steps, we must handle enum types as root of the input of putQueryParam as it is now currently being serialized into string representation instead of the correct numeric value.
Also, Any typed messages must be ensured to have all related message types put in the serializer's type registry so the type url can be recognized to ensure proper serialization.

@diegomarquezp diegomarquezp merged commit 4524fad into main Sep 13, 2022
@diegomarquezp diegomarquezp deleted the query-param-serializable branch September 13, 2022 18:15
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

putQueryParam should handle special serializable types
3 participants