feat: Introduce interfaces for metrics instrumentation #2403

Merged
merged 31 commits on Jan 22, 2024
31 commits
2ae69f7
OTEL POC with attempt latency.
blakeli0 Jun 25, 2023
6671379
Add ClientMetricsTracer. Add channel size metric.
blakeli0 Jun 27, 2023
cd9f387
Add response to OperationSucceeded. Add a way to add attributes to me…
blakeli0 Jun 29, 2023
5eccd09
Add metrics for retryCount.
blakeli0 Jul 6, 2023
c2573b6
Add ApiTracerFactory to java-core.
blakeli0 Jul 17, 2023
292beae
chore: Add gRPC metrics for DirectPath.
blakeli0 Aug 4, 2023
9cfd35e
Add GFE metadata metrics. Add status labels.
blakeli0 Aug 22, 2023
4d19614
Add gax thread count metric.
blakeli0 Aug 23, 2023
be5b7b3
Merge branch 'main' into otel-poc
blakeli0 Aug 23, 2023
527985f
Make metric recorders protected.
blakeli0 Aug 24, 2023
56c9999
Add OperationCounter
blakeli0 Sep 1, 2023
d249b66
Add MetricsRecorder
blakeli0 Oct 5, 2023
145ad30
Merge branch 'main' into otel-poc
blakeli0 Dec 12, 2023
5c30048
Merge branch 'main' into otel-poc
blakeli0 Dec 18, 2023
57571b2
Add MetricsRecorder
blakeli0 Jan 2, 2024
3e2a771
Merge branch 'main' into otel-poc
blakeli0 Jan 5, 2024
3f24049
Expose addAdditionalAttributes in ApiTracer.
blakeli0 Jan 9, 2024
09c7c7b
Move OpenTelemetry logics to MetricsRecorder
blakeli0 Jan 9, 2024
d1bf7bd
Simplify the PoC
blakeli0 Jan 16, 2024
fed50c7
feat: Make all methods in ApiTracer default.
blakeli0 Jan 19, 2024
69e8a90
Remove unused dependencies.
blakeli0 Jan 19, 2024
f771d7e
Revert ClientContext changes.
blakeli0 Jan 19, 2024
d714cf4
Introduce MethodName. Add Javadocs.
blakeli0 Jan 20, 2024
7977511
Format
blakeli0 Jan 20, 2024
39d0c69
Add Javadoc to ServiceOptions
blakeli0 Jan 20, 2024
8ff9bb5
Merge branch 'main' into otel-foundation
blakeli0 Jan 20, 2024
0efbeba
Move addAttributes() from ApiTracer to MetricsTracer.
blakeli0 Jan 22, 2024
6171779
Format
blakeli0 Jan 22, 2024
2678682
Add basic Java docs.
blakeli0 Jan 22, 2024
9e32b79
Add basic Java docs.
blakeli0 Jan 22, 2024
0483500
Update Java docs.
blakeli0 Jan 22, 2024
@@ -77,7 +77,9 @@ public final class GrpcCallContext implements ApiCallContext {
private static final GrpcStatusCode UNAUTHENTICATED_STATUS_CODE =
GrpcStatusCode.of(Status.Code.UNAUTHENTICATED);

-static final CallOptions.Key<ApiTracer> TRACER_KEY = CallOptions.Key.create("gax.tracer");
+// This field is made public for handwritten libraries to easily access the tracer from
+// CallOptions
+public static final CallOptions.Key<ApiTracer> TRACER_KEY = CallOptions.Key.create("gax.tracer");
lqiu96 marked this conversation as resolved.

private final Channel channel;
@Nullable private final Credentials credentials;
@@ -49,34 +49,38 @@ public interface ApiTracer {
* between clients using gax and external resources to share the same implementation of the
* tracing. For example, OpenCensus will install a thread local that can be read by gRPC.
*/
-Scope inScope();
+default Scope inScope() {
+return () -> {
+// noop
+};
+};

/**
* Signals that the overall operation has finished successfully. The tracer is now considered
* closed and should no longer be used.
*/
-void operationSucceeded();
+default void operationSucceeded() {};

/**
* Signals that the operation was cancelled by the user. The tracer is now considered closed and
* should no longer be used.
*/
-void operationCancelled();
+default void operationCancelled() {};

/**
* Signals that the overall operation has failed and no further attempts will be made. The tracer
* is now considered closed and should no longer be used.
*
* @param error the final error that caused the operation to fail.
*/
-void operationFailed(Throwable error);
+default void operationFailed(Throwable error) {};

/**
* Annotates the operation with selected connection id from the {@code ChannelPool}.
*
* @param id the local connection identifier of the selected connection.
*/
-void connectionSelected(String id);
+default void connectionSelected(String id) {};

/**
* Adds an annotation that an attempt is about to start. In general this should occur at the very
@@ -86,7 +90,7 @@ public interface ApiTracer {
* @deprecated Please use {@link #attemptStarted(Object, int)} instead.
*/
@Deprecated
-void attemptStarted(int attemptNumber);
+default void attemptStarted(int attemptNumber) {};

/**
* Adds an annotation that an attempt is about to start with additional information from the
@@ -96,64 +100,64 @@ public interface ApiTracer {
* @param attemptNumber the zero based sequential attempt number.
* @param request request of this attempt.
*/
-void attemptStarted(Object request, int attemptNumber);
+default void attemptStarted(Object request, int attemptNumber) {};

/** Adds an annotation that the attempt succeeded. */
-void attemptSucceeded();
+default void attemptSucceeded() {};

/** Adds an annotation that the attempt was cancelled by the user. */
-void attemptCancelled();
+default void attemptCancelled() {};

/**
* Adds an annotation that the attempt failed, but another attempt will be made after the delay.
*
* @param error the transient error that caused the attempt to fail.
* @param delay the amount of time to wait before the next attempt will start.
*/
-void attemptFailed(Throwable error, Duration delay);
+default void attemptFailed(Throwable error, Duration delay) {};

/**
* Adds an annotation that the attempt failed and that no further attempts will be made because
* retry limits have been reached.
*
* @param error the last error received before retries were exhausted.
*/
-void attemptFailedRetriesExhausted(Throwable error);
+default void attemptFailedRetriesExhausted(Throwable error) {};

/**
* Adds an annotation that the attempt failed and that no further attempts will be made because
* the last error was not retryable.
*
* @param error the error that caused the final attempt to fail.
*/
-void attemptPermanentFailure(Throwable error);
+default void attemptPermanentFailure(Throwable error) {};

/**
* Signals that the initial RPC for the long running operation failed.
*
* @param error the error that caused the long running operation to fail.
*/
-void lroStartFailed(Throwable error);
+default void lroStartFailed(Throwable error) {};

/**
* Signals that the initial RPC successfully started the long running operation. The long running
* operation will now be polled for completion.
*/
-void lroStartSucceeded();
+default void lroStartSucceeded() {};

/** Adds an annotation that a streaming response has been received. */
-void responseReceived();
+default void responseReceived() {};

/** Adds an annotation that a streaming request has been sent. */
-void requestSent();
+default void requestSent() {};

/**
* Adds an annotation that a batch of writes has been flushed.
*
* @param elementCount the number of elements in the batch.
* @param requestSize the size of the batch in bytes.
*/
-void batchRequestSent(long elementCount, long requestSize);
+default void batchRequestSent(long elementCount, long requestSize) {};

/**
* A context class to be used with {@link #inScope()} and a try-with-resources block. Closing a
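Because `inScope()` now defaults to returning a no-op `Scope`, a caller can always wrap work in a try-with-resources block, whether or not a given tracer overrides it. The sketch below uses a hypothetical, trimmed-down copy of the interface (`SketchTracer`, `ScopeLoggingTracer`, and `ScopeDemo` are illustrative names, not gax classes) and assumes, as the Javadoc above implies, that `Scope` is an `AutoCloseable` whose `close()` declares no checked exception.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, trimmed-down stand-in for ApiTracer (not the gax interface itself).
interface SketchTracer {
  // Assumed shape of ApiTracer.Scope: close() throws no checked exception,
  // so try-with-resources needs no catch block.
  interface Scope extends AutoCloseable {
    @Override
    void close();
  }

  // Same default as in the diff: a Scope whose close() does nothing.
  default Scope inScope() {
    return () -> {
      // noop
    };
  }
}

// A tracer that overrides inScope() to log when the scope opens and closes.
class ScopeLoggingTracer implements SketchTracer {
  final List<String> events = new ArrayList<>();

  @Override
  public Scope inScope() {
    events.add("enter");
    return () -> events.add("exit");
  }
}

class ScopeDemo {
  // Runs the work inside the tracer's scope; close() runs even if the work throws.
  static void runInScope(SketchTracer tracer, Runnable work) {
    try (SketchTracer.Scope ignored = tracer.inScope()) {
      work.run();
    }
  }
}
```

A tracer that implements nothing at all (`new SketchTracer() {}`) can be passed to `runInScope` unchanged; the default simply makes the scope a no-op.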
@@ -33,7 +33,10 @@
import org.threeten.bp.Duration;

/**
-* A base implementation of {@link ApiTracer} that does nothing.
+* A base implementation of {@link ApiTracer} that does nothing. With the deprecation of Java 7
+* support, all methods in {@link ApiTracer} are now default methods, so a base class that does
+* nothing is no longer needed. This class should be removed once all references to it are
+* removed from the Google Cloud Client Libraries.
lqiu96 marked this conversation as resolved.
*
* <p>For internal use only.
*/
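The Javadoc above argues that a do-nothing base class is redundant once every `ApiTracer` method has a default. A minimal sketch of what that means for an implementer, using a hypothetical four-method stand-in (`OperationTracer` and `OutcomeCountingTracer` are illustrative names, not gax types): the class implements the interface directly and overrides only the callbacks it cares about.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a few ApiTracer callbacks; every method is default, as in the diff.
interface OperationTracer {
  default void operationSucceeded() {}

  default void operationCancelled() {}

  default void operationFailed(Throwable error) {}

  default void attemptStarted(Object request, int attemptNumber) {}
}

// Implements the interface directly, with no no-op base class in between.
// Callbacks it does not override (cancellation, attempts) fall back to the defaults.
class OutcomeCountingTracer implements OperationTracer {
  final AtomicInteger succeeded = new AtomicInteger();
  final AtomicInteger failed = new AtomicInteger();

  @Override
  public void operationSucceeded() {
    succeeded.incrementAndGet();
  }

  @Override
  public void operationFailed(Throwable error) {
    failed.incrementAndGet();
  }
}
```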
@@ -0,0 +1,65 @@
/*
* Copyright 2024 Google LLC
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
* met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following disclaimer
* in the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Google LLC nor the names of its
* contributors may be used to endorse or promote products derived from
* this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
package com.google.api.gax.tracing;

import com.google.api.core.BetaApi;
import com.google.api.core.InternalApi;
import com.google.api.gax.rpc.StubSettings;
import com.google.auto.value.AutoValue;

/** A value class to represent the name of the RPC method in an {@link ApiTracer}. */
@BetaApi
@InternalApi
@AutoValue
public abstract class MethodName {
/**
* Creates a new instance of the RPC method name.
*
* @param serviceName The name of the service. In general, this will be the GAPIC-generated service
* name {@link StubSettings#getServiceName()}. However, in some cases, when the GAPIC
Contributor

@blakeli0 I am now having some second thoughts on the serviceName() getter's name. I think there might be some slight discrepancy between the intended result and the function name.

getServiceName() returns the name that was configured in the default_host. For example, asset's will return cloudasset: https://github.com/googleapis/google-cloud-java/blob/9785f8cfd43db0a8968f086a1461242da9d16cb5/java-asset/google-cloud-asset/src/main/java/com/google/cloud/asset/v1/stub/AssetServiceStubSettings.java#L866-L868

even though the intended result for this probably should be asset.

Not something blocking this PR, but just wondering your thoughts on this.

Contributor

@lqiu96 why should the intended name be asset?
@blakeli0 Is there any way we could test that, for a particular service, this is the intended serviceName?

lqiu96 (Contributor), Jan 22, 2024

> @lqiu96 why should the intended name be asset? @blakeli0 Is there any way we could test that, for a particular service, this is the intended serviceName?

I don't want to say what the intended name should be. I think it might be a bit weird for customers since we probably aren't providing consistent service names.

java-asset's serviceName is cloudasset (link above)
java-biglake's serviceName is biglake (https://github.com/googleapis/google-cloud-java/blob/458516b462e0a4494f32815fe9d0e6f0b30353f1/java-biglake/google-cloud-biglake/src/main/java/com/google/cloud/bigquery/biglake/v1/stub/MetastoreServiceStubSettings.java#L409-L411)

I think most services would have the service name of java-{serviceName} except for asset and possibly a few others.

It might not be a big concern after all, but something just doesn't sit right with me knowing that serviceName is parsed from a URI (default_host value) and not the package name or some config value (name_pretty or api name).
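To make the concern concrete: if `serviceName` is the first DNS label of the `default_host`, the asset client ends up with `cloudasset` rather than `asset`. The helper below is a hypothetical illustration of that derivation (`ServiceNames.fromDefaultHost` is not a gax or generator API), assuming default hosts of the form `cloudasset.googleapis.com:443`.

```java
// Hypothetical helper, for illustration only: takes the first DNS label of a
// default_host value such as "cloudasset.googleapis.com:443".
final class ServiceNames {
  private ServiceNames() {}

  static String fromDefaultHost(String defaultHost) {
    String host = defaultHost;
    int colon = host.indexOf(':');
    if (colon >= 0) {
      host = host.substring(0, colon); // drop the port, e.g. ":443"
    }
    int dot = host.indexOf('.');
    return dot >= 0 ? host.substring(0, dot) : host;
  }
}
```

Under this assumption, `java-asset` reports `cloudasset` while `java-biglake` reports `biglake`, which matches the two examples linked above.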

ddixit14 (Contributor), Jan 22, 2024

While working on the dashboard, I spent some time on service names, and they are not of the form "java-{serviceName}". See column A in this sheet. In the dashboard, too, we derive the service name from the default_host value.

Contributor

FYI, your link doesn't work for me. Seems like there is no defined serviceName and we've somehow settled on the same definition 😆

Not blocking this PR, but it would be great if there was an official definition for serviceName. Perhaps a question for the core team.

Collaborator Author

@lqiu96 I think the term serviceName you introduced still makes sense, because we call the YAML and JSON configs for each versioned proto the service YAML and the service config, respectively. We may want to add more docs explaining that it is used to construct service endpoints.
@ddixit14 I would call java-{serviceName} a library name or repo name, as it could include multiple versioned services or admin services.

Contributor

OK, to wrap up the conversation above: I'll add docs clarifying the intended use case (constructing the endpoint) and how this value is constructed (from the config files).

It is fine that serviceName in the library (pulled from getServiceName()) doesn't 100% match java-{serviceName}.

Collaborator Author

> It is fine that serviceName in the library (pulled from getServiceName()) doesn't 100% match java-{serviceName}.

Correct. The library/repo name may not always be in the format java-{serviceName}, and I would not refer to java-{serviceName} as the format for the library/repo name.

* generated service is wrapped, this will be overridden to specify the manually written
* wrapper's name.
* @param methodName The name of the logical operation being traced.
*/
public static MethodName of(String serviceName, String methodName) {
return new AutoValue_MethodName(serviceName, methodName);
}

/** The name of the service, e.g. BigtableData. */
public abstract String getServiceName();

/** The name of the logical operation being traced, e.g. ReadRows. */
public abstract String getMethodName();

@Override
public String toString() {
return getServiceName() + "." + getMethodName();
}
}
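`MethodName` relies on AutoValue for `equals()`, `hashCode()`, and null checks, and overrides `toString()` to produce `service.method`. The sketch below is a hand-written approximation of that value class (`SketchMethodName` is a hypothetical name, and AutoValue's generated code may differ in detail); it shows why two instances with the same names compare equal and can be used interchangeably as map keys.

```java
import java.util.Objects;

// Hand-written stand-in for the AutoValue-generated MethodName (AutoValue itself is not used here).
final class SketchMethodName {
  private final String serviceName;
  private final String methodName;

  private SketchMethodName(String serviceName, String methodName) {
    // AutoValue rejects nulls for properties that are not marked @Nullable.
    this.serviceName = Objects.requireNonNull(serviceName, "Null serviceName");
    this.methodName = Objects.requireNonNull(methodName, "Null methodName");
  }

  static SketchMethodName of(String serviceName, String methodName) {
    return new SketchMethodName(serviceName, methodName);
  }

  String getServiceName() {
    return serviceName;
  }

  String getMethodName() {
    return methodName;
  }

  // Value semantics: equal when both names are equal.
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof SketchMethodName)) {
      return false;
    }
    SketchMethodName other = (SketchMethodName) o;
    return serviceName.equals(other.serviceName) && methodName.equals(other.methodName);
  }

  @Override
  public int hashCode() {
    return Objects.hash(serviceName, methodName);
  }

  // Same format as MethodName#toString() in the diff.
  @Override
  public String toString() {
    return serviceName + "." + methodName;
  }
}
```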
@@ -0,0 +1,61 @@
/*
* Copyright 2024 Google LLC
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
* met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following disclaimer
* in the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Google LLC nor the names of its
* contributors may be used to endorse or promote products derived from
* this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

package com.google.api.gax.tracing;

import com.google.api.core.BetaApi;
import com.google.api.core.InternalApi;
import java.util.Map;

/**
* Provides an interface for metrics recording. The implementer is expected to use an observability
* framework, e.g. OpenTelemetry. There should be only one instance of MetricsRecorder per client.
* All the methods in this class are expected to be called from multiple threads, so the
* implementation must be thread safe.
*/
@BetaApi
lqiu96 marked this conversation as resolved.
@InternalApi
public interface MetricsRecorder {

/** Records the latency of an RPC attempt. */
default void recordAttemptLatency(double attemptLatency, Map<String, String> attributes) {}

/** Records the count of RPC attempts. */
default void recordAttemptCount(long count, Map<String, String> attributes) {}

/**
* Records the total end-to-end latency for an operation, including the initial RPC attempt and
* any subsequent retries.
*/
default void recordOperationLatency(double operationLatency, Map<String, String> attributes) {}

/** Records the count of operations. */
default void recordOperationCount(long count, Map<String, String> attributes) {}
}
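The Javadoc requires implementations to be thread safe because a single recorder is shared by every call a client makes. A minimal in-memory sketch, assuming no observability framework is available: `InMemoryMetricsRecorder` and its aggregate-by-sorted-attributes scheme are hypothetical, and a production recorder would forward to something like OpenTelemetry instead. The interface is copied from the diff without its annotations.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.DoubleAdder;
import java.util.concurrent.atomic.LongAdder;

// Copy of the MetricsRecorder interface from the diff, minus @BetaApi/@InternalApi.
interface MetricsRecorder {
  default void recordAttemptLatency(double attemptLatency, Map<String, String> attributes) {}

  default void recordAttemptCount(long count, Map<String, String> attributes) {}

  default void recordOperationLatency(double operationLatency, Map<String, String> attributes) {}

  default void recordOperationCount(long count, Map<String, String> attributes) {}
}

// Hypothetical recorder that aggregates per attribute set and is safe for concurrent callers.
// Operation-level methods keep their no-op defaults in this sketch.
class InMemoryMetricsRecorder implements MetricsRecorder {
  private final ConcurrentHashMap<String, LongAdder> attemptCounts = new ConcurrentHashMap<>();
  private final ConcurrentHashMap<String, DoubleAdder> attemptLatencySums =
      new ConcurrentHashMap<>();

  // Canonical key: attributes sorted by name, so insertion order does not matter.
  // (Sketch only: assumes names and values contain no ',' or '='.)
  private static String key(Map<String, String> attributes) {
    return new TreeMap<>(attributes).toString();
  }

  @Override
  public void recordAttemptCount(long count, Map<String, String> attributes) {
    // computeIfAbsent is atomic on ConcurrentHashMap; LongAdder handles contended adds.
    attemptCounts.computeIfAbsent(key(attributes), k -> new LongAdder()).add(count);
  }

  @Override
  public void recordAttemptLatency(double attemptLatency, Map<String, String> attributes) {
    attemptLatencySums.computeIfAbsent(key(attributes), k -> new DoubleAdder()).add(attemptLatency);
  }

  long attemptCount(Map<String, String> attributes) {
    LongAdder adder = attemptCounts.get(key(attributes));
    return adder == null ? 0 : adder.sum();
  }

  double attemptLatencySum(Map<String, String> attributes) {
    DoubleAdder adder = attemptLatencySums.get(key(attributes));
    return adder == null ? 0.0 : adder.sum();
  }
}
```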
@@ -0,0 +1,53 @@
/*
* Copyright 2024 Google LLC
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
* met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following disclaimer
* in the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Google LLC nor the names of its
* contributors may be used to endorse or promote products derived from
* this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

package com.google.api.gax.tracing;

import com.google.api.core.BetaApi;
import com.google.api.core.InternalApi;

/**
* This class computes generic metrics that can be observed in the lifecycle of an RPC operation.
* The responsibility of recording metrics is delegated to {@link MetricsRecorder}, so this class
* should not have any knowledge of the observability framework used for metrics recording.
*/
@BetaApi
@InternalApi
public class MetricsTracer implements ApiTracer {

public MetricsTracer(MethodName methodName, MetricsRecorder metricsRecorder) {}

/**
* Adds attributes that will be attached to all metrics. This is expected to be called by
* handwritten client teams to add additional attributes that are not supposed to be collected by
* Gax.
*/
public void addAttributes(String key, String value) {};
}
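In this PR the `MetricsTracer` methods are still empty; the Javadoc only fixes the division of labor (the tracer computes, the recorder records). One possible shape, sketched under assumptions: `SketchMetricsTracer`, the attribute keys `method_name` and `status`, millisecond units, and the injected clock are all hypothetical, and it uses `java.time.Duration` where gax's `attemptFailed` takes `org.threeten.bp.Duration`.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical recorder with the two MetricsRecorder methods this sketch needs.
interface AttemptMetricsRecorder {
  void recordAttemptLatency(double attemptLatency, Map<String, String> attributes);

  void recordAttemptCount(long count, Map<String, String> attributes);
}

// Hypothetical sketch: measures each attempt and hands the numbers to the recorder.
// It knows nothing about how (or where) the recorder stores them.
class SketchMetricsTracer {
  private final AttemptMetricsRecorder recorder;
  private final LongSupplier nanoClock; // e.g. System::nanoTime; injectable for tests
  private final Map<String, String> attributes = new ConcurrentHashMap<>();
  // volatile: an attempt may start and finish on different threads.
  private volatile long attemptStartNanos;

  SketchMetricsTracer(String methodName, AttemptMetricsRecorder recorder, LongSupplier nanoClock) {
    this.recorder = recorder;
    this.nanoClock = nanoClock;
    attributes.put("method_name", methodName);
  }

  // Like MetricsTracer#addAttributes: attached to every metric this tracer records.
  void addAttributes(String key, String value) {
    attributes.put(key, value);
  }

  void attemptStarted(Object request, int attemptNumber) {
    attemptStartNanos = nanoClock.getAsLong();
  }

  void attemptSucceeded() {
    recordAttempt("OK");
  }

  // gax's signature takes org.threeten.bp.Duration; java.time.Duration keeps this self-contained.
  void attemptFailed(Throwable error, Duration delay) {
    recordAttempt(error.getClass().getSimpleName());
  }

  private void recordAttempt(String status) {
    double latencyMs = (nanoClock.getAsLong() - attemptStartNanos) / 1_000_000.0;
    // Copy so the recorder gets a stable snapshot that includes this attempt's status.
    Map<String, String> snapshot = new HashMap<>(attributes);
    snapshot.put("status", status);
    recorder.recordAttemptLatency(latencyMs, snapshot);
    recorder.recordAttemptCount(1, snapshot);
  }
}
```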
@@ -0,0 +1,57 @@
/*
* Copyright 2024 Google LLC
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
* met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following disclaimer
* in the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Google LLC nor the names of its
* contributors may be used to endorse or promote products derived from
* this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
package com.google.api.gax.tracing;

import com.google.api.core.BetaApi;
import com.google.api.core.InternalApi;

/**
* An {@link ApiTracerFactory} to build instances of {@link MetricsTracer}.
*
* <p>This class wraps the {@link MetricsRecorder} and passes it to {@link MetricsTracer}, which
* uses it to record metrics.
*
* <p>This class is expected to be initialized once during client initialization.
*/
@BetaApi
@InternalApi
public class MetricsTracerFactory implements ApiTracerFactory {
protected MetricsRecorder metricsRecorder;

public MetricsTracerFactory(MetricsRecorder metricsRecorder) {
this.metricsRecorder = metricsRecorder;
}

@Override
public ApiTracer newTracer(ApiTracer parent, SpanName spanName, OperationType operationType) {
return new MetricsTracer(
MethodName.of(spanName.getClientName(), spanName.getMethodName()), metricsRecorder);
}
}
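The factory is created once per client and holds the single shared recorder, while `newTracer` is called per operation and returns a fresh tracer. A self-contained sketch of that lifecycle, with hypothetical stand-ins (`SketchSpanName`, `SharedRecorder`, `PerOperationTracer`, `SketchTracerFactory`) in place of the gax types; it mirrors the diff's conversion of a `SpanName`'s client and method names into a `service.method` name.

```java
import java.util.Map;

// Hypothetical stand-in for gax's SpanName (client name + method name).
final class SketchSpanName {
  final String clientName;
  final String methodName;

  SketchSpanName(String clientName, String methodName) {
    this.clientName = clientName;
    this.methodName = methodName;
  }
}

// Hypothetical stand-in for MetricsRecorder; one instance is shared per client.
interface SharedRecorder {
  default void recordOperationCount(long count, Map<String, String> attributes) {}
}

// Hypothetical stand-in for MetricsTracer: one instance per operation.
final class PerOperationTracer {
  final String methodName; // "client.method", like MethodName#toString()
  final SharedRecorder recorder;

  PerOperationTracer(String methodName, SharedRecorder recorder) {
    this.methodName = methodName;
    this.recorder = recorder;
  }
}

final class SketchTracerFactory {
  private final SharedRecorder recorder;

  // Created once, during client initialization.
  SketchTracerFactory(SharedRecorder recorder) {
    this.recorder = recorder;
  }

  // Called for every operation: a new tracer each time, all sharing one recorder.
  PerOperationTracer newTracer(SketchSpanName spanName) {
    return new PerOperationTracer(spanName.clientName + "." + spanName.methodName, recorder);
  }
}
```

Keeping the recorder in the factory rather than in each tracer is what lets the per-client, thread-safe recorder outlive the short-lived, per-operation tracers.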