Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

api: add new pendingStreamCreated on clientStreamTracer #10014

Merged
merged 4 commits into from Apr 11, 2023

Conversation

YifeiZhuang
Copy link
Contributor

@YifeiZhuang YifeiZhuang commented Apr 4, 2023

b/259730220
go/grpc-java-new-trace-lines

Problem

Direct Path users (Big Table, Spanner) occasionally see spikes of RPC latencies and it is hard to debug the cause due to the lack of information. Users look for more information in tracing. Specifically, it requires a few timestamps in the RPC lifecycle .

Background

Before name resolution returns service configs, the ClientCalls are pendingCalls, the steam creation calls are saved.
After name resolution is done, calls are drained: this triggers stream creation, and at the same time the stream tracer is created from all the stream tracer factories. The stream created may not be real stream, instead it forks to two situations:

  • If a picker is there, a real stream is immediately created from the transport.
  • If the picker is not available yet, a pending stream is created. And creating a real stream is delayed until all transports are ready, i.e. connection is ready, handshake and protocol negotiation completed.

Client transport calls back streamCreate() on stream tracer (via StatsTraceContext).

Solution

  1. Add a new method on client stream tracer. The new method is called when pendingStream is created. When this method is invoked, it indicates picker gets delayed for whichever reason.

public abstract class ClientStreamTracer extends StreamTracer {
public void pendingStreamCreated() {

}

}

  1. Add span annotation in current streamCreated(). It indicates that connection is ready, i.e. protocol negotiation and handshake is done.

cc. @mohanli-ml

@YifeiZhuang YifeiZhuang requested a review from ejona86 April 4, 2023 19:11
@@ -315,6 +315,12 @@ public void streamCreated(Attributes transportAtts, Metadata headers) {
headers.discardAll(tracingHeader);
headers.put(tracingHeader, span.getContext());
}
span.addAnnotation("Real stream created.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing annotation didn't include a period. We should do the same.

Is "real stream" meaningful to users? Should we instead say something like "Transport selected" or "Load balancer provided a transport"? Do we know what string C used here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C use "Delayed name resolution complete." and "Delayed LB pick complete."
It is a good idea to use the same descriptions, at least for the this new one.
grpc/grpc#31913


@Override
public void createPendingStream() {
span.addAnnotation("Pending stream created.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't this annotation gains us anything, as this will be the same as the start of the span.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For The start of the span you mean when the stream tracer is created by the census interceptor, right?
That is around the same time as pending stream is created. The difference is that every stream has start of span moment, but only every stream has createPendingStream momemt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed offline, this needs to be added on parent span. Plus creatingPendingStream is not sufficiently mean name resolution delay. Maybe we will add follow up PR about name resolution delay timestamp.

@YifeiZhuang YifeiZhuang merged commit fc4410f into grpc:master Apr 11, 2023
10 checks passed
@YifeiZhuang YifeiZhuang deleted the trace-new-line branch April 11, 2023 20:49
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 11, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants