Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stats/opentelemetry: Add OpenTelemetry instrumentation component #7066

Closed

Conversation

zasweq
Copy link
Contributor

@zasweq zasweq commented Mar 29, 2024

This PR implements the gRPC-Go instrumentation component for OpenTelemetry, as outlined in https://github.com/grpc/proposal/blob/master/A66-otel-stats.md.

RELEASE NOTES:

  • stats/opentelemetry: Add OpenTelemetry instrumentation component

@zasweq zasweq requested a review from dfawley March 29, 2024 02:59
@zasweq zasweq requested a review from yashykt March 29, 2024 03:00
@zasweq zasweq added the Type: Feature New features or improvements in behavior label Mar 29, 2024
@zasweq zasweq modified the milestones: 1.63 Release, 1.64 Release Mar 29, 2024
@zasweq zasweq force-pushed the opentelemetry-instrumentation-component branch from a417898 to d60d637 Compare March 29, 2024 03:02
Copy link

codecov bot commented Mar 29, 2024

Codecov Report

Merging #7066 (9597843) into master (ba1bf9e) will decrease coverage by 1.23%.
Report is 35 commits behind head on master.
The diff coverage is n/a.

❗ Current head 9597843 differs from pull request most recent head 1f6a60c. Consider uploading reports for the commit 1f6a60c to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7066      +/-   ##
==========================================
- Coverage   82.39%   81.17%   -1.23%     
==========================================
  Files         305      345      +40     
  Lines       31389    33927    +2538     
==========================================
+ Hits        25864    27541    +1677     
- Misses       4463     5213     +750     
- Partials     1062     1173     +111     

see 66 files with indirect coverage changes

Copy link
Member

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will review the tests on the next pass; wanted to get this pass out ASAP.

stats/opentelemetry/opentelemetry.go Outdated Show resolved Hide resolved
stats/opentelemetry/opentelemetry.go Outdated Show resolved Hide resolved
stats/opentelemetry/opentelemetry.go Outdated Show resolved Hide resolved
Comment on lines 42 to 43
// EmptyMetrics represents no metrics. To start from a clean slate if want to
// pick a subset of metrics, use this and add onto it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"if want to" - please fix. Or even just delete this bit. An example for how to enable/disable metrics would be more useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to "// EmptyMetrics represents no metrics. To start from a clean slate if the
// intended effect is to pick a subset of metrics, use this and add onto it."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm let me look into this Example test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do this once we nail down this API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping - it will be nice to show examples of initializing: with metrics disabled, with 1-2 metrics enabled, with the default set of metrics enabled, and with 1-2 metrics disabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

stats/opentelemetry/opentelemetry.go Outdated Show resolved Hide resolved
stats/opentelemetry/client_metrics.go Outdated Show resolved Hide resolved
case *stats.Begin:
ci := getCallInfo(ctx)
if ci == nil {
// Shouldn't happen, set by interceptor, defensive programming. Log it won't record?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah log.Error SGTM. And in HandleRPC above, processRPCEnd below, etc.

And everywhere if you log.Error with "___ unexpected ___" then you shouldn't need a comment to double-explain it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright I'll add errors in logs. I'll leave the comment that it shouldn't happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

stats/opentelemetry/client_metrics.go Outdated Show resolved Hide resolved
Comment on lines 255 to 261
metrics: map[string]bool{
"grpc.client.attempt.started": true,
"grpc.client.attempt.duration": true,
"grpc.client.attempt.sent_total_compressed_message_size": true,
"grpc.client.attempt.rcvd_total_compressed_message_size": true,
"grpc.client.call.duration": true,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use Metrics.Empty.Add()? (Or create a NewMetrics(...)?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather do the NewMetrics, but I think the decided API restricts the operations of construction enough (can't just pass in a list[], you have a default list you scale up and down by single metrics) that we should do a metrics.Empty.Add() here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add has, and NewMetrics can have a variadic parameter. So this could be NewMetrics("grpc.client.attempt.started", "grpc.client.attempt.duration", ....).

Please use consts for these metric strings (as above). I think we can just delete EmptyMetrics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think NewMetrics having a variadic parameter breaks the API that we discussed...Add Remove or Empty() than add. I understand what you're going for, but in that case I might as well have passed a slice of metrics to initialize like I originally wanted but they pushed back wrt API on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how EmptyMetrics.Add(...) is any different than NewMetrics(...) except that one is much more obvious how to use it and doesn't have the weird loophole of someone doing (otel.EmptyMetrics = <something else>).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping on this comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline; switched.

stats/opentelemetry/server_metrics.go Show resolved Hide resolved
@dfawley dfawley assigned zasweq and unassigned dfawley Apr 1, 2024
@zasweq zasweq assigned dfawley and unassigned zasweq Apr 2, 2024
Comment on lines 247 to 251
var DefaultClientMetrics = *EmptyMetrics.Add("grpc.client.attempt.started").
Add("grpc.client.attempt.duration").
Add("grpc.client.attempt.sent_total_compressed_message_size").
Add("grpc.client.attempt.rcvd_total_compressed_message_size").
Add("grpc.client.call.duration")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please export consts for the metric names (with type MetricName).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

stats/opentelemetry/opentelemetry.go Outdated Show resolved Hide resolved
Comment on lines 229 to 232
// Users of this component should use these bucket boundaries as part of their
// SDK MeterProvider passed in. This component sends this as "advice" to the
// API, which works, however this stability is not guaranteed, so for safety the
// SDK Meter Provider should set these bounds.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove any lingering duplication from this where it's also repeated below. E.g. the bit about how users should set it in their SDK MeterProvider.

Comment on lines 105 to 108
// the string "other". If unset, will use the method string as is. This is
// intended to be used only for generic methods (see
// grpc.UnknownServiceHandler), and not registered static methods (from
// generated service protos).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like Set this to record the method name of RPCs handled by grpc.UnknownServiceHandler, but take care to limit the values allowed, as allowing too many will increase cardinality and could cause severe memory or performance problems. ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appended "Take care to limit the values allowed, as allowing too many will increase cardinality and could cause severe memory or performance problems." If you feel strongly about switching the whole comment, let me know, I felt like it was important to call out generic vs. static.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option is settable for both the server and client-side stats handler, right?

Is it only honored on server-side? If so, that should be included.

And then the bit about "from generated service protos" is too specific, since you can register handlers without using protobuf.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright I'll switch over to your comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait no this is client and server side. Client side you also can want a knob for filtering arbitrary method names. Generated service protos applies to client side, server side is registered protos + registered handlers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Client side you also can want a knob for filtering arbitrary method names.

I thought that was what the call option was for?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/grpc/proposal/blob/master/A66-otel-stats.md calls for the flow to be something like -
Is the method registered/known?
-- If it's registered/known, record it
-- If it's not, use the method filter to record it.

But, yeah, in your case, you can achieve this with the knob since the knob is available to the users. Your call

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per offline discussion, we decided to go the route you mentioned server side (really no other option). On client side, generic methods also go through StaticMethodCallOption exposed to users (i.e. Invoke() or NewStream()).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrote a new documentation comment.

Comment on lines +165 to +169
// rpcInfo is RPC information scoped to the RPC attempt life span client side,
// and the RPC life span server side.
type rpcInfo struct {
mi *metricsInfo
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If rpcInfo is meant to be per-attempt then it should probably be attemptInfo instead.

@zasweq zasweq assigned zasweq and unassigned dfawley Apr 5, 2024
return
}

meter := csh.mo.MeterProvider.Meter("gRPC-Go")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok just saw this section. Added version to both client and server side. I remember talking last year about how this name is a no-op and trying to figure out what to do with this.

}

meter := csh.mo.MeterProvider.Meter("gRPC-Go")
if meter == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the behavior over here? Can this actually happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the no-op meter provider thingy. It just doesn't record metrics. I don't think this can actually happen, this is mainly for defensive programming.

clientMetrics := clientMetrics{}

if _, ok := setOfMetrics["grpc.client.attempt.started"]; ok {
asc, err := meter.Int64Counter("grpc.client.attempt.started", metric.WithUnit("attempt"), metric.WithDescription("The total number of RPC attempts started, including those that have not completed."))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description - "Number of client call attempts started"
Unit - "{attempt}"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

if _, ok := setOfMetrics["grpc.client.attempt.duration"]; ok {
cad, err := meter.Float64Histogram("grpc.client.attempt.duration", metric.WithUnit("s"), metric.WithDescription("End-to-end time taken to complete an RPC attempt including the time it takes to pick a subchannel."), metric.WithExplicitBucketBoundaries(DefaultLatencyBounds...))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description - "End-to-end time taken to complete a client call attempt"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

if _, ok := setOfMetrics["grpc.client.attempt.sent_total_compressed_message_size"]; ok {
cas, err := meter.Int64Histogram("grpc.client.attempt.sent_total_compressed_message_size", metric.WithUnit("By"), metric.WithDescription("Total bytes (compressed but not encrypted) sent across all request messages (metadata excluded) per RPC attempt; does not include grpc or transport framing bytes."), metric.WithExplicitBucketBoundaries(DefaultSizeBounds...))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description - "Compressed message bytes sent per client call attempt"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


serverMetrics := serverMetrics{}
if _, ok := setOfMetrics["grpc.server.call.started"]; ok {
scs, err := meter.Int64Counter("grpc.server.call.started", metric.WithUnit("call"), metric.WithDescription("The total number of RPCs started, including those that have not completed."))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description - "Number of server calls started"
Unit - "{call}"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

if _, ok := setOfMetrics["grpc.server.call.sent_total_compressed_message_size"]; ok {
ss, err := meter.Int64Histogram("grpc.server.call.sent_total_compressed_message_size", metric.WithUnit("By"), metric.WithDescription("Total bytes (compressed but not encrypted) sent across all response messages (metadata excluded) per RPC; does not include grpc or transport framing bytes."), metric.WithExplicitBucketBoundaries(DefaultSizeBounds...))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description - "Compressed message bytes sent per server call"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

if _, ok := setOfMetrics["grpc.server.call.rcvd_total_compressed_message_size"]; ok {
sr, err := meter.Int64Histogram("grpc.server.call.rcvd_total_compressed_message_size", metric.WithUnit("By"), metric.WithDescription("Total bytes (compressed but not encrypted) received across all request messages (metadata excluded) per RPC; does not include grpc or transport framing bytes."), metric.WithExplicitBucketBoundaries(DefaultSizeBounds...))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description - "Compressed message bytes received per server call"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

if _, ok := setOfMetrics["grpc.server.call.duration"]; ok {
scd, err := meter.Float64Histogram("grpc.server.call.duration", metric.WithUnit("s"), metric.WithDescription("This metric aims to measure the end2end time an RPC takes from the server transport’s perspective."), metric.WithExplicitBucketBoundaries(DefaultLatencyBounds...))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description - End-to-end time taken to complete a call from server transport's perspective

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

if _, ok := setOfMetrics["grpc.client.call.duration"]; ok {
ccs, err := meter.Float64Histogram("grpc.client.call.duration", metric.WithUnit("s"), metric.WithDescription("This metric aims to measure the end-to-end time the gRPC library takes to complete an RPC from the application’s perspective."), metric.WithExplicitBucketBoundaries(DefaultLatencyBounds...))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description - "Time taken by gRPC to complete an RPC from application's perspective"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@zasweq zasweq assigned yashykt and dfawley and unassigned zasweq Apr 6, 2024
@ash2k
Copy link
Contributor

ash2k commented Apr 8, 2024

Hi all. So the proposal only talks about metrics. What about tracing? Would be nice to have both as an officially maintained package. Thank you!

@zasweq
Copy link
Contributor Author

zasweq commented Apr 8, 2024

The gRPC OpenTelemetry Tracing gRFC is in flight: grpc/proposal#389. Eventually, the plan is to add tracing as part of this same module, as part of the client and server stats handler. This is like our OpenCensus instrumentation component, which does both: https://github.com/grpc/grpc-go/blob/master/stats/opencensus/opencensus.go. :)

@ash2k
Copy link
Contributor

ash2k commented Apr 8, 2024

This is great! Thank you for the hard work!

@ash2k
Copy link
Contributor

ash2k commented Apr 8, 2024

One more question: is the plan to replace https://github.com/open-telemetry/opentelemetry-go-contrib/tree/main/instrumentation/google.golang.org/grpc/otelgrpc? Two implementations would not benefit anyone, clearly.

@dfawley dfawley assigned zasweq and unassigned yashykt and dfawley Apr 15, 2024
@zasweq zasweq assigned dfawley and unassigned zasweq Apr 15, 2024

var joinDialOptions = internal.JoinDialOptions.(func(...grpc.DialOption) grpc.DialOption)

// MetricName is a name of a metric.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: MetricName is an identifier for a metric provided by this package. or something?

Also peeking at the godoc might help you see things from your users' eyes:

$ ~/go/bin/godoc
# open browser at http://localhost:6060

Maybe just Metric?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to Metric. Switched docstring.

const (
// ClientAttemptStartedName is the name of the client attempt started
// metric.
ClientAttemptStartedName = MetricName("grpc.client.attempt.started")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: These will look slightly better as ClientAttemptStartedName MetricName = "grpc.client.attempt.started" (which mimics how time.Durations are declared).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Yeah that looks nicer.

metrics map[MetricName]bool
}

// NewMetrics returns a Metrics with the metrics provided specified.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"provided specified" -> please fix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I didn't know what to say here really for this docstring, so I said "provided" which become "specified". I'll just drop the specified part and switch to "NewMetrics returns a Metrics containing the MetricName's provided".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switch to "NewMetrics returns a Metrics containing the MetricName's provided".

That's fine, but note that adding 's is not how we make things plural. MetricNames please. (Or with markdown, MetricNames is preferred but godoc doesn't support that.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment also talked about below, this switched to "NewMetrics returns a Metrics containing Metrics".

const (
// ClientAttemptStartedName is the name of the client attempt started
// metric.
ClientAttemptStartedName = MetricName("grpc.client.attempt.started")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Name// please. Note that several of your defined metrics don't have the Name suffix already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 205 to 206
// ClientAttemptStartedName is the name of the client attempt started
// metric.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of saying "BlahBlahBlah is the name of the blah blah blah metric" which gives no information, specify what the metric actually measures, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Pulled Yashes consice description strings to these docstrings. @yashykt you should add these description strings to gRFC to officially spec them :).

Comment on lines 109 to 111
// Metrics are the metrics to instrument. Will turn on the corresponding
// metric supported by the client and server instrumentation components if
// applicable.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "Will turn on ________" mean?

Also: "If not set, the default metrics will be recorded"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added If not set, the default metrics will be recorded. Switched language to "will create instrument and record telemetry".

// Metrics are the metrics to instrument. Will turn on the corresponding
// metric supported by the client and server instrumentation components if
// applicable.
Metrics *Metrics
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if a user sets an unsupported metric? (E.g. if they specify server metrics when they initialize the client)? Ideally DialOption would error??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remembering discussing this with Yash, and I think the decision was just to leave this as a no-op in this scenario (I.e. just ignore the metrics). Right @yashykt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, no-op is what i'm thinking. Consider the case of experimental metrics. There might be a case where we decide to remove experimental metrics and we don't want removal of metrics to break user code (if they had enabled the experimental metrics).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sounds good to me. Let me know what you think Doug.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH, consider the case where they're initializing a DialOption and accidentally passing the default server metrics object. I could easily see this happening, as all the types line up and everything. In this scenario, they'll get no metrics at all.

If we remove an experimental metric, maybe the user wants an init-time error so they know it was removed?

And in our case, that's what will happen anyway, unless we leave the const behind for the removed metric.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're calling the metrics in A66 stable, which is why we have default + exported consts. I think the goal was the defaults now are stable (i.e. ones defined in A66).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would a user set Server Metrics for Dial Option :)? I guess the tradeoff is the scenario you mentioned earlier vs. being defensive now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By accident. Spot the bug:

opentelemetry.DialOption(opentelemetry.Options{
	LoggingOptions: ...,
	MetricsOptions: opentelemetry.MetricsOptions{
		MeterProvider: blahblahblah,
		Metrics: opentelemetry.DefaultServerMetrics,
		TargetAttributeFilter: func(t target) bool {
			return allowedTargets[t]
		},
	},
	TracingOptions: ...,
})

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's separate the metrics field to client and server metrics then for type safety? not entirely type safe but at least the user knows whether setting for client or sever and not tied to DialOption/ServerOption?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline; decided to make it DefaultMetrics that contains both client and server, which will be ignored on the other side to the initialized option.

stats/opentelemetry/opentelemetry.go Show resolved Hide resolved
Comment on lines 229 to 232
// Users of this component should use these bucket boundaries as part of their
// SDK MeterProvider passed in. This component sends this as "advice" to the
// API, which works, however this stability is not guaranteed, so for safety the
// SDK Meter Provider should set these bounds.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this does show up in godoc. Please check out the godoc view.

Also, please remove duplication. We don't want the same "Users of this component should..." message appearing 3x.

Comment on lines 42 to 43
// EmptyMetrics represents no metrics. To start from a clean slate if want to
// pick a subset of metrics, use this and add onto it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping - it will be nice to show examples of initializing: with metrics disabled, with 1-2 metrics enabled, with the default set of metrics enabled, and with 1-2 metrics disabled.

@dfawley dfawley assigned zasweq and unassigned dfawley Apr 16, 2024
@zasweq zasweq assigned dfawley and unassigned zasweq Apr 16, 2024
Comment on lines 48 to 52
func ExampleMetric_Remove() {
// Use DefaultClientMetrics without ClientAttemptDuration and
// ClientCallDurationName.
DefaultClientMetrics.Remove(ClientAttemptDuration, ClientCallDurationName)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these would be more useful on the DialOption / ServerOption, or Options, and should show how to build the DialOption. Just one line of code isn't super useful.

E.g. something like:

func ExampleDialOption_ExcludeMetrics() {
	// To exclude specific metrics, initialize Options as follows:
	otelOpts := opentelemetry.Option{
		MetricsOptions: MetricsOptions{
			Metrics: DefaultClientMetrics.Remove(opentelemetry.ClientAttemptDuration,
		},
	}
	dialOption := opentelemetry.DialOption(otelOpts)
	cc, _ := grpc.Dial("<target string>",  dialOption)
	// Handle err.
	defer cc.Close()
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't the examples technically have to be linked to a single exported function from the package?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there are many ways of attaching examples at different places in the package. Check the testing documentation for more details: https://pkg.go.dev/testing#hdr-Examples

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added numerous examples.

// NewMetrics returns a Metrics with the metrics provided specified.
func NewMetrics(metrics ...MetricName) *Metrics {
newMetrics := make(map[MetricName]bool)
// NewMetrics returns a Metrics containing the Metric's provided.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another choice is to say "containing metrics." - directly referencing the parameter's name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, switched.

// DefaultLatencyBounds are the default bounds for latency metrics. Users of
// this component should set these bucket boundaries as part of their SDK
// MeterProvider passed in for desired latency metrics.
// DefaultLatencyBounds are the default bounds for latency metrics.
DefaultLatencyBounds = []float64{0, 0.00001, 0.00005, 0.0001, 0.0003, 0.0006, 0.0008, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.008, 0.01, 0.013, 0.016, 0.02, 0.025, 0.03, 0.04, 0.05, 0.065, 0.08, 0.1, 0.13, 0.16, 0.2, 0.25, 0.3, 0.4, 0.5, 0.65, 0.8, 1, 2, 5, 10, 20, 50, 100} // provide "advice" through API, SDK should set this too
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also be nice to show users how to do this, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added as part of example. Verified it worked by playing around with example configuration wrt bounds and seeing the result.

ClientAttemptRcvdCompressedTotalMessageSize Metric = "grpc.client.attempt.rcvd_total_compressed_message_size"
// ClientCallDurationName is the time taken by gRPC to complete an RPC from
// application's perspective.
ClientCallDurationName Metric = "grpc.client.call.duration"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is still named Name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh, whoops. Removed.

@dfawley dfawley assigned zasweq and unassigned dfawley Apr 17, 2024
@zasweq
Copy link
Contributor Author

zasweq commented Apr 20, 2024

Sorry for the delay on the example, it's pretty complicated. I need to set bounds and instruments, and it also seems like it needs to output something so should I set up a whole RPC system? That's the only way the example can really "do" something. Showing users how to use default bounds links in MeterProvider SDK, and a reader which to show how to use I need to do a full RPC. And then each knob you want to show how it works (i.e. add/delete metrics is a whole other contained example that needs to spin up a server and do an rpc etc.). Is the e2e test with Defaults explicitly set and a manual reader set/read from an asserted on a good enough example? I'm fine adding an example, but to have it do anything (esp with default bounds) feels incredibly verbose compared to the examples I've seen. If you think it's not too verbose, what would be scope of your example? Can you review the e2e tests while you're at it while I'm writing example?

@zasweq zasweq assigned dfawley and unassigned zasweq Apr 20, 2024
@dfawley
Copy link
Member

dfawley commented Apr 22, 2024

The example doesn't have to be a complete system. Just show the code users need to do a thing.

An example showing how users set the bounds on the meter should be a couple function calls in theory? They don't need to actually do anything.

@zasweq zasweq removed their assignment Apr 23, 2024
@zasweq
Copy link
Contributor Author

zasweq commented Apr 23, 2024

Added numerous examples and got to all comments from last pass. Examples and tests are ready to be looked at.

@dfawley
Copy link
Member

dfawley commented Apr 23, 2024

Sorry, can you create a new PR for this and squash your commits? There are 180+ comments and github is getting bogged down. Thanks!

@dfawley dfawley closed this Apr 23, 2024
@zasweq
Copy link
Contributor Author

zasweq commented Apr 23, 2024

Done in #7166.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Feature New features or improvements in behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants