Is your feature request related to a problem? Please describe.
We're hoping to migrate to gRPC's OTel metrics once instrumentation is complete in the languages we support. In the meantime, we have custom internal metrics designed to match the OTel spec (grpc/proposal#380). We've noticed several friction points with these metrics internally, and we're hoping we can work with you folks to build solutions into gRPC for two reasons: (1) if these are problems we're having internally, we imagine other gRPC customers are having similar issues, and (2) if we can't fix these upstream, it's likely we won't be able to adopt the metrics you folks are building out for us.
Describe the solution you'd like (and alternatives you've considered)
There are 3 main solutions we're looking for here:
1. The ability to add custom static tags at startup. This is useful for information that will not change for the life of a program. For example, gRPC currently has a `grpc.target` tag, whose value is the fully qualified target. We have several customers who would like the ability to add a `target_service` tag with just the service name, as it eases the process of migrating existing dashboards and saves them from having to use wildcards to find all services speaking to them. Another use case is a service that has multiple teams contributing endpoints: they currently let teams break endpoints into groups by a tag, so that the groups can be monitored together without manual updates whenever an endpoint is added (e.g., grouping endpoints into distinct SLO categories).
2. The ability to identify the caller in the server metrics via custom metadata sent by clients. This is incredibly useful for debugging services. We did briefly discuss this with the team last September, and we've since been unable to come up with another strategy for this.
3. The ability to configure non-error codes. The metrics currently treat any non-OK response as an error, but services sometimes make their own decisions about what they consider an "error" (for example, many of our services consider "Canceled" to be a non-error code, as it is usually a decision made by the caller, not a server error). This request is a little less urgent, since we can currently filter these codes out in our graphs and in calculations like Success Rate. It's just a nice-to-have.
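To make the non-error code request concrete, here is a rough sketch of the kind of calculation we do today when filtering in our graphs. It is plain Python over per-status call counts, not a proposal for the actual gRPC API; `success_rate`, `DEFAULT_NON_ERROR_CODES`, and the sample numbers are all made up for illustration.

```python
from collections import Counter

# Hypothetical default: only OK counts as a non-error, which mirrors the
# "any non-OK response is an error" behavior described above.
DEFAULT_NON_ERROR_CODES = frozenset({"OK"})


def success_rate(counts_by_code, non_error_codes=DEFAULT_NON_ERROR_CODES):
    """Return the fraction of calls whose status code is treated as a non-error."""
    total = sum(counts_by_code.values())
    if total == 0:
        return None  # no traffic, so there is no meaningful rate
    good = sum(n for code, n in counts_by_code.items() if code in non_error_codes)
    return good / total


# Made-up sample data: call counts keyed by gRPC status code name.
# CANCELLED is the canonical gRPC spelling of the "Canceled" status.
calls = Counter({"OK": 90, "CANCELLED": 8, "UNAVAILABLE": 2})

strict = success_rate(calls)  # only OK is a success
relaxed = success_rate(calls, non_error_codes=frozenset({"OK", "CANCELLED"}))
```

With the sample counts, treating CANCELLED as a non-error moves the rate from 90% to 98%, which is the kind of difference that makes the default misleading for services where callers cancel often.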
We recognize that letting customers define custom tags carries a risk of high cardinality, but we're hopeful we can work with you folks on a strategy that is still safe without sacrificing the usefulness of the metrics you're building.
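As one possible starting point for that conversation, here is a minimal sketch of how static tags and a caller tag could be kept bounded. Everything in it is an assumption for illustration only: the `x-caller-service` metadata key, the `STATIC_TAGS` and `KNOWN_CALLERS` values, and the helper names are hypothetical and do not correspond to any existing gRPC API.

```python
# Hypothetical static tags, set once at startup and never changed afterwards.
STATIC_TAGS = {"target_service": "billing"}

# Hypothetical allowlist of caller names. Anything else collapses into a single
# bucket, so the "caller" tag can take at most len(KNOWN_CALLERS) + 2 values.
KNOWN_CALLERS = frozenset({"checkout", "invoicing"})

# Hypothetical metadata key that clients would send to identify themselves.
CALLER_HEADER = "x-caller-service"


def caller_tag(metadata, known_callers=KNOWN_CALLERS):
    """Map request metadata (an iterable of (key, value) pairs) to a bounded tag value."""
    for key, value in metadata:
        if key.lower() == CALLER_HEADER:
            return value if value in known_callers else "other"
    return "unknown"  # the client sent no caller metadata at all


def metric_tags(metadata):
    """Combine the static startup tags with the bounded per-call caller tag."""
    tags = dict(STATIC_TAGS)
    tags["caller"] = caller_tag(metadata)
    return tags


example = metric_tags([("x-caller-service", "checkout")])
```

The idea is that the set of tag values is fixed by configuration rather than by whatever clients send, so a misbehaving client cannot blow up the number of time series.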