Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(pubsub): add otel tracing support with links #9594

Draft
wants to merge 50 commits into
base: pubsub-otel-trace
Choose a base branch
from

Conversation

hongalex
Copy link
Member

@hongalex hongalex commented Mar 15, 2024

This replace #8592 and adds the missing subscribe side spans + updates the publish side spans to match our most recent design

  • Use links instead of parent/child spans for batch RPC operations
  • Align spans names with most recent OpenTelemetry messaging semantic conventions
  • Align attributes to most recent messaging semconv
  • Add subscribe side integration tests
  • Fix integration tests to use span ID generator
  • Integrate the newly added AddLinks API to get bidirectional links

Note: this PR uses an experimental golang.org/x/exp/slices package, which was added in go 1.21. This is only used for testing.

…/google-cloud-go into pubsub-otel-trace-receive-clean
@hongalex hongalex marked this pull request as draft March 15, 2024 21:40
Copy link
Contributor

@shollyman shollyman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of initial comments. I haven't gone too deep as I'm not super familiar here.

pubsub/iterator.go Outdated Show resolved Hide resolved
pubsub/iterator.go Outdated Show resolved Hide resolved
pubsub/iterator.go Outdated Show resolved Hide resolved
pubsub/iterator.go Outdated Show resolved Hide resolved
pubsub/iterator.go Outdated Show resolved Hide resolved
pubsub/iterator.go Outdated Show resolved Hide resolved
if len(toRetry) > 0 {
// Retry modacks/nacks in a separate goroutine.
go func() {
it.retryModAcks(toRetry, deadlineSec, logOnInvalid)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

retryModAcks aren't traced, should they be?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opting to not trace them currently. These retries are specific to exactly once delivery cases, and can be visualized by the multiple ModifyAckDeadline RPCs generated by the underlying library. If there's sufficient demand for this though, we'll consider adding later in a future PR.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense!

pubsub/subscription.go Outdated Show resolved Hide resolved
@hongalex
Copy link
Member Author

The recent changes from fixing the memory leaks now can result in the application crashing because of a nil panic, which happens when a message is acked but already expired and removed from the activeSpans map. Will need to cut a release since the previous release is unstable.

@VladimirYushkevich
Copy link

@hongalex , thanks a lot for adding trace support for pubsub. I tried cloud.google.com/go/pubsub@v1.37.0-beta.otel.trace2 and have question related to span with publishMessageBundle code.function (or {topicID} publish name). For some reason it has no parent. It looks wired because code.function: Publish (or {topicID} create name) is correctly assigned to my manually created span. Bellow I provide a screenshots of my issue:

trace=6835bec8eafcfa19e76a02286d4b1f90 my manually created span with {topicID} create as a child span:

Screenshot 2024-05-07 at 19 21 32
trace=777db2f85da93d8cd5dd09c917ffa9b1 {topicID} publish span without parent(only with auto instrumented google.pubsub.v1.Publisher/Publish child):

Screenshot 2024-05-07 at 19 28 04

I would expect to see google.pubsub.v1.Publisher/Publish as a child for trace=6835bec8eafcfa19e76a02286d4b1f90 before or after publisher batching span.

@hongalex
Copy link
Member Author

@VladimirYushkevich Yeah so that is intentional. The reason is because the create span is only associated with a single message, whereas the publish span is a batched span (many messages are batched into a single publish RPC). Tracing requires each span to only have a single parent, so it's not currently possible to have the publish span be considered in multiple message traces. Thus, it doesn't have a parent. I will be adding support for SpanLinks though, where the create span will have a Link to the publish span.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: pubsub Issues related to the Pub/Sub API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants