Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add opentelemetry support #2152

Merged
merged 6 commits into from Jun 16, 2021
Merged

Add opentelemetry support #2152

merged 6 commits into from Jun 16, 2021

Conversation

tonistiigi
Copy link
Member

Opentelemetry has superseded opentracing. The API is quite similar but additional collectors endpoints should allow forwarding traces from client or container in the future.

JAEGER_TRACE env is supported like before. As well as the environment variables defined in https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/sdk-environment-variables.md . OTEL collector is also supported both via grpc and HTTP protos.

Initial commit only added opentelemetry exporters and used the opentracing bridge for tracing. https://pkg.go.dev/go.opentelemetry.io/otel/bridge/opentracing . This didn't work for cross-process grpc though as the bridge only supports HTTP propagators. So I switched all tracers directly to opentelemetry and removed opentracing(+helper) dependencies completely. In some cases, the helper methods I had previously written are actually very similar to methods opentelemetry provides. Cross-process tracing will not work when one side uses the older opentracing API.

I have a list of PRs that I want to propose upstream so that some parts of this can be cleaner. Functionality-wise, this PR should be complete.

As before, using a global tracer or propagator is a forbidden pattern in buildkit codebase.

github.com/hashicorp/go-immutable-radix => github.com/tonistiigi/go-immutable-radix v0.0.0-20170803185627-826af9ccf0fe
github.com/jaguilar/vt100 => github.com/tonistiigi/vt100 v0.0.0-20190402012908-ad4c4a574305
// genproto: corresponds to containerd
Copy link
Member Author

@tonistiigi tonistiigi Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required. If this really breaks containerd worker I'm willing to drop support for it until containerd gets updated. Too many things are blocked on this and this is a too high-priority area to be permanently stuck on a 2-year old library.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI seem green, so this is fine, I guess

@tonistiigi
Copy link
Member Author

@AkihiroSuda @chris-crone

@chris-crone
Copy link
Collaborator

Nice! Thanks @tonistiigi !

@AkihiroSuda
Copy link
Member

needs rebase

Copy link
Member

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

@Aneurysm9 Aneurysm9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the issues you've created upstream as a result of this implementation experience. I've left some comments after a quick review, but overall this looks like a simple and clean conversion from OpenTracing, which is really good to see. Do note that we're about to release v1.0.0-RC1 which will contain some changes that will impact this implementation. If you're able to join our SIG meetings on Thursdays we'd love to hear from you about your experience using the API/SDK.

Comment on lines +195 to +196
func (wt *withTracer) Tracer(instrumentationName string, opts ...trace.TracerOption) trace.Tracer {
return wt.tracer

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having a concrete Tracer here, I'd suggest having a TracerProvider as this is dropping information coming from the instrumentation that should be recorded on spans created by the returned Tracer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I used Tracer was because there was no way to get access to the current traceprovider from the context.

client/solve.go Outdated Show resolved Hide resolved
@@ -64,7 +64,7 @@ func ResolveClient(c *cli.Context) (*client.Client, error) {

ctx := CommandContext(c)

if span := opentracing.SpanFromContext(ctx); span != nil {
if span := trace.SpanFromContext(ctx); span != nil {
opts = append(opts, client.WithTracer(span.Tracer()))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The span.Tracer() method has been removed and will not be present starting with v1.0.0-RC1. It is also less safe with OpenTelemetry to re-use a Tracer from an incoming Span as Tracer can contain information related to different instrumentation. Instead, any instrumentation that needs a Tracer should accept a TracerProvider and use that to construct a Tracer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The span.Tracer() method has been removed and will not be present starting with

That is very surprising to me. Having access to the current tracer to create child spans inside the caller's span has been a key feature to me, as it was in opentracing to add flexible instrumentation to the packages. With this removed, I don't see a logical way how tracing would be implemented in a project like BuildKit. I do not want to use global tracer. It is a code smell and makes it so that the packages make assumptions of how tracing is organized in the final binary. If project depending on Buildkit wants to set up a global tracer, everything will still work. BuildKit's packages should be reusable on their own; in many cases, different components of it are included in other tools. The packages do not make assumptions that they only work as singletons or that you can't have two instances of an object using different tracers. I also don't want every constructor/pkg to contain code on how to initialize the tracer and carry that information. That would be a big maintenance cost with no additional value. Sometimes in BuildKit a context goes through 10+ layers of components.

Letting functions to ignore tracing if none is configured and add extra contextualized child spans if the caller code set up a tracer/parent-span seems a very logical way to manage this. Even if a project is very complicated, with different components and tracing points, the root context usually starts from a very specific location(eg. gprc client).

I looked at the issues open-telemetry/opentelemetry-go#1892 open-telemetry/opentelemetry-go#1887 and to me, if this behavior is not expected, it is an issue with the delegation of the global tracer. I saw some delegation in the global package and thought this was what it was for already. Above else, I think it shows how bad pattern using a global tracer is for synchronizing individual components.

Why would you even want to take parent span from context but use a different tracer. Doesn't it automatically mean that your data is messed up and you can't get a proper report back?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you even want to take parent span from context but use a different tracer. Doesn't it automatically mean that your data is messed up and you can't get a proper report back?

The Tracer carries information regarding the instrumenting library. This information includes the name of the instrumenting library and optionally its version and the schema identifier for attributes produced by the instrumentation. In this sense it is different from the OpenTracing tracer which, to my understanding, carried the export pipeline configuration that in OpenTelemetry is carried by the TracerProvider.

With the Tracer carrying this additional information it is not safe to take a Tracer from a parent span and use it directly. It may identify a different instrumenting library and provide incorrect information regarding the attribute schema.

It seems that what you're looking for is the ability to extract a TracerProvider from a Span, from which a Tracer scoped to the currently-instrumenting library could be created. We'd love to talk more about these use cases at our next SIG meeting, though I expect providing that ability would need to go through the OTEP process before being included in a new specification version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean a different implementation of Span/Tracer than the sdk/trace? It is confusing to me why you are ok with these different implementations to share same spancontext and automatically link to it while they contain the traceid/spanid of something that you claim could be part of the wrong trace. If you want these implementations to use same spancontext key but provide a way to possibly detect conflicts between implementations, add a validation method for it. Also, isn't this the whole point of using these interfaces that I do not target a specific implementation, and my trace-points will always work, whatever tracer is configured by the caller. If my code only works with a specific implementation of tracer I would need to validate it anyway.

To have reusable packages, I need a way for my library to just define the tracing points. The library should not care how the actual tracing infrastructure is configured, if two instances of the same object use the same tracer or not etc. This is all up to the caller. Also, every constructor/function should not take and pass tracer objects around. They can of course choose to optionally take them if they want to provide override points, but the default path should just work. With this change, you are not providing a way to achieve that(without me defining my own wrapper that I can continue to pass with context, meaning packages will be incompatible with non-buildkit callers).

It seems that what you're looking for is the ability to extract a TracerProvider from a Span, from which a Tracer scoped to the currently-instrumenting library could be created

That might technically work, but if it means a traceid per spanid it makes no logical sense. I just need a way to pass tracing context through my components. You do it half-way by providing a spancontext but force me to use a global variable for the other half of what StartSpan needs. That's not an option if you spent a lot of time designing your components so that they would not use the global state and have such limitations. Another thing I don't get is that getting access to TraceProvider is actually dangerous. Eg. my library should not have a possibility to add span processors or shut it down. My library should only have the capability to create additional spans in the context that caller provides and that is exactly the only thing enabled by the Tracer interface.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To have reusable packages,

An example of a package that I claim is quite useless atm. https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/net/http#NewTransport . This is supposed to be a super reusable package, for different services, implementations of roundtrip etc. It takes the Tracer(Provider) on NewTransport that is the context that has nothing to do with tracing. This is a background pool with its own lifecycle, the init code is nowhere near the place users would make requests, somewhere there is a cleanup routine to clean up old connections etc. None of it has anything to do with request tracing. The actual tracing configuration comes in with the request to RoundTrip and is passed to the next RoundTrip implementation. This works perfectly fine for maintaining the relationship of the spans that are managed by context passed to RoundTrip. But because it doesn't read the tracer itself from context and that is taken with a side-channel there is no actual other way to use this component than to use a global tracer, meaning this component is not reusable at all and spans passed to RoundTrip might be from a completely different tracer.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you available tomorrow for a conversation with @MrAlias and I? I fear that we may be talking past each other or not operating from consistent assumptions and understandings and it may be faster to resolve that with synchronous communication. It is very important to me to understand your concerns and whether addressing them may require changes to the public API we are looking to stabilize in the near term.

cmd/buildctl/common/trace.go Outdated Show resolved Hide resolved
Comment on lines 45 to +47
if span != nil {
ext.Error.Set(span, true)
span.SetStatus(codes.Error, err.Error())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

span will always be non-nil.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except in here it will be nil if the Before callback never gets called.

cmd/buildctl/common/trace.go Show resolved Hide resolved
util/tracing/tracing.go Outdated Show resolved Hide resolved
@tonistiigi
Copy link
Member Author

@Aneurysm9

Thanks for the pointers.

The Tracer() change looks very problematic to me. More on the inline comment.

I have some more interesting things in #2163

The issues I hit:

#2163 also has some packages for automatically detecting current trace environment from env and passing it. I think these packages are quite generic and would hope most projects would do something similar. But I didn't find any existing ones. I'd be happy to contribute them into some other project if there is interest.

I'm ok to join SIG to discuss any of this further.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
This can be done as a separate change when needed.
Also should analyze if this would affect the gogo
incompatibility issues with newer proto.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Upstream fixes needed for cleaner solution

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
@@ -22,7 +22,7 @@ func jaegerExporter() (sdktrace.SpanExporter, error) {

endpoint := envOr("OTEL_EXPORTER_JAEGER_ENDPOINT", "http://localhost:14250")
host := envOr("OTEL_EXPORTER_JAEGER_HOST", "localhost")
port := envOr("OTEL_EXPORTER_JAEGER_PORT", "6832")
port := envOr("OTEL_EXPORTER_JAEGER_PORT", "6831")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. iirc one is for compress and the other uncompress. I've tested that 6831 works by default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants