Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vendor: OTEL v1.19.0 / v0.45.0, containerd v1.7.11 #46830

Merged
merged 6 commits into from Dec 12, 2023

Conversation

thaJeztah
Copy link
Member

@thaJeztah thaJeztah commented Nov 19, 2023


vendor: vendor: upgrade OpenTelemetry to v1.19.0 / v0.45.0

Upgrade to the latest OpenTelemetry libraries; this will unblock a lot of
downstream projects in the ecosystem to upgrade, as some of the parts here
were pre-1.0/unstable.

vendor: github.com/containerd/containerd v1.7.9

full diff: containerd/containerd@v1.7.8...v1.7.9

vendor: github.com/containerd/containerd v1.7.10

full diff: containerd/containerd@v1.7.9...v1.7.10

vendor: github.com/containerd/containerd v1.7.11

full diff: containerd/containerd@v1.7.10...v1.7.11

vendor: github.com/moby/buildkit v0.12.5-0.20231208203051-3b6880d2a00f

full diff: moby/buildkit@v0.12.4...3b6880d

update to go.opentelemetry.io/otel/semconv/v1.21.0, remove "httpconv" uses

This commit switches our code to use semconv 1.21, which is the version matching
the OTEL modules, as well as the containerd code.

The BuildKit 0.12.x module currently uses an older version of the OTEL modules,
and uses the semconv 0.17 schema. Mixing schema-versions is problematic, but
we still want to consume BuildKit's "detect" package to wire-up other parts
of OTEL.

To align the versions in our code, this patch sets the BuildKit detect.Resource
with the correct semconv version.

It's worth noting that the BuildKit package has a custom "serviceNameDetector";
https://github.com/moby/buildkit/blob/v0.12.4/util/tracing/detect/detect.go#L153-L169

Whith is merged with OTEL's default resource:
https://github.com/moby/buildkit/blob/v0.12.4/util/tracing/detect/detect.go#L100-L107

There's no need to duplicate that code, as OTEL's resource.Default() already
provides this functionality:

This patch also removes uses of the httpconv package, which is no longer included
in semconv 1.21 and now an internal package. Removing the use of this package
means that hijacked connections will not have the HTTP attributes on the Moby
client span, which isn't ideal, but a limited loss that'd impact exec/attach.
The span itself will still exist, it just won't the additional attributes that
are added by that package.

Alternatively, the httpconv call COULD remain - it will not error and will send
syntactically valid spans but we would be mixing & matching semconv versions,
so won't be compliant.

Some parts of the httpconv package were preserved through a very minimal local
implementation; a variant of httpconv.ClientStatus(resp.StatusCode)) is added
to set the span status (span.SetStatus()). The httpconv package has complex
logic for this, but mostly drills down to HTTP status range (1xx/2xx/3xx/4xx/5xx)
to determine if the status was successfull or non-successful (4xx/5xx).

The additional logic it provided was to validate actual status-codes, and to
convert "bogus" status codes in "success" ranges (1xx, 2xx) into an error. That
code seemed over-reaching (and not accounting for potential future valid
status codes). Let's assume we only get valid status codes.

- What I did

- How I did it

- How to verify it

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@thaJeztah
Copy link
Member Author

./cc @milas

@thaJeztah
Copy link
Member Author

thaJeztah commented Nov 19, 2023

⚠️ Hm... looks like we get a panic coming from OTEL in one of our tests;

2023/11/19 12:37:39 http: panic serving 127.0.0.1:59500: runtime error: invalid memory address or nil pointer dereference
...
2023/11/19 12:37:40 http: panic serving 127.0.0.1:59502: could not unmarshal json for /AuthZPlugin.AuthZRes: unexpected end of JSON input
=== RUN   TestAuthZPluginAllowEventStream
2023/11/19 12:37:39 http: panic serving 127.0.0.1:59500: runtime error: invalid memory address or nil pointer dereference
goroutine 133 [running]:
net/http.(*conn).serve.func1()
	/usr/local/go/src/net/http/server.go:1868 +0xb9
panic({0xf05620?, 0x1a46a40?})
	/usr/local/go/src/runtime/panic.go:920 +0x270
github.com/docker/docker/vendor/go.opentelemetry.io/otel/sdk/trace.(*recordingSpan).End.func1()
	/go/src/github.com/docker/docker/vendor/go.opentelemetry.io/otel/sdk/trace/span.go:383 +0x25
github.com/docker/docker/vendor/go.opentelemetry.io/otel/sdk/trace.(*recordingSpan).End(0xc0004a6900, {0x0, 0x0, 0xa?})
	/go/src/github.com/docker/docker/vendor/go.opentelemetry.io/otel/sdk/trace/span.go:421 +0x9fb
panic({0xf05620?, 0x1a46a40?})
	/usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/docker/docker/integration/plugin/authz.setupSuite.func3({0x1205498, 0xc00031c540}, 0x7f78f2b6ffa0?)
	/go/src/github.com/docker/docker/integration/plugin/authz/main_test.go:159 +0x1ad
net/http.HandlerFunc.ServeHTTP(0xc000100000?, {0x1205498?, 0xc00031c540?}, 0xc0003b74f8?)
	/usr/local/go/src/net/http/server.go:2136 +0x29
net/http.(*ServeMux).ServeHTTP(0x120b610?, {0x1205498, 0xc00031c540}, 0xc00047a900)
	/usr/local/go/src/net/http/server.go:2514 +0x142
github.com/docker/docker/vendor/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*Handler).ServeHTTP(0xc000308240, {0x1204088?, 0xc0001be0e0}, 0xc00047a700)
	/go/src/github.com/docker/docker/vendor/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp/handler.go:212 +0x1204
net/http.serverHandler.ServeHTTP({0x1200d08?}, {0x1204088?, 0xc0001be0e0?}, 0x6?)
	/usr/local/go/src/net/http/server.go:2938 +0x8e
net/http.(*conn).serve(0xc0004fad80, {0x120b610, 0xc000238de0})
	/usr/local/go/src/net/http/server.go:2009 +0x5f4
created by net/http.(*Server).Serve in goroutine 49
	/usr/local/go/src/net/http/server.go:3086 +0x5cb
2023/11/19 12:37:40 http: panic serving 127.0.0.1:59502: could not unmarshal json for /AuthZPlugin.AuthZRes: unexpected end of JSON input
goroutine 92 [running]:
net/http.(*conn).serve.func1()
	/usr/local/go/src/net/http/server.go:1868 +0xb9
panic({0xe9b520?, 0xc00011e150?})
	/usr/local/go/src/runtime/panic.go:920 +0x270
github.com/docker/docker/vendor/go.opentelemetry.io/otel/sdk/trace.(*recordingSpan).End.func1()
	/go/src/github.com/docker/docker/vendor/go.opentelemetry.io/otel/sdk/trace/span.go:383 +0x25
github.com/docker/docker/vendor/go.opentelemetry.io/otel/sdk/trace.(*recordingSpan).End(0xc00047e180, {0x0, 0x0, 0x50?})
	/go/src/github.com/docker/docker/vendor/go.opentelemetry.io/otel/sdk/trace/span.go:421 +0x9fb
panic({0xe9b520?, 0xc00011e150?})
	/usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/docker/docker/integration/plugin/authz.setupSuite.func3({0x1205498, 0xc000088a80}, 0x7f79398e3fa0?)
	/go/src/github.com/docker/docker/integration/plugin/authz/main_test.go:149 +0x3fb
net/http.HandlerFunc.ServeHTTP(0xc00002e000?, {0x1205498?, 0xc000088a80?}, 0xc0003b94f8?)
	/usr/local/go/src/net/http/server.go:2136 +0x29
net/http.(*ServeMux).ServeHTTP(0x120b610?, {0x1205498, 0xc000088a80}, 0xc000442200)
	/usr/local/go/src/net/http/server.go:2514 +0x142
github.com/docker/docker/vendor/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*Handler).ServeHTTP(0xc000308240, {0x1204088?, 0xc00025c000}, 0xc000442000)
	/go/src/github.com/docker/docker/vendor/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp/handler.go:212 +0x1204
net/http.serverHandler.ServeHTTP({0xc00050a120?}, {0x1204088?, 0xc00025c000?}, 0x6?)
	/usr/local/go/src/net/http/server.go:2938 +0x8e
net/http.(*conn).serve(0xc0004fa000, {0x120b610, 0xc000238de0})
	/usr/local/go/src/net/http/server.go:2009 +0x5f4
created by net/http.(*Server).Serve in goroutine 49
	/usr/local/go/src/net/http/server.go:3086 +0x5cb
2023/11/19 12:37:42 http: panic serving 127.0.0.1:50354: could not unmarshal json for /AuthZPlugin.AuthZRes: unexpected end of JSON input
goroutine 209 [running]:
net/http.(*conn).serve.func1()
	/usr/local/go/src/net/http/server.go:1868 +0xb9
panic({0xe9b520?, 0xc00040e230?})
	/usr/local/go/src/runtime/panic.go:920 +0x270
github.com/docker/docker/vendor/go.opentelemetry.io/otel/sdk/trace.(*recordingSpan).End.func1()
	/go/src/github.com/docker/docker/vendor/go.opentelemetry.io/otel/sdk/trace/span.go:383 +0x25
github.com/docker/docker/vendor/go.opentelemetry.io/otel/sdk/trace.(*recordingSpan).End(0xc00051a180, {0x0, 0x0, 0x50?})
	/go/src/github.com/docker/docker/vendor/go.opentelemetry.io/otel/sdk/trace/span.go:421 +0x9fb
panic({0xe9b520?, 0xc00040e230?})
	/usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/docker/docker/integration/plugin/authz.setupSuite.func3({0x1205498, 0xc00054bc20}, 0x7f79398e3fa0?)
	/go/src/github.com/docker/docker/integration/plugin/authz/main_test.go:149 +0x3fb
net/http.HandlerFunc.ServeHTTP(0xc0003ad800?, {0x1205498?, 0xc00054bc20?}, 0xc0003b54f8?)
	/usr/local/go/src/net/http/server.go:2136 +0x29
net/http.(*ServeMux).ServeHTTP(0x120b610?, {0x1205498, 0xc00054bc20}, 0xc000442d00)
	/usr/local/go/src/net/http/server.go:2514 +0x142
github.com/docker/docker/vendor/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*Handler).ServeHTTP(0xc000308240, {0x1204088?, 0xc00025c2a0}, 0xc000442b00)
	/go/src/github.com/docker/docker/vendor/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp/handler.go:212 +0x1204
net/http.serverHandler.ServeHTTP({0xc000284030?}, {0x1204088?, 0xc00025c2a0?}, 0x6?)
	/usr/local/go/src/net/http/server.go:2938 +0x8e
net/http.(*conn).serve(0xc00040c480, {0x120b610, 0xc000238de0})
	/usr/local/go/src/net/http/server.go:2009 +0x5f4
created by net/http.(*Server).Serve in goroutine 49
	/usr/local/go/src/net/http/server.go:3086 +0x5cb

@thaJeztah
Copy link
Member Author

⚠️ Also a failure in a buildkit-related test on ubuntu 20.04 rootless;

=== FAIL: amd64.integration.build TestBuildkitHistoryTracePropagation (30.04s)
time="2023-11-19T12:32:43Z" level=info msg="[core] [Channel #1] Channel created" module=grpc
time="2023-11-19T12:32:43Z" level=info msg="[core] [Channel #1] original dial target is: \"unix:///run/buildkit/buildkitd.sock\"" module=grpc
time="2023-11-19T12:32:43Z" level=info msg="[core] [Channel #1] parsed dial target is: {URL:{Scheme:unix Opaque: User: Host: Path:/run/buildkit/buildkitd.sock RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}" module=grpc
time="2023-11-19T12:32:43Z" level=info msg="[core] [Channel #1] Channel authority set to \"localhost\"" module=grpc
time="2023-11-19T12:32:43Z" level=info msg="[core] [Channel #1] Resolver state updated: {\n  \"Addresses\": [\n    {\n      \"Addr\": \"/run/buildkit/buildkitd.sock\",\n      \"ServerName\": \"\",\n      \"Attributes\": {\n        \"\\u003c%!p(networktype.keyType=grpc.internal.transport.networktype)\\u003e\": \"unix\"\n      },\n      \"BalancerAttributes\": null,\n      \"Metadata\": null\n    }\n  ],\n  \"Endpoints\": [\n    {\n      \"Addresses\": [\n        {\n          \"Addr\": \"/run/buildkit/buildkitd.sock\",\n          \"ServerName\": \"\",\n          \"Attributes\": {\n            \"\\u003c%!p(networktype.keyType=grpc.internal.transport.networktype)\\u003e\": \"unix\"\n          },\n          \"BalancerAttributes\": null,\n          \"Metadata\": null\n        }\n      ],\n      \"Attributes\": null\n    }\n  ],\n  \"ServiceConfig\": null,\n  \"Attributes\": null\n} (resolver returned new addresses)" module=grpc
time="2023-11-19T12:32:43Z" level=info msg="[core] [Channel #1] Channel switches to new LB policy \"pick_first\"" module=grpc
time="2023-11-19T12:32:43Z" level=info msg="[core] [Channel #1 SubChannel #2] Subchannel created" module=grpc
time="2023-11-19T12:32:43Z" level=info msg="[core] [Channel #1] Channel Connectivity change to CONNECTING" module=grpc
time="2023-11-19T12:32:43Z" level=info msg="[core] [Channel #1 SubChannel #2] Subchannel Connectivity change to CONNECTING" module=grpc
time="2023-11-19T12:32:43Z" level=info msg="[core] [Channel #1 SubChannel #2] Subchannel picks a new address \"/run/buildkit/buildkitd.sock\" to connect" module=grpc
time="2023-11-19T12:32:43Z" level=info msg="[core] [Channel #1 SubChannel #2] Subchannel Connectivity change to READY" module=grpc
time="2023-11-19T12:32:43Z" level=info msg="[core] [Channel #1] Channel Connectivity change to READY" module=grpc
time="2023-11-19T12:32:43Z" level=info msg="[core] [Server #4] Server created" module=grpc
    build_traces_test.go:84: Waiting for trace to be available
    build_traces_test.go:85: timeout hit after 30s: trace not available yet
time="2023-11-19T12:33:13Z" level=info msg="[core] [Channel #1] Channel Connectivity change to SHUTDOWN" module=grpc
time="2023-11-19T12:33:13Z" level=info msg="[core] [Channel #1] ccBalancerWrapper: closing" module=grpc
time="2023-11-19T12:33:13Z" level=info msg="[core] [Channel #1] Closing the name resolver" module=grpc
time="2023-11-19T12:33:13Z" level=info msg="[core] [Channel #1 SubChannel #2] Subchannel Connectivity change to SHUTDOWN" module=grpc
time="2023-11-19T12:33:13Z" level=info msg="[core] [Channel #1 SubChannel #2] Subchannel deleted" module=grpc
time="2023-11-19T12:33:13Z" level=info msg="[core] [Channel #1] Channel deleted" module=grpc

@thaJeztah
Copy link
Member Author

Wondering if the TestBuildkitHistoryTracePropagation means we need changes in BuildKit v0.12 to make it work? /cc @tonistiigi @crazy-max

@thaJeztah thaJeztah changed the title vendor: hcsshim v0.11.4, OTEL v1.19.0 / v0.45.0, containerd v1.7.9 vendor: OTEL v1.19.0 / v0.45.0, containerd v1.7.9 Nov 20, 2023
@cpuguy83
Copy link
Member

cpuguy83 commented Nov 20, 2023

Here's the issue (from daemon logs):

Failed to detect trace exporter for buildkit controller" error="cannot merge resource due to conflicting Schema URL

@thaJeztah
Copy link
Member Author

@thaJeztah
Copy link
Member Author

Discussing in our call; it looks like this may be related to the semconv package, and it looks like we have multiple versions in use in our codebase;

containerd and moby are on v1.17.0, otel switched to v1.21.0

client/hijack.go:	"go.opentelemetry.io/otel/semconv/v1.17.0/httpconv"
testutil/helpers.go:	semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
vendor/github.com/containerd/containerd/tracing/tracing.go:	semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
vendor/github.com/containerd/containerd/tracing/tracing.go:	httpconv "go.opentelemetry.io/otel/semconv/v1.17.0/httpconv"
vendor/github.com/moby/buildkit/util/tracing/detect/detect.go:	semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
vendor/github.com/moby/buildkit/util/tracing/tracing.go:	semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
vendor/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc/config.go:	semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
vendor/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc/interceptor.go:	semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
vendor/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc/internal/parse.go:	semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
vendor/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc/semconv.go:	semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
vendor/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc/stats_handler.go:	semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
vendor/go.opentelemetry.io/contrib/instrumentation/net/http/httptrace/otelhttptrace/clienttrace.go:	semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
vendor/go.opentelemetry.io/contrib/instrumentation/net/http/httptrace/otelhttptrace/httptrace.go:	semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
vendor/go.opentelemetry.io/contrib/instrumentation/net/http/httptrace/otelhttptrace/httptrace.go:	"go.opentelemetry.io/otel/semconv/v1.17.0/httpconv"
vendor/go.opentelemetry.io/contrib/instrumentation/net/http/httptrace/otelhttptrace/httptrace.go:	"go.opentelemetry.io/otel/semconv/v1.17.0/netconv"
vendor/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp/handler.go:	semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
vendor/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp/handler.go:	"go.opentelemetry.io/otel/semconv/v1.17.0/httpconv"
vendor/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp/transport.go:	"go.opentelemetry.io/otel/semconv/v1.17.0/httpconv"
vendor/go.opentelemetry.io/otel/sdk/resource/builtin.go:	semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
vendor/go.opentelemetry.io/otel/sdk/resource/container.go:	semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
vendor/go.opentelemetry.io/otel/sdk/resource/env.go:	semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
vendor/go.opentelemetry.io/otel/sdk/resource/host_id.go:	semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
vendor/go.opentelemetry.io/otel/sdk/resource/os.go:	semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
vendor/go.opentelemetry.io/otel/sdk/resource/process.go:	semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
vendor/go.opentelemetry.io/otel/sdk/trace/span.go:	semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
vendor/go.opentelemetry.io/otel/semconv/v1.17.0/httpconv/http.go:	semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
vendor/go.opentelemetry.io/otel/semconv/v1.17.0/netconv/net.go:	semconv "go.opentelemetry.io/otel/semconv/v1.17.0"

@thaJeztah
Copy link
Member Author

thaJeztah commented Nov 20, 2023

Hmm.. so httpconv and netconv no longer exist in semconv v1.21.0

"go.opentelemetry.io/otel/semconv/v1.17.0/httpconv"
"go.opentelemetry.io/otel/semconv/v1.17.0/netconv"

from https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.17.0

Starting from v1.21.0 of semantic conventions, go.opentelemetry.io/otel/semconv/{version}/httpconv and go.opentelemetry.io/otel/semconv/{version}/netconv packages will no longer be published.

And they should be moved somewhere to "contrib";

Related;

@thaJeztah thaJeztah marked this pull request as draft November 20, 2023 19:58
@milas
Copy link
Contributor

milas commented Nov 20, 2023

Ugh, yeah, the semconv package has been a pain, and the removal of httpconv from the public surface area is particularly obnoxious.

While they're now in contrib, they're internal, so unusable.

I believe they also need to get updated in containerd 😭

https://github.com/containerd/containerd/blob/b1c6f01cec3d7d8fefa78b85d74df970b688dd0b/tracing/tracing.go#L26-L27

@thaJeztah
Copy link
Member Author

Yeah, was trying to find out what the replacement was for them. What's the use of moving them to another repository if that rep has them as internal? 🫠

@thaJeztah thaJeztah force-pushed the vendor_containerd_1.7.9 branch 2 times, most recently from c698d56 to e7f7181 Compare December 8, 2023 22:28
@@ -238,6 +240,17 @@ func (cli *DaemonCli) start(opts *daemonOptions) (err error) {

setOTLPProtoDefault()
otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))

// Taken from https://github.com/moby/buildkit/blob/v0.12.4/util/tracing/detect/detect.go#L100-L107
res, err := resource.Detect(context.Background(), serviceNameDetector{})
Copy link
Member Author

@thaJeztah thaJeztah Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this, or is "resource.Default()" enough?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like resource.Default() is sufficient:

(fyi if BuildKit also switches to using resource.Default() in a future version, that also side-steps the whole SchemaURL issue 🤔)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking! Let me push a commit to change that.

I'll keep changes as separate commit, but will squash things before merging (just keeping my own sanity a bit 😂)

if resp.StatusCode >= http.StatusBadRequest {
code = codes.Error
}
span.SetStatus(code, "")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this part if we don't have the "httpconv.ClientRequest"? (disabled that part, per suggestion above)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe outside the scope of this PR, but worth exploring updating this logic to use http.Transport with a custom dialer since httputil.NewClientConn is deprecated. The otelhttp library could wrap the transport in that case and maybe simplify this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually could probably be even easier to just wrap clientconn.Do as an http.RoundTripper and could use https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp#NewTransport. Not sure if that matches what the old functionality was doing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Playing around with that idea here #46920. I'm curious if that makes this clearer or cleaner.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @dmcgowan re: refactoring the actual (non-OTel) logic here for a future release, at which point we can bring back OTel "for free" in the process.

That said, I think your simplified status handling here is fine

@thaJeztah thaJeztah changed the title vendor: OTEL v1.19.0 / v0.45.0, containerd v1.7.10 vendor: OTEL v1.19.0 / v0.45.0, containerd v1.7.11 Dec 9, 2023
Copy link
Contributor

@milas milas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - I'll do some light testing on this branch shortly as well to make sure it all looks good at runtime

if resp.StatusCode >= http.StatusBadRequest {
code = codes.Error
}
span.SetStatus(code, "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @dmcgowan re: refactoring the actual (non-OTel) logic here for a future release, at which point we can bring back OTel "for free" in the process.

That said, I think your simplified status handling here is fine

@@ -238,6 +240,17 @@ func (cli *DaemonCli) start(opts *daemonOptions) (err error) {

setOTLPProtoDefault()
otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))

// Taken from https://github.com/moby/buildkit/blob/v0.12.4/util/tracing/detect/detect.go#L100-L107
res, err := resource.Detect(context.Background(), serviceNameDetector{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like resource.Default() is sufficient:

(fyi if BuildKit also switches to using resource.Default() in a future version, that also side-steps the whole SchemaURL issue 🤔)

@milas
Copy link
Contributor

milas commented Dec 11, 2023

Running a container:
docker run alpine uname

Traces from dockerd and propagation to containerd working (the containerd binary was built from docker buildx bake bin-image from this PR) ✅

Building a container:
docker buildx build for minimal FROM alpine

Traces from buildx and propagation to dockerd & embedded BuildKit working - see also the embedded containerd resolver span (remotes.docker.resovler.HTTPRequest) since I'm running with the containerd snappshotter ✅

@thaJeztah thaJeztah force-pushed the vendor_containerd_1.7.9 branch 2 times, most recently from f33f45a to 24febc6 Compare December 11, 2023 17:07
Upgrade to the latest OpenTelemetry libraries; this will unblock a lot of
downstream projects in the ecosystem to upgrade, as some of the parts here
were pre-1.0/unstable.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
full diff: containerd/containerd@v1.7.8...v1.7.9

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
full diff: containerd/containerd@v1.7.9...v1.7.10

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
full diff: containerd/containerd@v1.7.10...v1.7.11

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
… uses

This commit switches our code to use semconv 1.21, which is the version matching
the OTEL modules, as well as the containerd code.

The BuildKit 0.12.x module currently uses an older version of the OTEL modules,
and uses the semconv 0.17 schema. Mixing schema-versions is problematic, but
we still want to consume BuildKit's "detect" package to wire-up other parts
of OTEL.

To align the versions in our code, this patch sets the BuildKit detect.Resource
with the correct semconv version.

It's worth noting that the BuildKit package has a custom "serviceNameDetector";
https://github.com/moby/buildkit/blob/v0.12.4/util/tracing/detect/detect.go#L153-L169

Whith is merged with OTEL's default resource:
https://github.com/moby/buildkit/blob/v0.12.4/util/tracing/detect/detect.go#L100-L107

There's no need to duplicate that code, as OTEL's `resource.Default()` already
provides this functionality:

- It uses fromEnv{} detector internally: https://github.com/open-telemetry/opentelemetry-go/blob/v1.19.0/sdk/resource/resource.go#L208
- fromEnv{} detector reads OTEL_SERVICE_NAME: https://github.com/open-telemetry/opentelemetry-go/blob/v1.19.0/sdk/resource/env.go#L53

This patch also removes uses of the httpconv package, which is no longer included
in semconv 1.21 and now an internal package. Removing the use of this package
means that hijacked connections will not have the HTTP attributes on the Moby
client span, which isn't ideal, but a limited loss that'd impact exec/attach.
The span itself will still exist, it just won't the additional attributes that
are added by that package.

Alternatively, the httpconv call COULD remain - it will not error and will send
syntactically valid spans but we would be mixing & matching semconv versions,
so won't be compliant.

Some parts of the httpconv package were preserved through a very minimal local
implementation; a variant of `httpconv.ClientStatus(resp.StatusCode))` is added
to set the span status (`span.SetStatus()`). The `httpconv` package has complex
logic for this, but mostly drills down to HTTP status range (1xx/2xx/3xx/4xx/5xx)
to determine if the status was successfull or non-successful (4xx/5xx).

The additional logic it provided was to validate actual status-codes, and to
convert "bogus" status codes in "success" ranges (1xx, 2xx) into an error. That
code seemed over-reaching (and not accounting for potential future _valid_
status codes). Let's assume we only get valid status codes.

- https://github.com/open-telemetry/opentelemetry-go/blob/v1.21.0/semconv/v1.17.0/httpconv/http.go#L85-L89
- https://github.com/open-telemetry/opentelemetry-go/blob/v1.21.0/semconv/internal/v2/http.go#L322-L330
- https://github.com/open-telemetry/opentelemetry-go/blob/v1.21.0/semconv/internal/v2/http.go#L356-L404

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@thaJeztah thaJeztah marked this pull request as ready for review December 12, 2023 11:29
@thaJeztah
Copy link
Member Author

Rebased, squashed the last few commits, and updated the commit message to describe what was one; https://github.com/moby/moby/compare/24febc67fb417702ab9b48f71241cf2014d201d1..4d2a324fce13e12e1adb5e75f3421c26565a7340

diff below in case the link above gets garbage-collected;

diff --git a/client/hijack.go b/client/hijack.go
index 775df6428e..573fe157fb 100644
--- a/client/hijack.go
+++ b/client/hijack.go
@@ -65,7 +65,7 @@ func (cli *Client) setupHijackConn(req *http.Request, proto string) (_ net.Conn,
 	}
 
 	ctx, span := tp.Tracer("").Start(ctx, req.Method+" "+req.URL.Path, trace.WithSpanKind(trace.SpanKindClient))
-	// FIXME(thaJeztah): what's the replacement for this in semconv v1.21?
+	// FIXME(thaJeztah): httpconv.ClientRequest is now an internal package; replace this with alternative for semconv v1.21
 	// span.SetAttributes(httpconv.ClientRequest(req)...)
 	defer func() {
 		if retErr != nil {
@@ -98,8 +98,19 @@ func (cli *Client) setupHijackConn(req *http.Request, proto string) (_ net.Conn,
 	// Server hijacks the connection, error 'connection closed' expected
 	resp, err := clientconn.Do(req)
 	if resp != nil {
-		// Very simplified variant of "httpconv.ClientStatus(resp.StatusCode))"
+		// This is a simplified variant of "httpconv.ClientStatus(resp.StatusCode))";
 		//
+		// The main purpose of httpconv.ClientStatus() is to detect whether the
+		// status was successful (1xx, 2xx, 3xx) or non-successful (4xx/5xx).
+		//
+		// It also provides complex logic to *validate* status-codes against
+		// a hard-coded list meant to exclude "bogus" status codes in "success"
+		// ranges (1xx, 2xx) and convert them into an error status. That code
+		// seemed over-reaching (and not accounting for potential future valid
+		// status codes). We assume we only get valid status codes, and only
+		// look at status-code ranges.
+		//
+		// For reference, see:
 		// https://github.com/open-telemetry/opentelemetry-go/blob/v1.21.0/semconv/v1.17.0/httpconv/http.go#L85-L89
 		// https://github.com/open-telemetry/opentelemetry-go/blob/v1.21.0/semconv/internal/v2/http.go#L322-L330
 		// https://github.com/open-telemetry/opentelemetry-go/blob/v1.21.0/semconv/internal/v2/http.go#L356-L404
diff --git a/cmd/dockerd/daemon.go b/cmd/dockerd/daemon.go
index d80300b3ac..d356566c2f 100644
--- a/cmd/dockerd/daemon.go
+++ b/cmd/dockerd/daemon.go
@@ -240,6 +240,8 @@ func (cli *DaemonCli) start(opts *daemonOptions) (err error) {
 	setOTLPProtoDefault()
 	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))
 
+	// Override BuildKit's default Resource so that it matches the semconv
+	// version that is used in our code.
 	detect.Resource = resource.Default()
 	detect.Recorder = detect.NewTraceRecorder()
 
diff --git a/testutil/helpers.go b/testutil/helpers.go
index 711ef65f43..f31ad27944 100644
--- a/testutil/helpers.go
+++ b/testutil/helpers.go
@@ -34,7 +34,7 @@ func (d devZero) Read(p []byte) (n int, err error) {
 
 var tracingOnce sync.Once
 
-// configureTracing sets up an OTLP tracing exporter for use in tests.
+// ConfigureTracing sets up an OTLP tracing exporter for use in tests.
 func ConfigureTracing() func(context.Context) {
 	if os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT") == "" {
 		// No OTLP endpoint configured, so don't bother setting up tracing.
@@ -52,9 +52,7 @@ func ConfigureTracing() func(context.Context) {
 		tp = trace.NewTracerProvider(
 			trace.WithSpanProcessor(sp),
 			trace.WithSampler(trace.AlwaysSample()),
-			trace.WithResource(resource.NewSchemaless(
-				attribute.KeyValue{Key: semconv.ServiceNameKey, Value: attribute.StringValue("integration-test-client")},
-			)),
+			trace.WithResource(resource.NewSchemaless(semconv.ServiceName("integration-test-client"))),
 		)
 		otel.SetTracerProvider(tp)
 
diff --git a/vendor.mod b/vendor.mod
index 24b9157f68..726772d61b 100644
--- a/vendor.mod
+++ b/vendor.mod
@@ -141,7 +141,7 @@ require (
 	github.com/containerd/stargz-snapshotter/estargz v0.14.3 // indirect
 	github.com/containerd/ttrpc v1.2.2 // indirect
 	github.com/containernetworking/cni v1.1.2 // indirect
-	github.com/cyphar/filepath-securejoin v0.2.3 // indirect
+	github.com/cyphar/filepath-securejoin v0.2.4 // indirect
 	github.com/dimchansky/utfbom v1.1.1 // indirect
 	github.com/dustin/go-humanize v1.0.0 // indirect
 	github.com/felixge/httpsnoop v1.0.4 // indirect
diff --git a/vendor.sum b/vendor.sum
index 2b40aece20..dc31b7d6c6 100644
--- a/vendor.sum
+++ b/vendor.sum
@@ -374,8 +374,8 @@ github.com/creack/pty v1.1.11/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ
 github.com/creack/pty v1.1.18 h1:n56/Zwd5o6whRC5PMGretI4IdRLlmBXYNjScPaBgsbY=
 github.com/creack/pty v1.1.18/go.mod h1:MOBLtS5ELjhRRrroQr9kyvTxUAFNvYEK993ew/Vr4O4=
 github.com/cyphar/filepath-securejoin v0.2.2/go.mod h1:FpkQEhXnPnOthhzymB7CGsFk2G9VLXONKD9G7QGMM+4=
-github.com/cyphar/filepath-securejoin v0.2.3 h1:YX6ebbZCZP7VkM3scTTokDgBL2TY741X51MTk3ycuNI=
-github.com/cyphar/filepath-securejoin v0.2.3/go.mod h1:aPGpWjXOXUn2NCNjFvBE6aRxGGx79pTxQpKOJNYHHl4=
+github.com/cyphar/filepath-securejoin v0.2.4 h1:Ugdm7cg7i6ZK6x3xDF1oEu1nfkyfH53EtKeQYTC3kyg=
+github.com/cyphar/filepath-securejoin v0.2.4/go.mod h1:aPGpWjXOXUn2NCNjFvBE6aRxGGx79pTxQpKOJNYHHl4=
 github.com/davecgh/go-spew v0.0.0-20151105211317-5215b55f46b2/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
 github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
 github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
diff --git a/vendor/github.com/cyphar/filepath-securejoin/.travis.yml b/vendor/github.com/cyphar/filepath-securejoin/.travis.yml
deleted file mode 100644
index b94ff8cf92..0000000000
--- a/vendor/github.com/cyphar/filepath-securejoin/.travis.yml
+++ /dev/null
@@ -1,21 +0,0 @@
-# Copyright (C) 2017 SUSE LLC. All rights reserved.
-# Use of this source code is governed by a BSD-style
-# license that can be found in the LICENSE file.
-
-language: go
-go:
-    - 1.13.x
-    - 1.16.x
-    - tip
-arch:
-    - AMD64
-    - ppc64le
-os:
-    - linux
-    - osx
-
-script:
-    - go test -cover -v ./...
-
-notifications:
-    email: false
diff --git a/vendor/github.com/cyphar/filepath-securejoin/README.md b/vendor/github.com/cyphar/filepath-securejoin/README.md
index 3624617c89..4eca0f2355 100644
--- a/vendor/github.com/cyphar/filepath-securejoin/README.md
+++ b/vendor/github.com/cyphar/filepath-securejoin/README.md
@@ -1,6 +1,6 @@
 ## `filepath-securejoin` ##
 
-[![Build Status](https://travis-ci.org/cyphar/filepath-securejoin.svg?branch=master)](https://travis-ci.org/cyphar/filepath-securejoin)
+[![Build Status](https://github.com/cyphar/filepath-securejoin/actions/workflows/ci.yml/badge.svg)](https://github.com/cyphar/filepath-securejoin/actions/workflows/ci.yml)
 
 An implementation of `SecureJoin`, a [candidate for inclusion in the Go
 standard library][go#20126]. The purpose of this function is to be a "secure"
diff --git a/vendor/github.com/cyphar/filepath-securejoin/VERSION b/vendor/github.com/cyphar/filepath-securejoin/VERSION
index 7179039691..abd410582d 100644
--- a/vendor/github.com/cyphar/filepath-securejoin/VERSION
+++ b/vendor/github.com/cyphar/filepath-securejoin/VERSION
@@ -1 +1 @@
-0.2.3
+0.2.4
diff --git a/vendor/github.com/cyphar/filepath-securejoin/join.go b/vendor/github.com/cyphar/filepath-securejoin/join.go
index 7dd08dbbdf..aa32b85fb8 100644
--- a/vendor/github.com/cyphar/filepath-securejoin/join.go
+++ b/vendor/github.com/cyphar/filepath-securejoin/join.go
@@ -39,17 +39,27 @@ func IsNotExist(err error) bool {
 // components in the returned string are not modified (in other words are not
 // replaced with symlinks on the filesystem) after this function has returned.
 // Such a symlink race is necessarily out-of-scope of SecureJoin.
+//
+// Volume names in unsafePath are always discarded, regardless if they are
+// provided via direct input or when evaluating symlinks. Therefore:
+//
+// "C:\Temp" + "D:\path\to\file.txt" results in "C:\Temp\path\to\file.txt"
 func SecureJoinVFS(root, unsafePath string, vfs VFS) (string, error) {
 	// Use the os.* VFS implementation if none was specified.
 	if vfs == nil {
 		vfs = osVFS{}
 	}
 
+	unsafePath = filepath.FromSlash(unsafePath)
 	var path bytes.Buffer
 	n := 0
 	for unsafePath != "" {
 		if n > 255 {
-			return "", &os.PathError{Op: "SecureJoin", Path: root + "/" + unsafePath, Err: syscall.ELOOP}
+			return "", &os.PathError{Op: "SecureJoin", Path: root + string(filepath.Separator) + unsafePath, Err: syscall.ELOOP}
+		}
+
+		if v := filepath.VolumeName(unsafePath); v != "" {
+			unsafePath = unsafePath[len(v):]
 		}
 
 		// Next path component, p.
diff --git a/vendor/modules.txt b/vendor/modules.txt
index 0cc505ac73..c535ddea8a 100644
--- a/vendor/modules.txt
+++ b/vendor/modules.txt
@@ -402,7 +402,7 @@ github.com/cpuguy83/tar2go
 # github.com/creack/pty v1.1.18
 ## explicit; go 1.13
 github.com/creack/pty
-# github.com/cyphar/filepath-securejoin v0.2.3
+# github.com/cyphar/filepath-securejoin v0.2.4
 ## explicit; go 1.13
 github.com/cyphar/filepath-securejoin
 # github.com/deckarep/golang-set/v2 v2.3.0

@thaJeztah
Copy link
Member Author

Windows never fails to deliver on flaky tests 🙊

=== FAIL: github.com/docker/docker/integration/container TestLogs/driver_local/without_tty/only_stderr (2.03s)
    logs_test.go:134: assertion failed: error is not nil: Error response from daemon: failed to create task for container: failed to create shim task: hcs::CreateComputeSystem 57492e5395a29fc461b3b2db2f9c6c6a9e79d7616b05d6330588149aea58e181: The parameter is incorrect.: unknown

trace.WithResource(resource.NewSchemaless(
attribute.KeyValue{Key: semconv.ServiceNameKey, Value: attribute.StringValue("integration-test-client")},
)),
trace.WithResource(resource.NewSchemaless(semconv.ServiceName("integration-test-client"))),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not strictly needed, but while working on this code, I found the semconv.ServiceName utility, and thought it was slightly cleaner than to manually construct the attribute.KeyValue

Copy link
Contributor

@vvoland vvoland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@thaJeztah
Copy link
Member Author

Let's bring this one in; I think we've had enough eyes and discussions around this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants