OpenTelemetry support #356
Is this meant to be used together with JSON or text output or instead of it? |
I'd assume that it works in parallel to them. For example, if tracing is enabled, then we pass the text message to the tracer alongside the other output(s). |
Makes sense, in particular when the tracer doesn't record all log output. An implementation of this will depend on contextual logging, because non-contextual calls (plain klog calls that receive no context) cannot be associated with a span. But contextual logging alone is not enough, because those calls are also not passed a context: whoever adds a span to the context would also have to install a logger for it, which then would get picked up by contextual logging calls in that call chain. As https://github.com/sallyom/enhancements-1/blob/master/keps/sig-instrumentation/2831-kubelet-tracing/README.md describes how the code gets instrumented, it probably should get extended to also cover the logging aspect. |
Depends on how we make it usable. If we have a global tracer set, then we may create spans from that as well. |
An alternative, which I would prefer, would be to attach the trace ID and span ID to logs as part of the structured output (as trace_id and span_id), which can then be used to link to/from traces.
You still need a context if you want to link to other spans. The global TracerProvider just lets you export spans from anywhere. Thus far, we (sig-instrumentation) haven't touched on the subject of log or metric correlation, but we probably should extend our contributor guidelines to add tracing guidelines plus log and metric correlation guidelines for other components. @dgrisonnet @logicalhan |
So no logging through OpenTelemetry? That already works. The code that adds trace and span ID then would have to look like this:
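(A minimal sketch of such code, assuming OpenTelemetry's Go tracing API and klog's contextual logging helpers; the function name and tracer variable are illustrative.)

```go
package example

import (
	"context"

	"go.opentelemetry.io/otel/trace"
	"k8s.io/klog/v2"
)

// handleRequest shows the pattern: whoever starts the span also installs a
// logger that carries the trace and span IDs, so contextual logging calls
// further down the call chain include them automatically.
func handleRequest(ctx context.Context, tracer trace.Tracer) {
	ctx, span := tracer.Start(ctx, "handleRequest")
	defer span.End()

	sc := span.SpanContext()
	logger := klog.FromContext(ctx).WithValues(
		"trace_id", sc.TraceID().String(),
		"span_id", sc.SpanID().String(),
	)
	ctx = klog.NewContext(ctx, logger)

	// Any code that retrieves the logger from ctx now logs the IDs.
	klog.FromContext(ctx).Info("handling request")
}
```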
|
OpenTelemetry is working on a set of logging libraries, but they are still a ways off (https://opentelemetry.io/docs/instrumentation/go/#status-and-releases). We can discuss adopting it, but I'm not sure what the benefits would be. I'm just saying we shouldn't add all logs as span events. Logging is its own thing, and has well-defined ways to correlate with tracing that it is probably best we follow. What I really want is for the trace ID and span ID to end up in the structured log output so that logs can be correlated with traces. |
There are two ways of doing this: wrapping the LogSink so that it injects the IDs, or extending the logging API so that the logger itself does it. Extending the API has its own problems. The future contextual logging API in Golang will add some more options, but that is just being implemented and will be experimental. I am not sure whether we want to wait for it. It will support adding the current context to the logger when retrieving it. |
We also have to consider whether it is acceptable to make klog depend on OpenTelemetry. If any code in klog is meant to retrieve these IDs, it will have to have such a dependency, right? |
It should be possible to just configure klog to extract certain values by giving it the keys. Then it won't depend on OpenTelemetry. It may even be possible to move this all into go-logr and do it so that it works for all logger backends - let me try that out... |
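(A rough sketch of that idea; the helper name and key list are hypothetical, not an existing klog or go-logr API.)

```go
package example

import (
	"context"
	"fmt"
)

// valuesFromContext is a hypothetical helper: given the context keys a user
// configured, it extracts the corresponding values so a logger wrapper can
// append them to every log entry. It only works for keys that the producing
// library actually exports.
func valuesFromContext(ctx context.Context, keys []interface{}) []interface{} {
	var kvs []interface{}
	for _, key := range keys {
		if value := ctx.Value(key); value != nil {
			kvs = append(kvs, fmt.Sprintf("%v", key), value)
		}
	}
	return kvs
}
```

As discussed further down in the thread, this key-based approach runs into trouble with OpenTelemetry because its context keys are not exported.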
As a note, otel's interfaces, including context, are kept separate from the rest of OTel's codebase. As a result, it has nearly no dependencies. But it is still a consideration. |
I agree adding span ID and trace ID to logs will be useful.

```go
// Assuming log is a logrus-style logger (log.Fields, WithFields) and
// oteltrace is go.opentelemetry.io/otel/trace.
func logWithContext(span oteltrace.Span) log.Fields {
	return log.Fields{
		"span_id":  span.SpanContext().SpanID().String(),
		"trace_id": span.SpanContext().TraceID().String(),
	}
}

log.WithFields(logWithContext(span)).WithField("response", msg).Info("Counter")
```
|
I've implemented that in PR #358. It is fairly complex because wrapping a LogSink (as done in that PR) is tricky. I'll try next what it would look like if logr.Logger itself did the extra work. Then wrapping a LogSink will not be necessary, and it will also work for code which uses logr directly instead of going through klog. |
I think the best solution is this change in logr: go-logr/logr#158. Once that is merged, #359 can also be merged. |
As a note, those won't work well with OpenTelemetry. The context keys for span context and for baggage are not exported, so we would need to support accessor functions or something similar to be able to make use of context-based functionality.
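(For illustration, a sketch of accessor-based extraction, assuming the go.opentelemetry.io/otel/trace package; the helper name is made up.)

```go
package example

import (
	"context"

	"go.opentelemetry.io/otel/trace"
)

// traceValues uses OTel's exported accessor functions instead of reading
// unexported context keys directly. A logging wrapper could call this for
// every log entry that has a context available.
func traceValues(ctx context.Context) []interface{} {
	sc := trace.SpanContextFromContext(ctx)
	if !sc.IsValid() {
		return nil
	}
	return []interface{}{
		"trace_id", sc.TraceID().String(),
		"span_id", sc.SpanID().String(),
	}
}
```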
That's surprising. I could have sworn that OpenTelemetry was mentioned as a use case for logging from context in the upstream Go discussion around a future built-in contextual logging API. But perhaps it was just tracing in general, and people will hit this snag later when they really try to use it (like we do now). So would it be possible to get access to these context keys? If not, then we are stuck with the solution from #356 (comment), which is more expensive. |
One possible alternative would be to allow passing a span to the logger directly. But what is it that should get logged for such a span? #356 (comment) mentions span_id and trace_id, but a span context can carry more key/value pairs than that. Should all of those pairs get logged? Using which keys at the level of a log entry? We must be relatively certain that there is no accidental overlap with keys passed to the log call itself. |
I think it was a general discussion. It might've been a reference to how OpenCensus works, which is that all attributes ("tags" in OC) on spans/metrics are from context (example). OpenTelemetry's SpanContext provides accessors for TraceID and SpanID (OpenTracing's SpanContext, which is what you linked, does not appear to provide explicit accessors). I DO NOT think we should add all SpanContext-based k/v pairs from either TraceState (W3C tracestate) or from Baggage (W3C baggage). It would be nice if there was a way to opt in to values from baggage, though. |
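(A sketch of what opt-in baggage logging could look like; hypothetical, assuming go.opentelemetry.io/otel/baggage, with a made-up allow-list.)

```go
package example

import (
	"context"

	"go.opentelemetry.io/otel/baggage"
)

// loggedBaggageKeys is a hypothetical allow-list: only these baggage entries
// would be copied into log output.
var loggedBaggageKeys = []string{"tenant", "request.origin"}

// baggageValues returns key/value pairs for the allow-listed baggage members
// found in the context.
func baggageValues(ctx context.Context) []interface{} {
	bag := baggage.FromContext(ctx)
	var kvs []interface{}
	for _, key := range loggedBaggageKeys {
		if member := bag.Member(key); member.Value() != "" {
			kvs = append(kvs, key, member.Value())
		}
	}
	return kvs
}
```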
So if we log span ID and trace ID, what exactly should that look like? There is more than one possible format. |
It should look like separate trace_id and span_id key/value pairs on the log entry, for example:
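(Hypothetical IDs, using the trace_id/span_id keys proposed above.)

```
"Handling request" trace_id="4bf92f3577b34da6a3ce929d0e0e4736" span_id="00f067aa0ba902b7" verb="get"
```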
|
I want to be sensitive to performance, too - having to call an arbitrary number of functions once is not a big deal, but doing that on every log call seems unfortunate, unless we're clear about how we can cache those. |
Calling such functions on every log call is indeed a performance concern. @thockin I saw that you pre-render key/value pairs in structr; something similar might help here. When instrumenting the scheduler with contextual logging, a big concern was the slowdown at low log levels, because those are typically used when performance is crucial. But I haven't checked how often the extra work would actually have been needed in practice. Perhaps some on-demand caching could give us the best of both worlds (only do expensive work if needed, but then only once). It's just not easy to implement, because such a cache would need to be shared between different loggers. |
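(A rough sketch of such on-demand caching; hypothetical and not taken from the PRs discussed here. The rendered IDs are computed at most once per context and shared by every logger derived from it.)

```go
package example

import (
	"context"
	"sync"

	"go.opentelemetry.io/otel/trace"
)

// spanValues lazily renders trace/span IDs from a context. The same instance
// can be shared by all loggers created for that context, so the string
// conversion happens at most once even if many log calls need the IDs.
type spanValues struct {
	once sync.Once
	ctx  context.Context
	kvs  []interface{}
}

func (s *spanValues) get() []interface{} {
	s.once.Do(func() {
		sc := trace.SpanContextFromContext(s.ctx)
		if sc.IsValid() {
			s.kvs = []interface{}{
				"trace_id", sc.TraceID().String(),
				"span_id", sc.SpanID().String(),
			}
		}
	})
	return s.kvs
}
```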
I have implemented this in the same PRs which also implement #357:
With that modified Kubernetes and a local-up cluster, I get:

```console
$ grep "Immediate dispatch" /tmp/kube-apiserver.log
I1124 08:44:15.439987 46581 queueset.go:716] "Immediate dispatch" trace_id="2f53c6cc100aaa44d8d9303fcc75f3a1" span_id="3d99c8f69661e4f1" QS="catch-all" t="2022-11-24 08:44:15.439803964" R="0.00000000ss" request="catch-all" description1=&{IsResourceRequest:true Path:/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/127.0.0.1 Verb:get APIPrefix:apis APIGroup:coordination.k8s.io APIVersion:v1 Namespace:kube-node-lease Resource:leases Subresource: Name:127.0.0.1 Parts:[leases 127.0.0.1]} description2=&{Name:system:node:127.0.0.1 UID: Groups:[system:nodes system:authenticated] Extra:map[]} qsExecuting=1

$ cat /nvme/gopath/src/k8s.io/apiserver-tracing.conf
apiVersion: apiserver.config.k8s.io/v1alpha1
kind: TracingConfiguration
# default value
#endpoint: localhost:4317
samplingRatePerMillion: 1000000

$ APISERVER_TRACING_CONFIG_FILE=/nvme/gopath/src/k8s.io/apiserver-tracing.conf FEATURE_GATES=APIServerTracing=true,ContextualLogging=true KUBELET_RESOLV_CONF="/etc/resolv-9999.conf" DNS_ADDON="coredns" CGROUP_DRIVER=systemd CONTAINER_RUNTIME_ENDPOINT=unix:///var/run/crio/crio.sock LOG_LEVEL=6 ENABLE_CSI_SNAPSHOTTER=false API_SECURE_PORT=6444 ALLOW_PRIVILEGED=1 PATH=/nvme/gopath/src/k8s.io/kubernetes/third_party/etcd:$PATH ./hack/local-up-cluster.sh -O
```
|
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to its standard rules. Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to its standard rules. Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
/remove-lifecycle rotten |
/lifecycle stale |
/lifecycle rotten |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to its standard rules. Please send feedback to sig-contributor-experience at kubernetes/community. /close not-planned |
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". |
/kind feature
Describe the solution you'd like
Logs are helpful data attached to OpenTelemetry traces. klog could support that by adding the log messages to the current tracing span if it exists.
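(A minimal sketch of that idea, assuming OpenTelemetry's Go tracing API; attaching each message as a span event is only what this issue proposes, not something klog does today, and the helper name is made up.)

```go
package example

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

// logToSpan records a log message on the current span, if there is one.
// klog could do something like this internally for every log call that has
// access to a context.
func logToSpan(ctx context.Context, msg string, keysAndValues ...attribute.KeyValue) {
	span := trace.SpanFromContext(ctx)
	if !span.IsRecording() {
		return // no active, sampled span in this context
	}
	span.AddEvent(msg, trace.WithAttributes(keysAndValues...))
}
```

klog would additionally keep writing to its normal text or JSON output, as discussed in the first comments above.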
Anything else you would like to add:
There is an existing k/k KEP for the kubelet tracing part: kubernetes/enhancements#2831
Refers to kubernetes/kubernetes#113414