This repository has been archived by the owner on Oct 3, 2023. It is now read-only.

Go exporter should detect GKE resource labels automatically #261

Open
qqqstuv opened this issue May 28, 2020 · 10 comments
Comments

@qqqstuv

qqqstuv commented May 28, 2020

Right now the Go Stackdriver exporter does not send GKE resource labels to Cloud Trace.

There are a few ways we can get resource labels:

Solution:
Automatically detect GKE resource labels and attach them to the span attributes.

Related issue: census-ecosystem/kubernetes-operator#14

@nilebox
Contributor

nilebox commented May 29, 2020

@DukeNg can you please share more details about your use case?
Is that specific to OpenCensus or OpenTelemetry?

Note that this exporter is being used by OpenTelemetry Collector to export to Google Cloud Monitoring/Trace: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/master/exporter/stackdriverexporter

But OpenTelemetry SDK has an independent exporter implementation: https://github.com/GoogleCloudPlatform/opentelemetry-operations-go

Ideally, the auto-detection mechanism should not be specific to vendor exporters, i.e. there should be a generic mechanism in the SDK / Collector that works for any exporter.

/cc @james-bebbington

@qqqstuv
Author

qqqstuv commented May 31, 2020

So far I've tested with OpenCensus.

I followed the example in the OC doc to instrument both traces and stats and to make OC automatically append Resource labels (particularly GKE labels) to Trace's spans.

I encountered two issues:

1. OC doesn't send resource labels to Cloud Trace:
Spans that are sent to Cloud Trace don't have GKE labels such as k8s.container, k8s.deployment, k8s.namespace, and k8s.podname. This currently works in the OC Python Stackdriver exporter, but it doesn't work in Go.

2. The GKE resource labels detected were not complete:
I tried to find out whether the GKE variables can be retrieved in the first place, and two ways were recommended:

  1. gke.Detect.
    This returns everything EXCEPT the k8s namespace. I talked to @rghetia and he said that:
    • gke.Detect() depends on the NAMESPACE env variable being set.
    • The NAMESPACE env variable isn't always automatically injected by Kubernetes.
  2. Using the Kubernetes Operator.
    This gives me the NAMESPACE name, but the cluster name is empty.

Either way, I think that we should come up with a way to detect all GKE variables.
@rghetia proposed two approaches:

  1. gke.Detect() picks up on metadata and env variables. Therefore Kubernetes should provide all the information through metadata attributes OR inject all the variables.
  2. Have the Kube Operator inject all the required env variables for resources (see another related issue).

What do you think?

@qqqstuv
Author

qqqstuv commented Jun 5, 2020

@nilebox friendly ping.

@qqqstuv
Author

qqqstuv commented Jun 5, 2020

@rghetia Do you think it would be an easy fix to have Kubernetes provide all its fields through metadata attributes, or to inject all the variables in the exporter? Or are we blocked on this?

@rghetia
Contributor

rghetia commented Jun 5, 2020

> @rghetia Do you think that it would be an easy fix to have Kubernetes provide all its fields through metadata attributes or inject all the variables in the exporter? or are we blocked on this?

I don't know whether it is easy or not. I am not actively working on this. I'll let @nilebox take it forward.

@nilebox
Contributor

nilebox commented Jun 9, 2020

> NAMESPACE env isn't always automatically injected by kubernetes

It can be injected using the downward API; see the MY_POD_NAMESPACE example in https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/#use-pod-fields-as-values-for-environment-variables

> This gives me the NAMESPACE name but the cluster name is empty

CLUSTER_NAME is specific to GCE, see kubernetes/kubernetes#22043

Is this environment variable missing?

@qqqstuv
Author

qqqstuv commented Jun 9, 2020

So what you're saying is that users have to specify MY_POD_NAMESPACE in their YAML files before deploying in order to get the NAMESPACE env variable? This is an option worth trying out for sure. On the other hand, if they don't set NAMESPACE, we can assume that the value is "default", right?

By cluster name I was referring to k8s_container. I think this is different from the GCE CLUSTER_NAME? I've never seen GCE CLUSTER_NAME listed on the resource labels page for OC. Ultimately, the cluster name is only missing with the Kubernetes Operator, and it works with gke.Detect(), so if we can identify NAMESPACE in gke.Detect() then we can just go with that approach.

@nilebox
Contributor

nilebox commented Jun 9, 2020

> users have to specify their MY_POD_NAMESPACE in their yaml files before deploying in order to get the NAMESPACE env variable?

They should set the NAMESPACE env var directly instead of MY_POD_NAMESPACE; that name was just an example in the doc, which uses the downward API.
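
As a rough illustration of the suggestion above, a pod spec along these lines should expose the pod's namespace as NAMESPACE through the downward API. The pod name, container name, and image are hypothetical placeholders; only the `fieldRef` on `metadata.namespace` is the part that matters.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app                      # hypothetical pod name
spec:
  containers:
    - name: app                          # hypothetical container name
      image: example.com/app:latest      # hypothetical image
      env:
        # Expose the pod's own namespace under the variable name
        # that gke.Detect() is described as reading in this thread.
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
```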

> On the other hand, if they don't set NAMESPACE we can assume that value is "default" right?

I don't think so. If they don't set NAMESPACE, it can be any namespace in reality. We can probably use a dedicated invalid name, e.g. "<unknown>", as a fallback, but not "default".
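
A minimal sketch of that fallback, assuming the namespace is read from the NAMESPACE env variable as discussed in this thread. The function name `namespaceOrUnknown` and the injected `lookup` parameter are hypothetical and not part of the exporter or of gke.Detect(); the lookup is passed in only so the behaviour is easy to check.

```go
package main

import (
	"fmt"
	"os"
)

// unknownNamespace is a deliberately invalid namespace name used as a
// fallback, so a missing value is never mistaken for the real "default".
const unknownNamespace = "<unknown>"

// namespaceOrUnknown (hypothetical helper) returns the value of the
// NAMESPACE variable, or "<unknown>" when it is unset or empty.
// lookup has the same shape as os.LookupEnv.
func namespaceOrUnknown(lookup func(string) (string, bool)) string {
	if ns, ok := lookup("NAMESPACE"); ok && ns != "" {
		return ns
	}
	return unknownNamespace
}

func main() {
	// Prints the injected namespace, or "<unknown>" if NAMESPACE is not set.
	fmt.Println(namespaceOrUnknown(os.LookupEnv))
}
```

Using a name that Kubernetes would reject as a namespace makes it obvious in Cloud Trace that detection failed, rather than silently attributing spans to "default".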

@james-bebbington
Contributor

james-bebbington commented Jun 10, 2020

@DukeNg - just wanted to clarify what your plan is for this issue - did you want us to assign it to you?

This is a bit confusing at the moment, as there are two different places in OpenCensus where resource detection can occur:

  1. Call gke.Detect() manually to create a resource and then use it in the SDK, i.e.: https://github.com/census-ecosystem/opencensus-go-resource/blob/master/gke/gke.go
  2. Detection runs in the Stackdriver exporter directly: https://github.com/census-ecosystem/opencensus-go-exporter-stackdriver/blob/master/monitoredresource/gcp/gcp_metadata_config.go#L60 (I think this runs by default?)

If I understand correctly, in both cases these methods read from environment variables (with the former taking precedence if used). No. 1 also reads the cluster name from instance metadata (which should be the same?). The environment variables can either be manually set or set via the OC operator. I'm not sure why cluster name is coming through as empty for you though; looks like this needs to be manually passed in to the operator - are you doing that correctly?
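
To make the precedence described above concrete, here is a rough sketch of merging two label sets so that a manually created resource wins over labels detected by the exporter. It does not reproduce the exporter's actual code; `mergeLabels` and the label keys used in `main` are hypothetical and only illustrate the idea.

```go
package main

import "fmt"

// mergeLabels (hypothetical helper) combines labels from two detection
// paths. Values in explicit (e.g. a resource built manually via
// gke.Detect()) take precedence over values in detected (e.g. labels
// found by the exporter itself).
func mergeLabels(explicit, detected map[string]string) map[string]string {
	merged := make(map[string]string, len(explicit)+len(detected))
	for k, v := range detected {
		merged[k] = v
	}
	for k, v := range explicit {
		merged[k] = v // overrides any detected value for the same key
	}
	return merged
}

func main() {
	// Label keys below are illustrative only.
	explicit := map[string]string{"k8s.namespace.name": "prod"}
	detected := map[string]string{
		"k8s.namespace.name": "",
		"k8s.cluster.name":   "cluster-1",
	}
	fmt.Println(mergeLabels(explicit, detected))
}
```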


Note for OpenTelemetry, I currently have an OTEP open to add resource detection to the spec. Once that's finalised it would be good to get this included in the SDK directly rather than as part of the OpenTelemetry Cloud Ops exporter (to avoid confusion like there is with OpenCensus). See open-telemetry/oteps#111 - is this something that will be a blocker for you if not implemented immediately?

Also note there are some minor differences with resources in OpenTelemetry compared to OpenCensus - notably "type" is now just another attribute.

Having said that, the issues you bring up here around GKE detection will also need to be solved in Otel regardless. We will also presumably need to port the OpenCensus operator to OpenTelemetry at some point - is that something you're interested in taking on as part of this work?
