Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

detectors/gcp: support Cloud Run jobs #465

Merged
merged 13 commits into from Jun 15, 2023

Conversation

zchee
Copy link
Contributor

@zchee zchee commented Jul 29, 2022

Support Cloud Run jobs using CLOUD_RUN_JOB and CLOUD_RUN_EXECUTION.

Copy link
Member

@damemi damemi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/gcbrun

@zchee
Copy link
Contributor Author

zchee commented Jul 29, 2022

Note that Cloud Run jobs do not add K_XXX environment variables automatically. Instead of CLOUD_RUN_JOB and CLOUD_RUN_EXECUTION.

@zchee
Copy link
Contributor Author

zchee commented Jul 29, 2022

@damemi not running gcbrun...?

@damemi
Copy link
Member

damemi commented Jul 29, 2022

/gcbrun

@damemi
Copy link
Member

damemi commented Jul 29, 2022

hm that seemed to work

@zchee
Copy link
Contributor Author

zchee commented Jul 29, 2022

thanks

@damemi
Copy link
Member

damemi commented Jul 29, 2022

cc @dashpole

@dashpole
Copy link
Contributor

I don't see those set in https://cloud.google.com/run/docs/container-contract#jobs-env-vars. Are the docs incorrect?

@zchee
Copy link
Contributor Author

zchee commented Jul 29, 2022

@dashpole I think doc is lacking. I was tested on real environment.

detectors/gcp/detector.go Outdated Show resolved Hide resolved
@zchee
Copy link
Contributor Author

zchee commented Jul 29, 2022

@dashpole PTAL

@zchee zchee force-pushed the detectors-gcp/cloud-run-jobs branch from 8db87c0 to 5cf62cc Compare July 29, 2022 20:01
@dashpole
Copy link
Contributor

/gcbrun

@zchee
Copy link
Contributor Author

zchee commented Jul 30, 2022

@dashpole gentle ping

@dashpole
Copy link
Contributor

Yes, sorry. We are discussing as a team to make sure this is the right approach.

@zchee
Copy link
Contributor Author

zchee commented Jul 30, 2022

@dashpole Ah, I see. I waiting your discuss result.

@dashpole
Copy link
Contributor

dashpole commented Aug 1, 2022

Hey @zchee, if you make a job with multiple tasks, what is the value in CLOUD_RUN_EXECUTION? Does each task end up with the same job name and execution? If so, we may need some additional information to prevent collisions between executions.

@zchee
Copy link
Contributor Author

zchee commented Aug 1, 2022

@dashpole I'll investigate it.

Just in case, in the case of the gcloud bate run jobs create command, it means --tasks= flag, not --parallelism flag, right?

ref:

@dashpole
Copy link
Contributor

dashpole commented Aug 1, 2022

If you can, try both, and also look at what CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_ATTEMPT are set to. Really appreciate the help

@zchee
Copy link
Contributor Author

zchee commented Aug 2, 2022

@dashpole okay, will check both situations today.

Also, I'll report Cloud Run jobs docs lacking that's 2 envs via Mercari's Google TAM (I work for Mercari, Inc).

@zchee
Copy link
Contributor Author

zchee commented Aug 3, 2022

@dashpole replied from Google support, said "seems to serverless team forgot update document", So I think would be better to waiting to update docs. Maybe your concern is also cleared.


Also, not yet checked real results (sorry, I was busy a bit) but will check.

@zchee
Copy link
Contributor Author

zchee commented Aug 3, 2022

@dashpole I found otel side hardcoded K_SERVICE, need fix it also.

@dashpole
Copy link
Contributor

dashpole commented Aug 3, 2022

Note that the old cloud run (which that code references) is deprecated in favor of this code base.

@zchee
Copy link
Contributor Author

zchee commented Aug 5, 2022

@dashpole updated document

@zchee
Copy link
Contributor Author

zchee commented Aug 9, 2022

@dashpole gentle ping

I want to hear your opinions.

@zchee zchee requested a review from dashpole August 9, 2022 10:43
@dashpole
Copy link
Contributor

dashpole commented Aug 9, 2022

What were CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_ATTEMPT set to?

Also, what is in d.metadata.InstanceID() (faas.id)?

Basically, we want to make sure that faas.name, faas.version, and faas.id uniquely identify an instance of a cloud run job. Otherwise, if two things are running in parallel with the same faas.* attributes, there could be conflicts.

@zchee
Copy link
Contributor Author

zchee commented Aug 9, 2022

@dashpole I'll reply details later.

@dashpole dashpole self-assigned this Aug 11, 2022
@zchee
Copy link
Contributor Author

zchee commented Sep 10, 2022

@dashpole

Sorry for the late reply.

Use https://github.com/GoogleCloudPlatform/golang-samples/tree/main/run/jobs

diff --git a/run/jobs/main.go b/run/jobs/main.go
index 05e3195..a8a6d0a 100644
--- a/run/jobs/main.go
+++ b/run/jobs/main.go
@@ -21,6 +21,8 @@ import (
 	"os"
 	"strconv"
 	"time"
+
+	"cloud.google.com/go/compute/metadata"
 )
 
 type Config struct {
@@ -99,6 +101,12 @@ func main() {
 	}
 
 	log.Printf("Completed Task #%s, Attempt #%s", config.taskNum, config.attemptNum)
+
+	instanceID, err := metadata.InstanceID()
+	if err != nil {
+		log.Fatal(err)
+	}
+	log.Printf("instanceID: %s\n", instanceID)
 }
 
 // Throw an error based on fail rate
$ gcloud alpha run jobs create --tasks=5 --parallelism=3 --region=asia-northeast1 jobs
# execute...
labels: {
run.googleapis.com/execution_name: "jobs-ph5p9"
run.googleapis.com/task_attempt: "0"
run.googleapis.com/task_index: "0"
}
labels: {
run.googleapis.com/execution_name: "jobs-ph5p9"
run.googleapis.com/task_attempt: "0"
run.googleapis.com/task_index: "1"
}
labels: {
run.googleapis.com/execution_name: "jobs-ph5p9"
run.googleapis.com/task_attempt: "0"
run.googleapis.com/task_index: "2"
}
textPayload: "2022/09/10 08:28:37 instanceID: 00c527f6d4ab9f193b199268ccb9d99d05ebc33d8983a56d2faef9169e9b42adf8fb6caaf62a35d1bff6df6b375c4282310b634da8218f77e706845ac6f4d5f27714"

@damemi
Copy link
Member

damemi commented Apr 11, 2023

Looks like it's failing an e2e with:

Step #1 - "run-tests-cloudfunction": === RUN   TestResourceDetectionTrace/Span_has_label_cloud.platform
Step #1 - "run-tests-cloudfunction":     trace_test.go:288: 
Step #1 - "run-tests-cloudfunction":         	Error Trace:	trace_test.go:288
Step #1 - "run-tests-cloudfunction":         	Error:      	Expect "gcp_cloud_run" to match "gcp_cloud_functions"
Step #1 - "run-tests-cloudfunction":         	Test:       	TestResourceDetectionTrace/Span_has_label_cloud.platform
Step #1 - "run-tests-cloudfunction":         	Messages:   	For label key cloud.platform, value "gcp_cloud_run" did not match regex "gcp_cloud_functions"

I wonder if this is causing cloud run to be picked up as cloud functions in an e2e?

detectors/gcp/detector.go Outdated Show resolved Hide resolved
@damemi
Copy link
Member

damemi commented May 9, 2023

@zchee sorry for the delay in this. The spec changes just merged and from the discussion in open-telemetry/opentelemetry-specification#3378 this will be slightly different than we talked about here.

Instead, we will have separate GCP-specific resource attributes for Cloud Run execution and Job ID (open-telemetry/opentelemetry-go#4079 will add these to the Go semconv package).

I think we should probably just make these available via their own functions, and maybe add 2 new functions IsJob() and IsService(). wdyt?

@zchee
Copy link
Contributor Author

zchee commented May 9, 2023

make sense. I'll implements.

Signed-off-by: Koichi Shiraishi <zchee.io@gmail.com>
Signed-off-by: Koichi Shiraishi <zchee.io@gmail.com>
Signed-off-by: Koichi Shiraishi <zchee.io@gmail.com>
@zchee zchee force-pushed the detectors-gcp/cloud-run-jobs branch from e852ece to 0a7fa91 Compare June 10, 2023 19:19
Cloud Functions gen2 also reserved K_CONFIGURATION environment variable

Signed-off-by: Koichi Shiraishi <zchee.io@gmail.com>
@zchee zchee force-pushed the detectors-gcp/cloud-run-jobs branch from 0a7fa91 to 68b1cff Compare June 11, 2023 07:27
Signed-off-by: Koichi Shiraishi <zchee.io@gmail.com>
Signed-off-by: Koichi Shiraishi <zchee.io@gmail.com>
Copy link
Member

@damemi damemi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating this @zchee. We just need to make a few changes to match the spec for Cloud Run Jobs that was approved upstream.

They did not like the idea of using execution+task as faas.version, so those will be their own individual attributes instead. That means we need separate functions for each of them.

To know that we need to call those functions, Cloud Run Jobs have to be treated like their own environment, separate from Services. So a new detector function for that as well.

imo, faas.name is fine to be shared between Jobs and Services. These are semantically providing the same info, even though it is technically different environment variables.

To summarize, we need the following new functions:

  • onCloudRunJob() - detect if we are running on Cloud Run Jobs
  • FaasJobExecution() - return the Execution ID of the Job
  • FaasJobTaskIndex() - return the Task Index for the execution

Our Collector resource detection processor will be updated to write to the new gcp.cloud_run.* attributes when running on Cloud Run Jobs, and our exporter's monitored resource mapping will be updated to parse execution+task into a unique task ID.

detectors/gcp/faas.go Outdated Show resolved Hide resolved
detectors/gcp/faas.go Outdated Show resolved Hide resolved
detectors/gcp/faas.go Outdated Show resolved Hide resolved
Signed-off-by: Koichi Shiraishi <zchee.io@gmail.com>
Signed-off-by: Koichi Shiraishi <zchee.io@gmail.com>
Signed-off-by: Koichi Shiraishi <zchee.io@gmail.com>
Signed-off-by: Koichi Shiraishi <zchee.io@gmail.com>
@zchee
Copy link
Contributor Author

zchee commented Jun 12, 2023

@damemi All of done. Also, thanks for polite details reply.
PTAL.


BTW, I'm concerned about the inconsistencies in the notation of Cloud Run jobs. Like CloudFunctions vs CloudFunction.
There are

  • Cloud Run jobs (I assume this is the official service name, references from GCP docs
  • Cloud Run Job (lack of end of "s"
  • Cloud Run Jobs (means camel case. For documentation, not for the source code cuz Go's naming convention

For now, I have updated this PR according to otel spec's changes. But certainly not unified between the source code and GCP documentationd. such as CloudRunJob, FaaSJobExecution, FaaSJobTaskIndex, and etc.
WDYT?

@zchee
Copy link
Contributor Author

zchee commented Jun 12, 2023

We should at least leave a comment here about ...

#465 (comment)

Should I include that to this PR?

Signed-off-by: Koichi Shiraishi <zchee.io@gmail.com>
@damemi
Copy link
Member

damemi commented Jun 12, 2023

We should at least leave a comment here about ...

#465 (comment)

Should I include that to this PR?

yes please do, this will prevent the same headache in the future :)

for the naming inconsistencies, I don't really have a good answer. I think the usage depends on the context, ie when referring to the product "Cloud Run jobs" makes sense, when referring to multiple jobs the plural makes sense vs a singular job. Either way, I would always prefer the cloudRunJob camel case vs cloudRunjob in code

func (d *Detector) FaaSVersion() (string, error) {
if version, found := d.os.LookupEnv(faasRevisionEnv); found {
return version, nil
}
return "", errEnvVarNotFound
}

// FaaSID returns the instance id of the cloud run instance or cloud function.
// FaaSJobExecution returns the execution id of the Cloud Run jobs.
func (d *Detector) FaaSJobExecution() (string, error) {
Copy link
Member

@damemi damemi Jun 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on second thought, the new functions should probably match the resource attribute they are updating. So CloudRunJobExecution() and CloudRunJobTaskIndex(). Then it's:

  • FaaSName() -> faas.name
  • FaaSVersion() -> faas.version
  • CloudRunJobExecution() -> gcp.cloud_run.job.execution
  • CloudRunJobTaskIndex() -> gcp.cloud_run.job.task_index

sorry to ask you to change this again. does that sound good to you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something shorter like JobExecution() and JobTaskIndex() are probably fine too

Copy link
Contributor Author

@zchee zchee Jun 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose the first one cuz mapped to otel spec is readable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good to me!

Signed-off-by: Koichi Shiraishi <zchee.io@gmail.com>
detectors/gcp/faas.go Outdated Show resolved Hide resolved
Signed-off-by: Koichi Shiraishi <zchee.io@gmail.com>
@zchee
Copy link
Contributor Author

zchee commented Jun 15, 2023

@damemi PTAL.

@damemi
Copy link
Member

damemi commented Jun 15, 2023

/gcbrun

Copy link
Member

@damemi damemi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @zchee for working on this!

@damemi damemi merged commit bd431a2 into GoogleCloudPlatform:main Jun 15, 2023
25 checks passed
@zchee zchee deleted the detectors-gcp/cloud-run-jobs branch June 15, 2023 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants