
Execution

a_git_a edited this page Feb 8, 2024 · 10 revisions

Part 3: Data Job Execution


Previous Section: Deployment

A running instance of a Data Job deployment is called an execution.

An execution can run the Data Job one or more times; each run is called an attempt. If an attempt fails because of a platform error, the job can be re-run automatically (this behavior is configured by Control Service operators).

Automatic re-runs apply only to executions in the "Cloud" (Kubernetes). Local executions always consist of a single attempt.

For details, see the Data Job Attempt page and the opid-executionid-attemptid diagram.
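The attempt model can be pictured with a small shell sketch. It does not reproduce how the Control Service implements retries; it assumes, purely for illustration, that an attempt signals a platform error with exit code 3, and the names `run_attempts` and `PLATFORM_ERROR` are hypothetical. `max_attempts` stands in for the operator-configured limit; a local execution corresponds to a limit of 1.

```shell
# Hypothetical sketch of the execution/attempt model (not the Control
# Service implementation). Assumption: exit code 3 signals a platform
# error, which is retryable; any other non-zero code is a job (user)
# error and ends the execution at once.
PLATFORM_ERROR=3

# run_attempts MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND as successive attempts of one execution and returns the
# exit code of the last attempt.
run_attempts() {
  local max_attempts="$1"
  shift
  local attempt=1 rc
  while true; do
    echo "attempt ${attempt}/${max_attempts}" >&2
    if "$@"; then
      return 0                      # attempt succeeded: execution is done
    else
      rc=$?
    fi
    # Only platform errors are retried, and only up to the limit.
    if [ "$rc" -ne "$PLATFORM_ERROR" ] || [ "$attempt" -ge "$max_attempts" ]; then
      return "$rc"
    fi
    attempt=$((attempt + 1))
  done
}
```

For example, `run_attempts 3 ./my-job.sh` (with a hypothetical `my-job.sh`) would re-run the script at most twice after platform errors, and not at all after any other failure.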

When scheduled by the Control Service in the Cloud, each job runs in a container with a couple of gigabytes of local storage.
This storage is available only while the job is executing; afterwards, the container is deleted.
These resources can be configured per job during deployment.


Once the deployment is complete, the Control Service automatically executes the job according to its schedule. You can list the executions at any time with the following command:

vdk execute --list -n hello-world -t my-team

The output shows details about the most recent executions of the Data Job:

id                           job_name     status    type       start_time                 end_time                   started_by    message         op_id                        job_version
---------------------------  -----------  --------  ---------  -------------------------  -------------------------  ------------  --------------  ---------------------------  ----------------------------------------
hello-world-latest-27193696  hello-world  finished  scheduled  2021-09-14 12:16:00+00:00  2021-09-14 12:16:51+00:00                Success         hello-world-latest-27193696  d9eedb67fc8d52301dbb61c6d9db4397c3f9a9ec
hello-world-latest-27193698  hello-world  finished  scheduled  2021-09-14 12:18:00+00:00  2021-09-14 12:18:57+00:00                Success         hello-world-latest-27193698  d9eedb67fc8d52301dbb61c6d9db4397c3f9a9ec
hello-world-latest-27193700  hello-world  finished  scheduled  2021-09-14 12:20:00+00:00  2021-09-14 12:20:53+00:00                Success         hello-world-latest-27193700  d9eedb67fc8d52301dbb61c6d9db4397c3f9a9ec
hello-world-latest-27193702  hello-world  finished  scheduled  2021-09-14 12:22:00+00:00  2021-09-14 12:22:58+00:00                Success         hello-world-latest-27193702  d9eedb67fc8d52301dbb61c6d9db4397c3f9a9ec
hello-world-latest-27193704  hello-world  running   scheduled  2021-09-14 12:24:00+00:00                                                           hello-world-latest-27193704  d9eedb67fc8d52301dbb61c6d9db4397c3f9a9ec
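The numeric suffix of each id above appears to follow the Kubernetes CronJob naming convention, in which the Job created for a scheduled run is named after its scheduled time in minutes since the Unix epoch; the sample is consistent with this (27193696 minutes is 2021-09-14 12:16:00 UTC, the first start_time). Assuming your deployment uses the same convention, the sketch below decodes an id. The function name `scheduled_time` is hypothetical, and `date -d @SECONDS` requires GNU coreutils.

```shell
# scheduled_time EXECUTION_ID
# Prints the scheduled start time (UTC) encoded in the numeric suffix of
# an execution id such as hello-world-latest-27193696.
# Assumes the suffix is minutes since the Unix epoch (Kubernetes CronJob
# naming); requires GNU date for the "-d @SECONDS" syntax.
scheduled_time() {
  local id="$1"
  local minutes="${id##*-}"   # text after the last "-"
  case "$minutes" in
    ''|*[!0-9]*) echo "no numeric suffix in: $id" >&2; return 1 ;;
  esac
  date -u -d "@$((minutes * 60))" '+%Y-%m-%d %H:%M:%S+00:00'
}
```

For example, `scheduled_time hello-world-latest-27193704` prints `2021-09-14 12:24:00+00:00`, the start_time of the running execution in the table.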

A new execution can be started manually at any time by using the following command:

vdk execute --start -n hello-world -t my-team

This command may fail if an execution of the hello-world job is already running, because parallel executions of the same job are currently not allowed in order to ensure data integrity.
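To avoid that failure in a script, you can check for a running execution before starting a new one. The sketch below filters the `vdk execute --list` output shown earlier; it assumes that table layout (id in the first column, status in the third), which is an observation from the sample rather than a documented, stable format. The function names `running_executions` and `start_if_idle` are hypothetical.

```shell
# running_executions
# Reads `vdk execute --list` table output on stdin and prints the ids of
# executions whose status column is "running".
# Assumes the layout shown above: whitespace-separated columns with the id
# first and the status third (the header and dashed rows never match).
running_executions() {
  awk '$3 == "running" { print $1 }'
}

# start_if_idle JOB TEAM
# Starts a new execution only if none of JOB is currently running.
start_if_idle() {
  local job="$1" team="$2" running
  running="$(vdk execute --list -n "$job" -t "$team" | running_executions)"
  if [ -n "$running" ]; then
    echo "execution already running: $running" >&2
    return 1
  fi
  vdk execute --start -n "$job" -t "$team"
}
```

The check is not atomic: if a scheduled execution begins between the list and the start, `vdk execute --start` can still fail, so a caller should still handle a non-zero exit code.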

For the curious: what is going on behind the scenes?

Every execution is carried out by a Kubernetes pod. To see the executions, list the pods in the cluster:

kubectl get pods

The names of the pods belonging to the Data Job start with the Data Job name (e.g. hello-world-latest-27193734--1-gb8t2). Pick one such pod and show its details by running:

kubectl describe pod hello-world-latest-27193734--1-gb8t2
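When many pods exist, a small filter saves scanning the whole list. This sketch relies on the naming rule above (pod names start with the Data Job name followed by "-") and on `kubectl get pods -o name`, which prints each entry as `pod/<name>`; the function name `job_pods` is hypothetical. A job whose name is a prefix of another job's name (e.g. hello-world and hello-world-2) would also match.

```shell
# job_pods JOB_NAME
# Reads `kubectl get pods -o name` output on stdin and prints the entries
# (pod/<name>) whose pod name starts with "JOB_NAME-".
job_pods() {
  awk -v prefix="pod/$1-" 'index($0, prefix) == 1'
}
```

Usage, describing every pod of the hello-world job:

kubectl get pods -o name | job_pods hello-world | xargs -n 1 kubectl describe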

Check execution logs

Finally, to check the logs of a Data Job execution, use:

vdk execute --logs -n hello-world -t my-team --execution-id [execution-id-printed-from-vdk-execute-start]

Keep in mind that logs are kept only for the last few executions of a Data Job, so logs of older executions are not available.
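Because only recent logs are kept, the execution you usually want is the newest one. The sketch below picks its id from the `vdk execute --list` output instead of copying it by hand. It assumes the column layout shown earlier (id first, start date and time in columns 5 and 6, in ISO format so they compare correctly as text) and that every row has a start time, which holds for the sample but is not a documented guarantee. The function name `latest_execution_id` is hypothetical.

```shell
# latest_execution_id
# Reads `vdk execute --list` output on stdin and prints the id of the
# execution with the latest start_time.
# Assumes id in column 1 and start date/time in columns 5 and 6.
latest_execution_id() {
  awk '
    $1 == "id" || $1 ~ /^-+$/ { next }   # skip header and separator rows
    {
      key = $5 " " $6                    # e.g. "2021-09-14 12:24:00+00:00"
      if (key > best) { best = key; id = $1 }
    }
    END { if (id != "") print id }
  '
}
```

Usage, showing the logs of the newest hello-world execution:

vdk execute --logs -n hello-world -t my-team --execution-id "$(vdk execute --list -n hello-world -t my-team | latest_execution_id)"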

➡️ Next Section: Production
