Failed Workflow memoized resubmit does not respect DAG dependencies #12936

Open
3 of 4 tasks
duc00 opened this issue Apr 15, 2024 · 1 comment · May be fixed by #12940

duc00 commented Apr 15, 2024

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what did you expect to happen?

I have a data pipeline DAG composed of a series of dependent transformations. When one particular task fails, I would expect a memoized resubmit to restart at the failed task and to respect dependencies.

What actually happens is that all subsequent tasks are executed at the same time, without respecting their dependencies. This happens whether failFast is set to true or false, and with both the UI and the CLI.
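To make the expectation concrete, here is a minimal Python sketch of the behavior I would expect from a memoized resubmit. It is not Argo's implementation: `expected_resubmit_plan` is a hypothetical helper, the dependencies are taken from the reproduction workflow in this issue, and the phases are the ones reported in the controller logs of the original run (A Succeeded, B Failed, C and D Omitted).

```python
from typing import Dict, List, Set, Tuple

# Dependencies from the reproduction workflow (task -> tasks it depends on).
DEPENDS: Dict[str, List[str]] = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}

# Phases of the original failed run, as reported in the controller logs.
PREVIOUS_PHASES: Dict[str, str] = {
    "A": "Succeeded",
    "B": "Failed",
    "C": "Omitted",
    "D": "Omitted",
}


def expected_resubmit_plan(
    depends: Dict[str, List[str]], previous_phases: Dict[str, str]
) -> Tuple[Set[str], List[List[str]]]:
    """Hypothetical helper: return (reused, waves).

    reused: tasks whose previous result can be kept (they succeeded).
    waves:  the remaining tasks, grouped so that each wave may only start
            once every earlier wave has finished.
    """
    reused = {t for t, phase in previous_phases.items() if phase == "Succeeded"}
    done = set(reused)
    pending = [t for t in depends if t not in reused]
    waves: List[List[str]] = []
    while pending:
        # A task is ready only when all of its dependencies are done.
        ready = [t for t in pending if all(d in done for d in depends[t])]
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        waves.append(ready)
        done.update(ready)
        pending = [t for t in pending if t not in ready]
    return reused, waves


reused, waves = expected_resubmit_plan(DEPENDS, PREVIOUS_PHASES)
print("reused:", reused)
print("waves:", waves)
```

Under these assumptions A is reused and B, C and D end up in three separate waves, whereas the resubmitted workflow below starts B, C and D together.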

Here is the original workflow view:
Screenshot 2024-04-15 at 14 44 58

Now when I run argo resubmit --memoized resubmit-bug-dag, the resubmitted workflow ends up in the following state:
Screenshot 2024-04-15 at 14 45 55

As a result, I cannot use memoized resubmit for this DAG, because it breaks my pipelines. For the moment, I will try step-level memoization as a workaround, though this is not optimal: I now have to set an arbitrary maxAge for future pipelines to work properly.

Thank you for your help!

Version

v3.5.5

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: resubmit-bug-dag-
spec:
  serviceAccountName: argo-executor
  entrypoint: rand-fail-dag
  templates:
    - name: rand-fail-dag
      dag:
        tasks:
          - name: A
            template: success
          - name: B
            template: fail
            depends: A
          - name: C
            depends: "B"
            template: success
          - name: D
            depends: "C"
            template: success
    - name: fail
      container:
        image: busybox
        command: ["sh", -c]
        args:
          - exit 1
    - name: success
      container:
        image: busybox
        command: ["sh", -c]
        args:
          - exit 0

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:32.882Z" level=info msg="Processing workflow" Phase= ResourceVersion=19967870 namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:32.893Z" level=info msg="Task-result reconciliation" namespace=default numObjs=0 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:32.893Z" level=info msg="Updated phase  -> Running" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:32.894Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:32.894Z" level=info msg="was unable to obtain node for , letting display name to be nodeName" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:32.895Z" level=info msg="DAG node resubmit-bug-dag-nqnhz initialized Running" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:32.895Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-518622934, taskName D"
time="2024-04-15T12:51:32.895Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-568955791, taskName C"
time="2024-04-15T12:51:32.895Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-552178172, taskName B"
time="2024-04-15T12:51:32.895Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-602511029, taskName A"
time="2024-04-15T12:51:32.896Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-602511029, taskName A"
time="2024-04-15T12:51:32.896Z" level=info msg="All of node resubmit-bug-dag-nqnhz.A dependencies [] completed" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:32.896Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:32.896Z" level=info msg="Pod node resubmit-bug-dag-nqnhz-602511029 initialized Pending" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:32.913Z" level=info msg="Created pod: resubmit-bug-dag-nqnhz.A (resubmit-bug-dag-nqnhz-success-602511029)" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:32.914Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-552178172, taskName B"
time="2024-04-15T12:51:32.914Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-568955791, taskName C"
time="2024-04-15T12:51:32.914Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-552178172, taskName B"
time="2024-04-15T12:51:32.914Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-518622934, taskName D"
time="2024-04-15T12:51:32.914Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-568955791, taskName C"
time="2024-04-15T12:51:32.914Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-518622934, taskName D"
time="2024-04-15T12:51:32.914Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:32.914Z" level=info msg=reconcileAgentPod namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:32.950Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=19967874 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:42.920Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=19967874 namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:42.920Z" level=info msg="Task-result reconciliation" namespace=default numObjs=0 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:42.921Z" level=info msg="node changed" namespace=default new.message=PodInitializing new.phase=Pending new.progress=0/1 nodeID=resubmit-bug-dag-nqnhz-602511029 old.message= old.phase=Pending old.progress=0/1 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:42.921Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-518622934, taskName D"
time="2024-04-15T12:51:42.922Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-568955791, taskName C"
time="2024-04-15T12:51:42.922Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-552178172, taskName B"
time="2024-04-15T12:51:42.922Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-552178172, taskName B"
time="2024-04-15T12:51:42.923Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-568955791, taskName C"
time="2024-04-15T12:51:42.923Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-552178172, taskName B"
time="2024-04-15T12:51:42.923Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-518622934, taskName D"
time="2024-04-15T12:51:42.923Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-568955791, taskName C"
time="2024-04-15T12:51:42.923Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-518622934, taskName D"
time="2024-04-15T12:51:42.923Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:42.923Z" level=info msg=reconcileAgentPod namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:42.938Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=19967897 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:58.955Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=19967897 namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:58.955Z" level=info msg="Task-result reconciliation" namespace=default numObjs=1 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:58.955Z" level=info msg="task-result changed" namespace=default nodeID=resubmit-bug-dag-nqnhz-602511029 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:58.956Z" level=info msg="node unchanged" namespace=default nodeID=resubmit-bug-dag-nqnhz-602511029 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:58.956Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-518622934, taskName D"
time="2024-04-15T12:51:58.956Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-568955791, taskName C"
time="2024-04-15T12:51:58.956Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-552178172, taskName B"
time="2024-04-15T12:51:58.957Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-552178172, taskName B"
time="2024-04-15T12:51:58.957Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-568955791, taskName C"
time="2024-04-15T12:51:58.957Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-552178172, taskName B"
time="2024-04-15T12:51:58.957Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-518622934, taskName D"
time="2024-04-15T12:51:58.957Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-568955791, taskName C"
time="2024-04-15T12:51:58.957Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-518622934, taskName D"
time="2024-04-15T12:51:58.957Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:58.957Z" level=info msg=reconcileAgentPod namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:51:58.987Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=19967985 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:09.677Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=19967985 namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:09.678Z" level=info msg="Task-result reconciliation" namespace=default numObjs=1 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:09.678Z" level=info msg="task-result changed" namespace=default nodeID=resubmit-bug-dag-nqnhz-602511029 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:09.679Z" level=info msg="node changed" namespace=default new.message= new.phase=Succeeded new.progress=0/1 nodeID=resubmit-bug-dag-nqnhz-602511029 old.message=PodInitializing old.phase=Pending old.progress=0/1 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:09.680Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-518622934, taskName D"
time="2024-04-15T12:52:09.681Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-568955791, taskName C"
time="2024-04-15T12:52:09.681Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-552178172, taskName B"
time="2024-04-15T12:52:09.681Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-552178172, taskName B"
time="2024-04-15T12:52:09.683Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-552178172, taskName B"
time="2024-04-15T12:52:09.683Z" level=info msg="All of node resubmit-bug-dag-nqnhz.B dependencies [A] completed" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:09.683Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:09.684Z" level=info msg="Pod node resubmit-bug-dag-nqnhz-552178172 initialized Pending" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:09.729Z" level=info msg="Created pod: resubmit-bug-dag-nqnhz.B (resubmit-bug-dag-nqnhz-fail-552178172)" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:09.729Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-568955791, taskName C"
time="2024-04-15T12:52:09.729Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-518622934, taskName D"
time="2024-04-15T12:52:09.729Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-568955791, taskName C"
time="2024-04-15T12:52:09.729Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-518622934, taskName D"
time="2024-04-15T12:52:09.729Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:09.729Z" level=info msg=reconcileAgentPod namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:09.774Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=19968048 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:09.784Z" level=info msg="cleaning up pod" action=labelPodCompleted key=default/resubmit-bug-dag-nqnhz-success-602511029/labelPodCompleted
time="2024-04-15T12:52:19.732Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=19968048 namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:19.732Z" level=info msg="Task-result reconciliation" namespace=default numObjs=2 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:19.733Z" level=info msg="task-result changed" namespace=default nodeID=resubmit-bug-dag-nqnhz-602511029 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:19.733Z" level=info msg="task-result changed" namespace=default nodeID=resubmit-bug-dag-nqnhz-552178172 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:19.733Z" level=info msg="node changed" namespace=default new.message= new.phase=Running new.progress=0/1 nodeID=resubmit-bug-dag-nqnhz-552178172 old.message= old.phase=Pending old.progress=0/1 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:19.733Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-518622934, taskName D"
time="2024-04-15T12:52:19.733Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-568955791, taskName C"
time="2024-04-15T12:52:19.734Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-568955791, taskName C"
time="2024-04-15T12:52:19.734Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-518622934, taskName D"
time="2024-04-15T12:52:19.734Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-568955791, taskName C"
time="2024-04-15T12:52:19.734Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-518622934, taskName D"
time="2024-04-15T12:52:19.734Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:19.734Z" level=info msg=reconcileAgentPod namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:19.750Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=19968092 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:29.901Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=19968092 namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:29.902Z" level=info msg="Task-result reconciliation" namespace=default numObjs=2 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:29.902Z" level=info msg="task-result changed" namespace=default nodeID=resubmit-bug-dag-nqnhz-602511029 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:29.902Z" level=info msg="task-result changed" namespace=default nodeID=resubmit-bug-dag-nqnhz-552178172 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:29.902Z" level=info msg="Pod failed: Error (exit code 1)" displayName=B namespace=default pod=resubmit-bug-dag-nqnhz-fail-552178172 templateName=fail workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:29.902Z" level=info msg="node changed" namespace=default new.message="Error (exit code 1)" new.phase=Failed new.progress=0/1 nodeID=resubmit-bug-dag-nqnhz-552178172 old.message= old.phase=Running old.progress=0/1 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:29.903Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-518622934, taskName D"
time="2024-04-15T12:52:29.903Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-568955791, taskName C"
time="2024-04-15T12:52:29.903Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-568955791, taskName C"
time="2024-04-15T12:52:29.903Z" level=info msg="Skipped node resubmit-bug-dag-nqnhz-568955791 initialized Omitted (message: omitted: depends condition not met)" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:29.903Z" level=warning msg="was unable to obtain the node for resubmit-bug-dag-nqnhz-518622934, taskName D"
time="2024-04-15T12:52:29.904Z" level=info msg="Skipped node resubmit-bug-dag-nqnhz-518622934 initialized Omitted (message: omitted: depends condition not met)" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:29.904Z" level=info msg="Outbound nodes of resubmit-bug-dag-nqnhz set to [resubmit-bug-dag-nqnhz-518622934]" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:29.904Z" level=info msg="node resubmit-bug-dag-nqnhz phase Running -> Failed" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:29.904Z" level=info msg="node resubmit-bug-dag-nqnhz finished: 2024-04-15 12:52:29.904236889 +0000 UTC" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:29.904Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:29.904Z" level=info msg=reconcileAgentPod namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:29.904Z" level=info msg="Updated phase Running -> Failed" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:29.904Z" level=info msg="Marking workflow completed" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:29.910Z" level=info msg="cleaning up pod" action=deletePod key=default/resubmit-bug-dag-nqnhz-1340600742-agent/deletePod
time="2024-04-15T12:52:29.927Z" level=info msg="Workflow update successful" namespace=default phase=Failed resourceVersion=19968129 workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:29.970Z" level=info msg="cleaning up pod" action=labelPodCompleted key=default/resubmit-bug-dag-nqnhz-fail-552178172/labelPodCompleted

kubectl logs -n argo deploy/workflow-controller | grep resubmit-bug-dag-h3pye (resubmitted workflow)
time="2024-04-15T12:54:03.999Z" level=info msg="Processing workflow" Phase= ResourceVersion=19968390 namespace=default workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:04.013Z" level=info msg="Task-result reconciliation" namespace=default numObjs=0 workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:04.013Z" level=info msg="Updated phase  -> Running" namespace=default workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:04.054Z" level=info msg="Created pod: resubmit-bug-dag-h3pye.B (resubmit-bug-dag-h3pye-fail-1494988566)" namespace=default workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:04.082Z" level=info msg="Created pod: resubmit-bug-dag-h3pye.C (resubmit-bug-dag-h3pye-success-1511766185)" namespace=default workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:04.111Z" level=info msg="Created pod: resubmit-bug-dag-h3pye.D (resubmit-bug-dag-h3pye-success-1528543804)" namespace=default workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:04.112Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:04.112Z" level=info msg=reconcileAgentPod namespace=default workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:04.138Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=19968398 workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:14.065Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=19968398 namespace=default workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:14.066Z" level=info msg="Task-result reconciliation" namespace=default numObjs=0 workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:14.067Z" level=info msg="node changed" namespace=default new.message=PodInitializing new.phase=Pending new.progress=0/1 nodeID=resubmit-bug-dag-h3pye-1511766185 old.message= old.phase=Pending old.progress= workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:14.068Z" level=info msg="node changed" namespace=default new.message=PodInitializing new.phase=Pending new.progress=0/1 nodeID=resubmit-bug-dag-h3pye-1494988566 old.message= old.phase=Pending old.progress=0/1 workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:14.069Z" level=info msg="node changed" namespace=default new.message=PodInitializing new.phase=Pending new.progress=0/1 nodeID=resubmit-bug-dag-h3pye-1528543804 old.message= old.phase=Pending old.progress= workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:14.073Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:14.073Z" level=info msg=reconcileAgentPod namespace=default workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:14.107Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=19968441 workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:32.069Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=19968441 namespace=default workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:32.071Z" level=info msg="Task-result reconciliation" namespace=default numObjs=3 workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:32.071Z" level=info msg="task-result changed" namespace=default nodeID=resubmit-bug-dag-h3pye-1528543804 workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:32.071Z" level=info msg="task-result changed" namespace=default nodeID=resubmit-bug-dag-h3pye-1494988566 workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:32.071Z" level=info msg="task-result changed" namespace=default nodeID=resubmit-bug-dag-h3pye-1511766185 workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:32.071Z" level=info msg="node changed" namespace=default new.message= new.phase=Succeeded new.progress=0/1 nodeID=resubmit-bug-dag-h3pye-1511766185 old.message=PodInitializing old.phase=Pending old.progress=0/1 workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:32.072Z" level=info msg="node changed" namespace=default new.message= new.phase=Succeeded new.progress=0/1 nodeID=resubmit-bug-dag-h3pye-1528543804 old.message=PodInitializing old.phase=Pending old.progress=0/1 workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:32.073Z" level=info msg="Pod failed: Error (exit code 1)" displayName=B namespace=default pod=resubmit-bug-dag-h3pye-fail-1494988566 templateName=fail workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:32.073Z" level=info msg="node changed" namespace=default new.message="Error (exit code 1)" new.phase=Failed new.progress=0/1 nodeID=resubmit-bug-dag-h3pye-1494988566 old.message=PodInitializing old.phase=Pending old.progress=0/1 workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:32.075Z" level=info msg="Outbound nodes of resubmit-bug-dag-h3pye set to [resubmit-bug-dag-h3pye-1528543804]" namespace=default workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:32.075Z" level=info msg="node resubmit-bug-dag-h3pye phase Pending -> Succeeded" namespace=default workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:32.076Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:32.076Z" level=info msg=reconcileAgentPod namespace=default workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:32.076Z" level=info msg="Updated phase Running -> Succeeded" namespace=default workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:32.076Z" level=info msg="Marking workflow completed" namespace=default workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:32.083Z" level=info msg="cleaning up pod" action=deletePod key=default/resubmit-bug-dag-h3pye-1340600742-agent/deletePod
time="2024-04-15T12:54:32.118Z" level=info msg="Workflow update successful" namespace=default phase=Succeeded resourceVersion=19968530 workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:32.227Z" level=info msg="cleaning up pod" action=labelPodCompleted key=default/resubmit-bug-dag-h3pye-fail-1494988566/labelPodCompleted
time="2024-04-15T12:54:32.227Z" level=info msg="cleaning up pod" action=labelPodCompleted key=default/resubmit-bug-dag-h3pye-success-1511766185/labelPodCompleted
time="2024-04-15T12:54:32.227Z" level=info msg="cleaning up pod" action=labelPodCompleted key=default/resubmit-bug-dag-h3pye-success-1528543804/labelPodCompleted
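The difference between the two runs can be read off the "Created pod" lines alone. The following Python sketch is only an illustration: `pod_creation_times` and `spread_seconds` are hypothetical helper names, and the sample lines are copied from the controller logs above. It extracts the creation time per task and measures how far apart the pods were created.

```python
import re
from datetime import datetime
from typing import Dict

# "Created pod" lines copied from the controller logs in this issue.
ORIGINAL_LOG = '''
time="2024-04-15T12:51:32.913Z" level=info msg="Created pod: resubmit-bug-dag-nqnhz.A (resubmit-bug-dag-nqnhz-success-602511029)" namespace=default workflow=resubmit-bug-dag-nqnhz
time="2024-04-15T12:52:09.729Z" level=info msg="Created pod: resubmit-bug-dag-nqnhz.B (resubmit-bug-dag-nqnhz-fail-552178172)" namespace=default workflow=resubmit-bug-dag-nqnhz
'''

RESUBMITTED_LOG = '''
time="2024-04-15T12:54:04.054Z" level=info msg="Created pod: resubmit-bug-dag-h3pye.B (resubmit-bug-dag-h3pye-fail-1494988566)" namespace=default workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:04.082Z" level=info msg="Created pod: resubmit-bug-dag-h3pye.C (resubmit-bug-dag-h3pye-success-1511766185)" namespace=default workflow=resubmit-bug-dag-h3pye
time="2024-04-15T12:54:04.111Z" level=info msg="Created pod: resubmit-bug-dag-h3pye.D (resubmit-bug-dag-h3pye-success-1528543804)" namespace=default workflow=resubmit-bug-dag-h3pye
'''

# Matches the timestamp and the task name in "<workflow>.<task> (" log messages.
CREATED = re.compile(
    r'time="(?P<ts>[^"]+)".*msg="Created pod: [\w-]+\.(?P<task>\w+) '
)


def pod_creation_times(log_text: str) -> Dict[str, datetime]:
    """Map each task name to the time its pod was created."""
    times: Dict[str, datetime] = {}
    for line in log_text.splitlines():
        match = CREATED.search(line)
        if match:
            times[match["task"]] = datetime.strptime(
                match["ts"], "%Y-%m-%dT%H:%M:%S.%fZ"
            )
    return times


def spread_seconds(times: Dict[str, datetime]) -> float:
    """Seconds between the first and the last pod creation."""
    return (max(times.values()) - min(times.values())).total_seconds()


original = pod_creation_times(ORIGINAL_LOG)
resubmitted = pod_creation_times(RESUBMITTED_LOG)
print(sorted(original), spread_seconds(original))
print(sorted(resubmitted), spread_seconds(resubmitted))
```

In the original run the pods for A and B are created more than half a minute apart, because B waits for A. In the resubmitted run the pods for B, C and D are all created within the same second, in a single controller pass, which is what I would not expect from a chain of dependent tasks.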

Logs from your workflow's wait container

kubectl logs -n default -c wait -l workflows.argoproj.io/workflow=resubmit-bug-dag-nqnhz,workflow.argoproj.io/phase!=Succeeded
time="2024-04-15T12:51:58.001Z" level=info msg="Starting Workflow Executor" version=v3.5.5
time="2024-04-15T12:51:58.012Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2024-04-15T12:51:58.012Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=default podName=resubmit-bug-dag-nqnhz-success-602511029 templateName=success version="&Version{Version:v3.5.5,BuildDate:2024-02-29T21:00:43Z,GitCommit:c80b2e91ebd7e7f604e88442f45ec630380effa0,GitTag:v3.5.5,GitTreeState:clean,GoVersion:go1.21.7,Compiler:gc,Platform:linux/arm64,}"
time="2024-04-15T12:51:58.037Z" level=info msg="Starting deadline monitor"
time="2024-04-15T12:52:01.039Z" level=info msg="Main container completed" error="<nil>"
time="2024-04-15T12:52:01.039Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2024-04-15T12:52:01.040Z" level=info msg="No output parameters"
time="2024-04-15T12:52:01.040Z" level=info msg="No output artifacts"
time="2024-04-15T12:52:01.127Z" level=info msg="Alloc=7090 TotalAlloc=13214 Sys=23653 NumGC=4 Goroutines=8"
time="2024-04-15T12:52:17.252Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2024-04-15T12:52:17.253Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=default podName=resubmit-bug-dag-nqnhz-fail-552178172 templateName=fail version="&Version{Version:v3.5.5,BuildDate:2024-02-29T21:00:43Z,GitCommit:c80b2e91ebd7e7f604e88442f45ec630380effa0,GitTag:v3.5.5,GitTreeState:clean,GoVersion:go1.21.7,Compiler:gc,Platform:linux/arm64,}"
time="2024-04-15T12:52:17.278Z" level=info msg="Starting deadline monitor"
time="2024-04-15T12:52:20.279Z" level=info msg="Main container completed" error="<nil>"
time="2024-04-15T12:52:20.279Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2024-04-15T12:52:20.279Z" level=info msg="No output parameters"
time="2024-04-15T12:52:20.279Z" level=info msg="No output artifacts"
time="2024-04-15T12:52:20.305Z" level=info msg="Alloc=7222 TotalAlloc=13183 Sys=19301 NumGC=4 Goroutines=8"
time="2024-04-15T12:52:20.317Z" level=info msg="Deadline monitor stopped"
time="2024-04-15T12:52:20.317Z" level=info msg="stopping progress monitor (context done)" error="context canceled"

kubectl logs -n default -c wait -l workflows.argoproj.io/workflow=resubmit-bug-dag-h3pye,workflow.argoproj.io/phase!=Succeeded (resubmitted workflow)
time="2024-04-15T12:54:22.784Z" level=info msg="Starting Workflow Executor" version=v3.5.5
time="2024-04-15T12:54:22.793Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2024-04-15T12:54:22.793Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=default podName=resubmit-bug-dag-h3pye-success-1511766185 templateName=success version="&Version{Version:v3.5.5,BuildDate:2024-02-29T21:00:43Z,GitCommit:c80b2e91ebd7e7f604e88442f45ec630380effa0,GitTag:v3.5.5,GitTreeState:clean,GoVersion:go1.21.7,Compiler:gc,Platform:linux/arm64,}"
time="2024-04-15T12:54:22.814Z" level=info msg="Starting deadline monitor"
time="2024-04-15T12:54:25.816Z" level=info msg="Main container completed" error="<nil>"
time="2024-04-15T12:54:25.816Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2024-04-15T12:54:25.816Z" level=info msg="No output parameters"
time="2024-04-15T12:54:25.816Z" level=info msg="No output artifacts"
time="2024-04-15T12:54:25.880Z" level=info msg="Alloc=7768 TotalAlloc=13146 Sys=19557 NumGC=4 Goroutines=8"
time="2024-04-15T12:54:22.741Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2024-04-15T12:54:22.742Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=default podName=resubmit-bug-dag-h3pye-fail-1494988566 templateName=fail version="&Version{Version:v3.5.5,BuildDate:2024-02-29T21:00:43Z,GitCommit:c80b2e91ebd7e7f604e88442f45ec630380effa0,GitTag:v3.5.5,GitTreeState:clean,GoVersion:go1.21.7,Compiler:gc,Platform:linux/arm64,}"
time="2024-04-15T12:54:22.777Z" level=info msg="Starting deadline monitor"
time="2024-04-15T12:54:25.778Z" level=info msg="Main container completed" error="<nil>"
time="2024-04-15T12:54:25.779Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2024-04-15T12:54:25.779Z" level=info msg="No output parameters"
time="2024-04-15T12:54:25.780Z" level=info msg="No output artifacts"
time="2024-04-15T12:54:25.810Z" level=info msg="Alloc=7146 TotalAlloc=13189 Sys=23397 NumGC=4 Goroutines=8"
time="2024-04-15T12:54:25.828Z" level=info msg="Deadline monitor stopped"
time="2024-04-15T12:54:25.828Z" level=info msg="stopping progress monitor (context done)" error="context canceled"
time="2024-04-15T12:54:22.653Z" level=info msg="Starting Workflow Executor" version=v3.5.5
time="2024-04-15T12:54:22.664Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2024-04-15T12:54:22.664Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=default podName=resubmit-bug-dag-h3pye-success-1528543804 templateName=success version="&Version{Version:v3.5.5,BuildDate:2024-02-29T21:00:43Z,GitCommit:c80b2e91ebd7e7f604e88442f45ec630380effa0,GitTag:v3.5.5,GitTreeState:clean,GoVersion:go1.21.7,Compiler:gc,Platform:linux/arm64,}"
time="2024-04-15T12:54:22.697Z" level=info msg="Starting deadline monitor"
time="2024-04-15T12:54:25.697Z" level=info msg="Main container completed" error="<nil>"
time="2024-04-15T12:54:25.698Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2024-04-15T12:54:25.698Z" level=info msg="No output parameters"
time="2024-04-15T12:54:25.698Z" level=info msg="No output artifacts"
time="2024-04-15T12:54:25.724Z" level=info msg="Alloc=9341 TotalAlloc=13178 Sys=23397 NumGC=3 Goroutines=8"

shuangkun (Member) commented

Reproduced it and will have a look.

@shuangkun shuangkun self-assigned this Apr 15, 2024
@agilgur5 agilgur5 changed the title from "Failed Workflow resubmit does not respect DAG dependencies" to "Failed Workflow memoized resubmit does not respect DAG dependencies" Apr 15, 2024
shuangkun added a commit to shuangkun/argo-workflows that referenced this issue Apr 15, 2024
Signed-off-by: shuangkun <tsk2013uestc@163.com>
shuangkun added a commit to shuangkun/argo-workflows that referenced this issue Apr 15, 2024
Signed-off-by: shuangkun <tsk2013uestc@163.com>