Get the error "key was not found for ..." after upgrade from v3.4.7 to v3.4.11 #11843

Closed
2 of 3 tasks
Anton-Sagurov opened this issue Sep 19, 2023 · 6 comments · Fixed by #11847
Labels: area/controller (Controller issues, panics), type/bug, type/regression (Regression from previous behavior; a specific type of bug)

Comments

@Anton-Sagurov

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

We just upgraded argo-workflows from v3.4.7 to v3.4.11. The same error also occurs with the latest argo-server and workflow-controller images.

Our workflows stopped working with an error message like: "key was not found for sc-scylla-upgrade-hlmsx-3958779216"

The UI: [screenshot not preserved]

The workflow-controller configuration was not changed.

The workflow-controller configuration:

apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: argoworkflow
data:
  artifactRepository: |
    archiveLogs: true
    s3:
      endpoint: s3.amazonaws.com
      bucket: argo-workflows-734708892259-eu-central-1
      region: eu-central-1
      insecure: false
      keyFormat: "artifacts/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{workflow.name}}/{{pod.name}}"

      accessKeySecret:
        name: argoworkflow-iam-user-secret-creds
        key: accessKey
      secretKeySecret:
        name: argoworkflow-iam-user-secret-creds
        key: secretKey
      useSDKCreds: false
      encryptionOptions:
        enableEncryption: false
  executor: |
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: "0.4"
        memory: 300M
      limits:
        memory: 2G
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000
      runAsGroup: 1000
    args:
    - --loglevel
    - debug
    - --gloglevel
    - "6"
    env:
    - name: ARGO_TRACE
      value: "1"
  images: |
    argoproj/argosay:v2:
      cmd: [/argosay]
    docker/whalesay:latest:
      cmd: [/bin/bash]
  instanceID: argoworkflow
  mainContainer: |
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: "0.4"
        memory: 300M
      limits:
        memory: 2G
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 1000
  metricsConfig: |
    enabled: true
    path: /metrics
    port: 9090
    metricsTTL: "10m"
    ignoreErrors: false
    secure: false
    disableLegacy: true
  namespaceParallelism: "15000"
  parallelism: "100"
  persistence: |
    connectionPool:
      maxIdleConns: 100
      maxOpenConns: 0
      connMaxLifetime: "0s"
    nodeStatusOffLoad: true
    archive: true
    archiveTTL: "180d"
    archiveLabelSelector:
      matchLabels:
        workflows.argoproj.io/archive-strategy: "always"
    clusterName: devops-eu-central-1-lab
    postgresql:
      host: argoworkflowdb-lab.cmnd0tcvsfax.eu-central-1.rds.amazonaws.com
      port: 5432
      database: argoworkflow
      tableName: argo_workflows
      userNameSecret:
        name: argo-postgres-config
        key: username
      passwordSecret:
        name: argo-postgres-config
        key: password
      ssl: true
      sslMode: require
  resourceRateLimit: |
    limit: 10
    burst: 1
  sso: |
    issuer: https://domain.okta.com
    sessionExpiry: 8h
    clientId:
      name: sso-secret
      key: client_id
    clientSecret:
      name: sso-secret
      key: client_secret
    scopes:
     - groups
     - email
     - profile
    rbac:
      enabled: true
    insecureSkipVerify: false
  workflowDefaults: |
    metadata:
      annotations:
        argo: workflows
        k8s: devops-eu-central-1-lab
    spec:
      ttlStrategy:
        # 604800 = 7Days
        # 18144000 = 30Days
        secondsAfterCompletion: 604800
        secondsAfterFailure: 604800
        secondsAfterSuccess: 604800
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
  workflowRestrictions: |
    templateReferencing: Strict

Version

v3.4.11

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: simple-debug 
  labels: 
    workflows.argoproj.io/controller-instanceid: argoworkflow
spec:
  entrypoint: main
  archiveLogs: true
  artifactGC:
    strategy: Never
  podGC:
    strategy: OnWorkflowCompletion
  #nodeSelector: 
  #  node_group_type: private
  workflowMetadata:
    labels: 
      workflows.argoproj.io/archive-strategy: always
      workflows.argoproj.io/controller-instanceid: argoworkflow
  onExit: exit-handler

  templates:
  - name: main
    parallelism: 1
    dag:
      tasks:
      - name: date 
        template: exec-command
        arguments:
          parameters:
          - name: command
            value: date

      - name: a
        template: exec-command
        arguments:
          parameters:
          - name: command
            value: "echo a"
        dependencies:
        - date

      - name: b
        template: exec-command
        arguments:
          parameters:
          - name: command
            value: "echo b"
        dependencies:
        - date

      - name: c
        template: exec-command
        arguments:
          parameters:
          - name: command
            value: "echo c"
        dependencies:
        - date

      - name: A
        template: exec-command
        arguments:
          parameters:
          - name: command
            value: "echo {{item}}"
        withParam: '["A1", "A2", "A3", "A4", "A5", "A6"]'
        dependencies:
        - a 

      - name: C
        template: exec-command
        arguments:
          parameters:
          - name: command
            value: "echo C"
        dependencies:
        - c 

      - name: showID
        template: exec-command
        arguments:
          parameters:
          - name: command
            value: "id"
        dependencies:
        - C 

      - name: printMessages 
        template: exec-command
        arguments:
          parameters:
          - name: command
            value: "echo {{item}}"
        withParam: '["1", "2", "3", "4", "5", "6"]'
        dependencies:
        - showID

  - name: exec-command
    inputs:
      parameters:
      - name: command
    container:
      image: "bash:5.2.15"
      command: ["bash", "-c"]
      args: ["{{inputs.parameters.command}}"]

  - name: exit-handler
    steps:
    - - name: succeeded
        when: '{{workflow.status}} == Succeeded'
        template: exec-command
        arguments:
          parameters:
          - name: command
            value: "echo Success!"
    - - name: failed
        when: '{{workflow.status}} != Succeeded'
        template: exec-command
        arguments:
          parameters:
          - name: command
            value: "echo Failed!"

Logs from the workflow controller

kubectl logs -n argoworkflow deploy/workflow-controller | grep simple-debug-jmr6w
time="2023-09-19T13:11:53.764Z" level=info msg="Processing workflow" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:11:53.879Z" level=info msg="Updated phase  -> Running" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:11:53.880Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:11:53.880Z" level=info msg="was unable to obtain node for , letting display name to be nodeName" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:11:53.880Z" level=info msg="DAG node simple-debug-jmr6w initialized Running" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:11:53.880Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-215844903, taskName A"
time="2023-09-19T13:11:53.880Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-752728711, taskName a"
time="2023-09-19T13:11:53.880Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-2463062648, taskName date"
time="2023-09-19T13:11:53.880Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-2463062648, taskName date"
time="2023-09-19T13:11:53.880Z" level=info msg="All of node simple-debug-jmr6w.date dependencies [] completed" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:11:53.880Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:11:53.880Z" level=info msg="Pod node simple-debug-jmr6w-2463062648 initialized Pending" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:11:53.981Z" level=info msg="Created pod: simple-debug-jmr6w.date (simple-debug-jmr6w-exec-command-2463062648)" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:11:53.981Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-752728711, taskName a"
time="2023-09-19T13:11:53.981Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-215844903, taskName A"
time="2023-09-19T13:11:53.981Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-752728711, taskName a"
time="2023-09-19T13:11:53.981Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-215844903, taskName A"
time="2023-09-19T13:11:53.981Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-769506330, taskName b"
time="2023-09-19T13:11:53.981Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-769506330, taskName b"
time="2023-09-19T13:11:53.981Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-769506330, taskName b"
time="2023-09-19T13:11:53.981Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-1913145987, taskName printMessages"
time="2023-09-19T13:11:53.981Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-3926629392, taskName showID"
time="2023-09-19T13:11:53.981Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-249400141, taskName C"
time="2023-09-19T13:11:53.981Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-786283949, taskName c"
time="2023-09-19T13:11:53.981Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-786283949, taskName c"
time="2023-09-19T13:11:53.981Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-249400141, taskName C"
time="2023-09-19T13:11:53.981Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-786283949, taskName c"
time="2023-09-19T13:11:53.981Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-3926629392, taskName showID"
time="2023-09-19T13:11:53.981Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-249400141, taskName C"
time="2023-09-19T13:11:53.981Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-1913145987, taskName printMessages"
time="2023-09-19T13:11:53.981Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-3926629392, taskName showID"
time="2023-09-19T13:11:53.981Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-1913145987, taskName printMessages"
time="2023-09-19T13:11:53.981Z" level=info msg="TaskSet Reconciliation" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:11:53.981Z" level=info msg=reconcileAgentPod namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:11:54.033Z" level=info msg="Workflow update successful" namespace=wkf-support phase=Running resourceVersion=87741791 workflow=simple-debug-jmr6w
time="2023-09-19T13:12:03.765Z" level=info msg="Processing workflow" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:03.765Z" level=info msg="Task-result reconciliation" namespace=wkf-support numObjs=0 workflow=simple-debug-jmr6w
time="2023-09-19T13:12:03.765Z" level=warning msg="workflow uses legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:03.765Z" level=info msg="node changed" namespace=wkf-support new.message= new.phase=Succeeded new.progress=0/1 nodeID=simple-debug-jmr6w-2463062648 old.message= old.phase=Pending old.progress=0/1 workflow=simple-debug-jmr6w
time="2023-09-19T13:12:03.766Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-215844903, taskName A"
time="2023-09-19T13:12:03.766Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-752728711, taskName a"
time="2023-09-19T13:12:03.766Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-752728711, taskName a"
time="2023-09-19T13:12:03.766Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-752728711, taskName a"
time="2023-09-19T13:12:03.766Z" level=info msg="All of node simple-debug-jmr6w.a dependencies [date] completed" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:03.766Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:03.768Z" level=info msg="Pod node simple-debug-jmr6w-752728711 initialized Pending" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:03.796Z" level=info msg="Created pod: simple-debug-jmr6w.a (simple-debug-jmr6w-exec-command-752728711)" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:03.796Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-215844903, taskName A"
time="2023-09-19T13:12:03.796Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-215844903, taskName A"
time="2023-09-19T13:12:03.796Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-769506330, taskName b"
time="2023-09-19T13:12:03.796Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-769506330, taskName b"
time="2023-09-19T13:12:03.796Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-769506330, taskName b"
time="2023-09-19T13:12:03.796Z" level=info msg="All of node simple-debug-jmr6w.b dependencies [date] completed" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:03.796Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:03.796Z" level=info msg="template (node simple-debug-jmr6w) active children parallelism exceeded 1" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:03.796Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-769506330, taskName b"
time="2023-09-19T13:12:03.796Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-1913145987, taskName printMessages"
time="2023-09-19T13:12:03.796Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-3926629392, taskName showID"
time="2023-09-19T13:12:03.796Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-249400141, taskName C"
time="2023-09-19T13:12:03.796Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-786283949, taskName c"
time="2023-09-19T13:12:03.796Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-786283949, taskName c"
time="2023-09-19T13:12:03.796Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-786283949, taskName c"
time="2023-09-19T13:12:03.796Z" level=info msg="All of node simple-debug-jmr6w.c dependencies [date] completed" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:03.796Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:03.797Z" level=info msg="template (node simple-debug-jmr6w) active children parallelism exceeded 1" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:03.797Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-249400141, taskName C"
time="2023-09-19T13:12:03.797Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-786283949, taskName c"
time="2023-09-19T13:12:03.797Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-3926629392, taskName showID"
time="2023-09-19T13:12:03.797Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-249400141, taskName C"
time="2023-09-19T13:12:03.797Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-1913145987, taskName printMessages"
time="2023-09-19T13:12:03.797Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-3926629392, taskName showID"
time="2023-09-19T13:12:03.797Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-1913145987, taskName printMessages"
time="2023-09-19T13:12:03.797Z" level=info msg="TaskSet Reconciliation" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:03.797Z" level=info msg=reconcileAgentPod namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:03.797Z" level=warning msg="Coudn't obtain child for simple-debug-jmr6w-769506330, panicking"
time="2023-09-19T13:12:03.797Z" level=warning msg="Coudn't obtain child for simple-debug-jmr6w-786283949, panicking"
time="2023-09-19T13:12:03.814Z" level=info msg="Workflow update successful" namespace=wkf-support phase=Running resourceVersion=87741887 workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.799Z" level=info msg="Processing workflow" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.800Z" level=info msg="Task-result reconciliation" namespace=wkf-support numObjs=0 workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.800Z" level=warning msg="workflow uses legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.800Z" level=info msg="node changed" namespace=wkf-support new.message= new.phase=Succeeded new.progress=0/1 nodeID=simple-debug-jmr6w-752728711 old.message= old.phase=Pending old.progress=0/1 workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.800Z" level=warning msg="workflow uses legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.800Z" level=info msg="node unchanged" namespace=wkf-support nodeID=simple-debug-jmr6w-2463062648 workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.801Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-215844903, taskName A"
time="2023-09-19T13:12:13.801Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-215844903, taskName A"
time="2023-09-19T13:12:13.802Z" level=info msg="TaskGroup node simple-debug-jmr6w-215844903 initialized Running (message: )" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.802Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-2560453120, taskName A(0:A1)"
time="2023-09-19T13:12:13.802Z" level=info msg="All of node simple-debug-jmr6w.A(0:A1) dependencies [a] completed" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.802Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.802Z" level=info msg="Pod node simple-debug-jmr6w-2560453120 initialized Pending" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.848Z" level=info msg="Created pod: simple-debug-jmr6w.A(0:A1) (simple-debug-jmr6w-exec-command-2560453120)" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.848Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-2302950176, taskName A(1:A2)"
time="2023-09-19T13:12:13.848Z" level=info msg="All of node simple-debug-jmr6w.A(1:A2) dependencies [a] completed" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.848Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.849Z" level=info msg="template (node simple-debug-jmr6w) active children parallelism exceeded 1" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.849Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-769506330, taskName b"
time="2023-09-19T13:12:13.849Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-769506330, taskName b"
time="2023-09-19T13:12:13.849Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-769506330, taskName b"
time="2023-09-19T13:12:13.849Z" level=info msg="All of node simple-debug-jmr6w.b dependencies [date] completed" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.850Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.850Z" level=info msg="template (node simple-debug-jmr6w) active children parallelism exceeded 1" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.850Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-769506330, taskName b"
time="2023-09-19T13:12:13.850Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-1913145987, taskName printMessages"
time="2023-09-19T13:12:13.850Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-3926629392, taskName showID"
time="2023-09-19T13:12:13.850Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-249400141, taskName C"
time="2023-09-19T13:12:13.850Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-786283949, taskName c"
time="2023-09-19T13:12:13.850Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-786283949, taskName c"
time="2023-09-19T13:12:13.850Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-786283949, taskName c"
time="2023-09-19T13:12:13.850Z" level=info msg="All of node simple-debug-jmr6w.c dependencies [date] completed" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.851Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.851Z" level=info msg="template (node simple-debug-jmr6w) active children parallelism exceeded 1" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.851Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-249400141, taskName C"
time="2023-09-19T13:12:13.851Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-786283949, taskName c"
time="2023-09-19T13:12:13.851Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-3926629392, taskName showID"
time="2023-09-19T13:12:13.851Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-249400141, taskName C"
time="2023-09-19T13:12:13.851Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-1913145987, taskName printMessages"
time="2023-09-19T13:12:13.851Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-3926629392, taskName showID"
time="2023-09-19T13:12:13.851Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-1913145987, taskName printMessages"
time="2023-09-19T13:12:13.851Z" level=error msg="Mark error node" error="key was not found for simple-debug-jmr6w-769506330" namespace=wkf-support nodeName=simple-debug-jmr6w workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.851Z" level=info msg="node simple-debug-jmr6w phase Running -> Error" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.851Z" level=info msg="node simple-debug-jmr6w message: key was not found for simple-debug-jmr6w-769506330" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.851Z" level=info msg="node simple-debug-jmr6w finished: 2023-09-19 13:12:13.851614635 +0000 UTC" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.851Z" level=error msg="error in entry template execution" error="key was not found for simple-debug-jmr6w-769506330" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.851Z" level=warning msg="Non-transient error: key was not found for simple-debug-jmr6w-769506330"
time="2023-09-19T13:12:13.851Z" level=info msg="Updated phase Running -> Error" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.851Z" level=info msg="Updated message  -> error in entry template execution: key was not found for simple-debug-jmr6w-769506330" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.851Z" level=info msg="Marking workflow completed" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.851Z" level=info msg="Marking workflow as pending archiving" namespace=wkf-support workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.852Z" level=warning msg="was unable to obtain node for simple-debug-jmr6w-2302950176"
time="2023-09-19T13:12:13.852Z" level=warning msg="was unable to obtain node for simple-debug-jmr6w-769506330"
time="2023-09-19T13:12:13.852Z" level=warning msg="was unable to obtain node for simple-debug-jmr6w-786283949"
time="2023-09-19T13:12:13.852Z" level=warning msg="Coudn't obtain child for simple-debug-jmr6w-2302950176, panicking"
time="2023-09-19T13:12:13.852Z" level=warning msg="Coudn't obtain child for simple-debug-jmr6w-769506330, panicking"
time="2023-09-19T13:12:13.852Z" level=warning msg="Coudn't obtain child for simple-debug-jmr6w-786283949, panicking"
time="2023-09-19T13:12:13.852Z" level=warning msg="Coudn't obtain child for simple-debug-jmr6w-2302950176, panicking"
time="2023-09-19T13:12:13.858Z" level=info msg="cleaning up pod" action=deletePod key=wkf-support/simple-debug-jmr6w-1340600742-agent/deletePod
time="2023-09-19T13:12:13.881Z" level=info msg="Workflow update successful" namespace=wkf-support phase=Error resourceVersion=87741987 workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.908Z" level=info msg="archiving workflow" namespace=wkf-support uid=f58f2abd-2a07-42da-aa3f-02aade4bba1e workflow=simple-debug-jmr6w
time="2023-09-19T13:12:13.992Z" level=info msg="Queueing Error workflow wkf-support/simple-debug-jmr6w for delete in 168h0m0s due to TTL"
time="2023-09-19T13:12:18.908Z" level=info msg="cleaning up pod" action=deletePod key=wkf-support/simple-debug-jmr6w-exec-command-2463062648/deletePod
time="2023-09-19T13:12:18.908Z" level=info msg="cleaning up pod" action=deletePod key=wkf-support/simple-debug-jmr6w-exec-command-752728711/deletePod
time="2023-09-19T13:13:33.081Z" level=info msg="Queueing Error workflow wkf-support/simple-debug-jmr6w for delete in 167h58m40s due to TTL"
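The controller log above names the failing node only by ID, while the earlier warnings pair each ID with a task name. As a rough triage aid, the following Go sketch joins the two. The helper name `MissingTasks` and both regular expressions are assumptions written for this example; they are based only on the log line shapes shown in this thread and may not match other versions.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

var (
	// Warning shape seen in this thread:
	//   msg="was unable to obtain the node for <id>, taskName <name>"
	taskRe = regexp.MustCompile(`was unable to obtain the node for ([^,]+), taskName ([^"]+)"`)
	// Error shape seen in this thread:
	//   error="key was not found for <id>"
	missRe = regexp.MustCompile(`error="key was not found for ([^"]+)"`)
)

// MissingTasks (hypothetical helper) returns, for every node ID reported in a
// "key was not found" error, the task name taken from the warnings.
// The task name is empty if no warning mentioned that ID.
func MissingTasks(logs string) map[string]string {
	tasks := map[string]string{}  // node ID -> task name, from warnings
	missing := map[string]bool{}  // node IDs named in errors
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		line := sc.Text()
		if m := taskRe.FindStringSubmatch(line); m != nil {
			tasks[m[1]] = m[2]
		}
		if m := missRe.FindStringSubmatch(line); m != nil {
			missing[m[1]] = true
		}
	}
	out := map[string]string{}
	for id := range missing {
		out[id] = tasks[id]
	}
	return out
}

func main() {
	// Two lines copied from the controller log in this issue.
	logs := `time="2023-09-19T13:12:13.849Z" level=warning msg="was unable to obtain the node for simple-debug-jmr6w-769506330, taskName b"
time="2023-09-19T13:12:13.851Z" level=error msg="Mark error node" error="key was not found for simple-debug-jmr6w-769506330" namespace=wkf-support nodeName=simple-debug-jmr6w workflow=simple-debug-jmr6w`
	for id, task := range MissingTasks(logs) {
		fmt.Printf("%s -> task %q\n", id, task) // simple-debug-jmr6w-769506330 -> task "b"
	}
}
```

For the log in this report, the sketch would point at task `b`, one of the tasks that the log shows being held back with "active children parallelism exceeded 1".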

Logs from your workflow's wait container

X-Amz-Request-Id: 5A4WEQW1AY8RTDK1
X-Amz-Server-Side-Encryption: AES256
X-Amz-Version-Id: ygRc3_5O6CNiJx8.ZiKNz2thd16fCWpH
---------END-HTTP---------
time="2023-09-19T13:12:18.679Z" level=info msg="Save artifact" artifactName=main-logs duration=81.739167ms error="<nil>" key=artifacts/wkf-support/2023/09/19/simple-debug-jmr6w/simple-debug-jmr6w-exec-command-2560453120/main.log
time="2023-09-19T13:12:18.680Z" level=info msg="not deleting local artifact" localArtPath=/tmp/argo/outputs/logs/main.log
time="2023-09-19T13:12:18.680Z" level=info msg="Successfully saved file: /tmp/argo/outputs/logs/main.log"
time="2023-09-19T13:12:18.691Z" level=warning msg="failed to patch task set, falling back to legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" error="workflowtaskresults.argoproj.io is forbidden: User \"system:serviceaccount:wkf-support:default\" cannot create resource \"workflowtaskresults\" in API group \"argoproj.io\" in the namespace \"wkf-support\""
time="2023-09-19T13:12:18.733Z" level=info msg="Alloc=7938 TotalAlloc=17813 Sys=27517 NumGC=5 Goroutines=10"
time="2023-09-19T13:12:18.733Z" level=info msg="stopping progress monitor (context done)" error="context canceled"
@terrytangyuan
Member

Might be related to changes in node status. cc @isubasinghe

@agilgur5 agilgur5 added the area/controller Controller issues, panics label Sep 19, 2023
@isubasinghe
Member

The error message itself is certainly from my changes. The question is why the controller was looking for a key that didn't exist.
Looking into this; I expect it to be a pain.

@isubasinghe
Member

isubasinghe commented Sep 20, 2023

Found what is going on. Sigh, this is not fun.
It is impossible to determine what the "correct" behaviour is.

@isubasinghe
Member

The previous versions worked because of this: https://github.com/argoproj/argo-workflows/blob/release-3.4.9/workflow/controller/dag.go#L149

In those versions, node is default-initialised when no entry exists for the key, and as a result the !Fulfilled() condition is true.
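To illustrate the difference described above, here is a minimal Go sketch. The `NodeStatus`, `Nodes`, and `Get` definitions are simplified stand-ins written for this example, not the actual argo-workflows types, and the set of terminal phases is an assumption. Only the `Fulfilled()` name and the error text come from this thread.

```go
package main

import "fmt"

// NodeStatus is a simplified, hypothetical stand-in for the controller's
// per-node status.
type NodeStatus struct {
	Phase string
}

// Fulfilled reports whether the node reached a terminal phase
// (assumed semantics for this sketch).
func (n NodeStatus) Fulfilled() bool {
	switch n.Phase {
	case "Succeeded", "Failed", "Error", "Skipped", "Omitted":
		return true
	}
	return false
}

// Nodes maps node IDs to their status.
type Nodes map[string]NodeStatus

// Get is a strict lookup: unlike a plain map index, it reports a missing key
// as an error instead of returning a zero value.
func (ns Nodes) Get(id string) (NodeStatus, error) {
	n, ok := ns[id]
	if !ok {
		return NodeStatus{}, fmt.Errorf("key was not found for %s", id)
	}
	return n, nil
}

func main() {
	// Only "date" has a node; task "b" has not been scheduled yet.
	nodes := Nodes{"wf-date": {Phase: "Succeeded"}}

	// Plain map index: the missing key yields a zero-value NodeStatus,
	// so !Fulfilled() is true and the caller treats the DAG as still running.
	legacy := nodes["wf-b"]
	fmt.Println(!legacy.Fulfilled()) // true

	// Strict lookup: the same missing key now surfaces as an error.
	_, err := nodes.Get("wf-b")
	fmt.Println(err) // key was not found for wf-b
}
```

Under these assumptions, a caller that previously relied on the zero value to mean "not finished yet" would need to handle the not-found case explicitly once the lookup becomes strict.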

@terrytangyuan
Member

terrytangyuan commented Sep 20, 2023

Sounds like we need to cherry-pick to v3.4?

@terrytangyuan
Member

Created an issue to track #11851

@agilgur5 agilgur5 added the type/regression Regression from previous behavior (a specific type of bug) label Sep 21, 2023