Error when using expressions with retries and inputs.parameters in podSpecPatch resources requests #12941

Open
3 of 4 tasks
yonirab opened this issue Apr 16, 2024 · 2 comments
Assignees
Labels
  • area/retryStrategy: Template-level retryStrategy
  • area/templating: Templating with `{{...}}`
  • P2: Important. All bugs with >=3 thumbs up that aren’t P0 or P1, plus: Any other bugs deemed important
  • type/bug

yonirab commented Apr 16, 2024

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what did you expect to happen?

This is a follow-on issue to #10362.
The solution there only seems to work when resource parameters are referenced in the podSpecPatch via workflow.parameters.
However, to modify resource requests when a step is retried, we need to reference resource parameters in the podSpecPatch via inputs.parameters, combined with expressions applied to {{retries}}.
That seems to fail with "Error applying PodSpecPatch".

In the workflow-controller logs I see errors like the following:

Non-transient error: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'""

Not sure what might be causing that.
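
As a quick sanity check on that message, the pattern it quotes can be tested directly against candidate memory values. The sketch below is plain Python, not controller code. The function name `looks_like_quantity` is made up for illustration, and the last two candidates encode unconfirmed guesses: that the `{{=...}}` expression either rendered to an empty string or reached the quantity parser unevaluated.

```python
import re

# Pattern copied verbatim from the controller error message.
QUANTITY_RE = re.compile(r"^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$")


def looks_like_quantity(value: str) -> bool:
    """Return True if value matches the pattern quoted in the error."""
    return QUANTITY_RE.match(value) is not None


candidates = [
    # Value the expression is meant to produce on the first attempt of run-task1.
    "25Mi",
    # Guess 1 (assumption): the expression rendered to nothing, leaving only the unit.
    "Mi",
    # Guess 2 (assumption): the expression was passed through as literal text.
    "{{=(sprig.int(retries)+1)*sprig.int(inputs.parameters.memreqnum)}}Mi",
]

for candidate in candidates:
    print(f"{candidate!r}: {looks_like_quantity(candidate)}")
```

Only `25Mi` matches. Either guess would be rejected by this pattern, but nothing in the logs confirms which string the controller actually tried to parse.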

Version

3.5.5

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: wf-
spec:
  entrypoint: wf
  templates:
  - name: wf
    steps:

    - - name: run-task1
        template: run-task
        arguments:
          parameters:
            - name: memreqnum
              value: '25'
            - name: memrequnit
              value: Mi
            - name: message
              value: "hello from run-task1"

      - name: run-task2
        template: run-task
        arguments:
          parameters:
            - name: memreqnum
              value: '300'
            - name: memrequnit
              value: Mi
            - name: message
              value: "hello from run-task2"
 

  - name: run-task
    inputs:
      parameters:
        - name: memreqnum
        - name: memrequnit
        - name: message
    retryStrategy:
      limit: "2"
      retryPolicy: "Always"
      expression: 'lastRetry.status == "Error" or (lastRetry.status == "Failed" and asInt(lastRetry.exitCode) not in [1,2,127])'
    podSpecPatch: |
      containers:
      - name: main
        resources:
          requests:
            memory: "{{=(sprig.int(retries)+1)*sprig.int(inputs.parameters.memreqnum)}}{{inputs.parameters.memrequnit}}"
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

time="2024-04-16T05:57:21.759Z" level=info msg="Processing workflow" Phase= ResourceVersion=1002197419 namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.766Z" level=info msg="Task-result reconciliation" namespace=default numObjs=0 workflow=wf-vp9xw
time="2024-04-16T05:57:21.766Z" level=info msg="Updated phase  -> Running" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.766Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.766Z" level=info msg="was unable to obtain node for , letting display name to be nodeName" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.767Z" level=info msg="Retry node wf-vp9xw initialized Running" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.767Z" level=info msg="was unable to obtain node for , letting display name to be nodeName" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.767Z" level=info msg="Steps node wf-vp9xw-2274503212 initialized Running" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.767Z" level=info msg="StepGroup node wf-vp9xw-3885283846 initialized Running" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.767Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.767Z" level=info msg="Retry node wf-vp9xw-587369068 initialized Running" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.767Z" level=info msg="Pod node wf-vp9xw-3271220583 initialized Pending" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.769Z" level=error msg="Mark error node" error="Error applying PodSpecPatch" namespace=default nodeName="wf-vp9xw(0)[0].run-task1(0)" workflow=wf-vp9xw
time="2024-04-16T05:57:21.769Z" level=info msg="node wf-vp9xw-3271220583 phase Pending -> Error" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.769Z" level=info msg="node wf-vp9xw-3271220583 message: Error applying PodSpecPatch" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.769Z" level=info msg="node wf-vp9xw-3271220583 finished: 2024-04-16 05:57:21.769247744 +0000 UTC" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.769Z" level=info msg="Retry Policy: Always (onFailed: true, onError true)" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.769Z" level=info msg="1 child nodes of wf-vp9xw(0)[0].run-task1 failed. Trying again..." namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.770Z" level=info msg="Pod node wf-vp9xw-654162282 initialized Pending" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.771Z" level=error msg="Mark error node" error="Error applying PodSpecPatch" namespace=default nodeName="wf-vp9xw(0)[0].run-task1(1)" workflow=wf-vp9xw
time="2024-04-16T05:57:21.771Z" level=info msg="node wf-vp9xw-654162282 phase Pending -> Error" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.771Z" level=info msg="node wf-vp9xw-654162282 message: Error applying PodSpecPatch" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.771Z" level=info msg="node wf-vp9xw-654162282 finished: 2024-04-16 05:57:21.771351431 +0000 UTC" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.771Z" level=info msg="Retry Policy: Always (onFailed: true, onError true)" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.771Z" level=info msg="2 child nodes of wf-vp9xw(0)[0].run-task1 failed. Trying again..." namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.772Z" level=info msg="Pod node wf-vp9xw-1056972233 initialized Pending" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.773Z" level=error msg="Mark error node" error="Error applying PodSpecPatch" namespace=default nodeName="wf-vp9xw(0)[0].run-task1(2)" workflow=wf-vp9xw
time="2024-04-16T05:57:21.773Z" level=info msg="node wf-vp9xw-1056972233 phase Pending -> Error" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.773Z" level=info msg="node wf-vp9xw-1056972233 message: Error applying PodSpecPatch" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.773Z" level=info msg="node wf-vp9xw-1056972233 finished: 2024-04-16 05:57:21.773435805 +0000 UTC" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.773Z" level=info msg="Retry Policy: Always (onFailed: true, onError true)" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.773Z" level=info msg="No more retries left. Failing..." namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.773Z" level=info msg="node wf-vp9xw-587369068 phase Running -> Error" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.773Z" level=info msg="node wf-vp9xw-587369068 message: No more retries left" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.773Z" level=info msg="node wf-vp9xw-587369068 finished: 2024-04-16 05:57:21.773862122 +0000 UTC" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.773Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.774Z" level=info msg="Retry node wf-vp9xw-637701925 initialized Running" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.774Z" level=info msg="Pod node wf-vp9xw-2667399556 initialized Pending" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.775Z" level=error msg="Mark error node" error="Error applying PodSpecPatch" namespace=default nodeName="wf-vp9xw(0)[0].run-task2(0)" workflow=wf-vp9xw
time="2024-04-16T05:57:21.775Z" level=info msg="node wf-vp9xw-2667399556 phase Pending -> Error" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.775Z" level=info msg="node wf-vp9xw-2667399556 message: Error applying PodSpecPatch" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.775Z" level=info msg="node wf-vp9xw-2667399556 finished: 2024-04-16 05:57:21.775651548 +0000 UTC" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.775Z" level=info msg="Retry Policy: Always (onFailed: true, onError true)" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.776Z" level=info msg="1 child nodes of wf-vp9xw(0)[0].run-task2 failed. Trying again..." namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.776Z" level=info msg="Pod node wf-vp9xw-989490561 initialized Pending" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.777Z" level=error msg="Mark error node" error="Error applying PodSpecPatch" namespace=default nodeName="wf-vp9xw(0)[0].run-task2(1)" workflow=wf-vp9xw
time="2024-04-16T05:57:21.777Z" level=info msg="node wf-vp9xw-989490561 phase Pending -> Error" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.777Z" level=info msg="node wf-vp9xw-989490561 message: Error applying PodSpecPatch" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.777Z" level=info msg="node wf-vp9xw-989490561 finished: 2024-04-16 05:57:21.777513727 +0000 UTC" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.777Z" level=info msg="Retry Policy: Always (onFailed: true, onError true)" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.777Z" level=info msg="2 child nodes of wf-vp9xw(0)[0].run-task2 failed. Trying again..." namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.778Z" level=info msg="Pod node wf-vp9xw-1123564418 initialized Pending" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.779Z" level=error msg="Mark error node" error="Error applying PodSpecPatch" namespace=default nodeName="wf-vp9xw(0)[0].run-task2(2)" workflow=wf-vp9xw
time="2024-04-16T05:57:21.779Z" level=info msg="node wf-vp9xw-1123564418 phase Pending -> Error" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.779Z" level=info msg="node wf-vp9xw-1123564418 message: Error applying PodSpecPatch" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.779Z" level=info msg="node wf-vp9xw-1123564418 finished: 2024-04-16 05:57:21.779335294 +0000 UTC" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.779Z" level=info msg="Retry Policy: Always (onFailed: true, onError true)" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.779Z" level=info msg="No more retries left. Failing..." namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.779Z" level=info msg="node wf-vp9xw-637701925 phase Running -> Error" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.779Z" level=info msg="node wf-vp9xw-637701925 message: No more retries left" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.779Z" level=info msg="node wf-vp9xw-637701925 finished: 2024-04-16 05:57:21.779739696 +0000 UTC" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.779Z" level=info msg="Step group node wf-vp9xw-3885283846 deemed failed: child 'wf-vp9xw-587369068' failed" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.779Z" level=info msg="node wf-vp9xw-3885283846 phase Running -> Failed" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.779Z" level=info msg="node wf-vp9xw-3885283846 message: child 'wf-vp9xw-587369068' failed" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.779Z" level=info msg="node wf-vp9xw-3885283846 finished: 2024-04-16 05:57:21.779795413 +0000 UTC" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.779Z" level=info msg="step group wf-vp9xw-3885283846 was unsuccessful: child 'wf-vp9xw-587369068' failed" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.779Z" level=info msg="Outbound nodes of wf-vp9xw-587369068 is [wf-vp9xw-1056972233]" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.779Z" level=info msg="Outbound nodes of wf-vp9xw-637701925 is [wf-vp9xw-1123564418]" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.779Z" level=info msg="Outbound nodes of wf-vp9xw-2274503212 is [wf-vp9xw-1056972233 wf-vp9xw-1123564418]" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.779Z" level=info msg="node wf-vp9xw-2274503212 phase Running -> Failed" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.779Z" level=info msg="node wf-vp9xw-2274503212 message: child 'wf-vp9xw-587369068' failed" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.779Z" level=info msg="node wf-vp9xw-2274503212 finished: 2024-04-16 05:57:21.77985872 +0000 UTC" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.780Z" level=info msg="Retry Policy: OnError (onFailed: false, onError true)" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.780Z" level=info msg="Node not set to be retried after status: Failed" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.780Z" level=info msg="node wf-vp9xw phase Running -> Failed" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.780Z" level=info msg="node wf-vp9xw message: child 'wf-vp9xw-587369068' failed" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.780Z" level=info msg="node wf-vp9xw finished: 2024-04-16 05:57:21.78010322 +0000 UTC" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.780Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.780Z" level=info msg=reconcileAgentPod namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.780Z" level=info msg="Updated phase Running -> Failed" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.780Z" level=info msg="Updated message  -> child 'wf-vp9xw-587369068' failed" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.780Z" level=info msg="Marking workflow completed" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.780Z" level=info msg="Marking workflow as pending archiving" namespace=default workflow=wf-vp9xw
time="2024-04-16T05:57:21.786Z" level=info msg="cleaning up pod" action=deletePod key=default/wf-vp9xw-1340600742-agent/deletePod
time="2024-04-16T05:57:21.797Z" level=info msg="Workflow update successful" namespace=default phase=Failed resourceVersion=1002197420 workflow=wf-vp9xw
time="2024-04-16T05:57:22.073Z" level=info msg="archiving workflow" namespace=default uid=693ecff0-fad6-4774-95d5-1035f1004dfd workflow=wf-vp9xw
time="2024-04-16T05:57:22.123Z" level=info msg="Queueing Failed workflow default/wf-vp9xw for delete in 29m59s due to TTL"

Logs from your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded

yonirab commented Apr 16, 2024

@eduardodbr @agilgur5 @EladProject I opened this issue as a follow-on to #10362, since that issue is technically Closed.

Any ideas why I am seeing workflow controller logs like the following with the workflow posted here?

Non-transient error: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'""


yonirab commented Apr 16, 2024

Note also that changing the memory request in the workflow above as follows generates a workflow that runs fine:

            memory: "{{retries}}{{inputs.parameters.memreqnum}}{{inputs.parameters.memrequnit}}"

However, applying arithmetic expressions to {{retries}} in conjunction with {{inputs.parameters}} seems to cause problems, even though arithmetic expressions can be applied to {{retries}} in conjunction with {{workflow.parameters}}, as noted by @EladProject in #10362 (comment).
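
To make the difference between the two memory values concrete, here is a small Python sketch of what each template would render to for run-task1 (memreqnum 25, memrequnit Mi). It assumes `retries` is 0 on the first attempt and counts up to the limit of 2. The helper names are hypothetical, and the code only mimics the string arithmetic, not Argo's templating.

```python
def concatenated(retries: int, num: str, unit: str) -> str:
    """Mimic "{{retries}}{{inputs.parameters.memreqnum}}{{inputs.parameters.memrequnit}}"."""
    return f"{retries}{num}{unit}"


def scaled(retries: int, num: str, unit: str) -> str:
    """Mimic "{{=(sprig.int(retries)+1)*sprig.int(memreqnum)}}{{memrequnit}}"."""
    return f"{(int(retries) + 1) * int(num)}{unit}"


# Attempts 0, 1 and 2 correspond to limit: "2" in the retryStrategy.
for attempt in range(3):
    print(attempt, concatenated(attempt, "25", "Mi"), scaled(attempt, "25", "Mi"))
# 0 025Mi 25Mi
# 1 125Mi 50Mi
# 2 225Mi 75Mi
```

Under these assumptions the concatenated form is accepted but requests 025Mi, 125Mi, 225Mi, whereas the arithmetic form is what would give the intended 25Mi, 50Mi, 75Mi scaling.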

@eduardodbr eduardodbr self-assigned this Apr 16, 2024
@agilgur5 agilgur5 changed the title Error applying PodSpecPatch when using expressions in conjunction with retries and inputs.parameters in podSpecPatch resources requests Error when using expressions with retries and inputs.parameters in podSpecPatch resources requests Apr 16, 2024
@agilgur5 agilgur5 added area/templating Templating with `{{...}}` P2 Important. All bugs with >=3 thumbs up that aren’t P0 or P1, plus: Any other bugs deemed important area/retry-manual Manual workflow "Retry" Action (API/CLI/UI). See retryStrategy for template-level retries labels Apr 16, 2024
@agilgur5 agilgur5 added area/retryStrategy Template-level retryStrategy and removed area/retry-manual Manual workflow "Retry" Action (API/CLI/UI). See retryStrategy for template-level retries labels Apr 25, 2024