
Inline templates still do not create the pod name correctly #12937

Open
3 of 4 tasks
McKMarcBruchner opened this issue Apr 15, 2024 · 2 comments · May be fixed by #12928
Labels
  • area/controller — Controller issues, panics
  • P1 — High priority. All bugs with >=5 thumbs up that aren't P0, plus: any other bugs deemed high priority
  • solution/suggested — A solution to the bug has been suggested. Someone needs to implement it.
  • solution/workaround — There's a workaround, might not be great, but exists
  • type/bug

Comments


McKMarcBruchner commented Apr 15, 2024

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what did you expect to happen?

This still looks like the bug from #10912, except that now, with version v3.5.5, Argo no longer even realizes that the pod has already completed.

(Screenshot, 2024-04-15 at 14:49:22)

The pod Argo wants to refer to is named fantastic-python-2118328171, as can be seen in the UI (for the workflow, see the example further down).

The workflow is stuck in Pending because the pod is not found on the Kubernetes cluster. When I look at the cluster, I see a pod called fantastic-python--2118328171 (note the double -), which is either running or already done. Since Argo apparently uses the wrong name to fetch the pod and its status, the workflow does not proceed. Even if it did continue, the logs would not be visible in the Argo UI; see the referenced bug report for that.

You can even see in the controller logs that it created the pod using the double - and then tried to pull the pod's status using only a single -. (See the logs further down: the first log line appears when I search for the pod name with the double -, the remaining lines when I search with a single -.)

If I understood your fix for #10912 correctly, you tried to fix a problem with the naming of the task, but I think the issue goes deeper:

As far as I understand it, the pod name should always be something like {workflow-name}-{workflow-id}-{step-name}-{step-id}, but when I use the inline option rather than the template option to reference another template, the step name is simply not added to the pod name.

Please have a look at this, since I need to use the inline option because I create my workflows programmatically; using the template option would make everything much, much harder.
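A minimal sketch of the suspected mechanism (this is not Argo's actual code, just an illustration): if the pod name is built as {workflow-name}-{template-name}-{hash} by joining the parts with "-", an inline task that has no template name of its own leaves the middle part empty, so the two separators collapse into a double dash:

```python
def pod_name(workflow_name: str, template_name: str, node_hash: str) -> str:
    # Hypothetical naming scheme for illustration: join all parts with "-".
    # If template_name is empty (as for an inline template), the two
    # adjacent separators produce a double "-" in the result.
    return "-".join([workflow_name, template_name, node_hash])

# Named template: well-formed name.
print(pod_name("fantastic-python", "argosay", "2118328171"))
# fantastic-python-argosay-2118328171

# Inline (unnamed) template: double dash, matching the pod seen on the cluster.
print(pod_name("fantastic-python", "", "2118328171"))
# fantastic-python--2118328171
```

If the lookup path then builds the name without the empty segment (a single "-"), it would search for fantastic-python-2118328171 and never find the pod that was actually created, which matches the controller logs below.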

Version

v3.5.5

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: fantastic-python
  namespace: default
spec:
  entrypoint: argosay
  templates:
    - name: argosay
      dag:
        tasks:
          - name: some-task
            inline:
              container:
                name: main
                image: argoproj/argosay:v2
                command:
                  - /argosay
                args:
                  - echo
                  - 'Hello argo'

Logs from the workflow controller

time="2024-04-15T12:44:16.689Z" level=info msg="Created pod: fantastic-python.some-task (fantastic-python--2118328171)" namespace=default workflow=fantastic-python

time="2024-04-15T12:44:16.678Z" level=warning msg="was unable to obtain the node for fantastic-python-2118328171, taskName some-task"
time="2024-04-15T12:44:16.678Z" level=warning msg="was unable to obtain the node for fantastic-python-2118328171, taskName some-task"
time="2024-04-15T12:44:16.678Z" level=info msg="Pod node fantastic-python-2118328171 initialized Pending" namespace=default workflow=fantastic-python
time="2024-04-15T12:44:26.691Z" level=info msg="node changed" namespace=default new.message= new.phase=Succeeded new.progress=0/1 nodeID=fantastic-python-2118328171 old.message= old.phase=Pending old.progress=0/1 workflow=fantastic-python
time="2024-04-15T12:44:36.695Z" level=info msg="node changed" namespace=default new.message= new.phase=Succeeded new.progress=0/1 nodeID=fantastic-python-2118328171 old.message= old.phase=Pending old.progress=0/1 workflow=fantastic-python
time="2024-04-15T12:44:46.711Z" level=info msg="node changed" namespace=default new.message= new.phase=Succeeded new.progress=0/1 nodeID=fantastic-python-2118328171 old.message= old.phase=Pending old.progress=0/1 workflow=fantastic-python
time="2024-04-15T12:44:56.714Z" level=info msg="node changed" namespace=default new.message= new.phase=Succeeded new.progress=0/1 nodeID=fantastic-python-2118328171 old.message= old.phase=Pending old.progress=0/1 workflow=fantastic-python
time="2024-04-15T12:45:06.717Z" level=info msg="node changed" namespace=default new.message= new.phase=Succeeded new.progress=0/1 nodeID=fantastic-python-2118328171 old.message= old.phase=Pending old.progress=0/1 workflow=fantastic-python
time="2024-04-15T12:45:16.722Z" level=info msg="node changed" namespace=default new.message= new.phase=Succeeded new.progress=0/1 nodeID=fantastic-python-2118328171 old.message= old.phase=Pending old.progress=0/1 workflow=fantastic-python

Logs from your workflow's wait container

time="2024-04-15T12:59:16.660Z" level=info msg="Starting Workflow Executor" version=v3.4.8
time="2024-04-15T12:59:16.661Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2024-04-15T12:59:16.661Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=default podName=fantastic-python--2118328171 template="{\"inputs\":{},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"main\",\"image\":\"argoproj/argosay:v2\",\"command\":[\"/argosay\"],\"args\":[\"echo\",\"Hello argo\"],\"resources\":{}}}" version="&Version{Version:v3.4.8,BuildDate:2023-05-25T22:21:53Z,GitCommit:9e27baee4b3be78bb662ffa5e3a06f8a6c28fb53,GitTag:v3.4.8,GitTreeState:clean,GoVersion:go1.20.4,Compiler:gc,Platform:linux/arm64,}"
time="2024-04-15T12:59:16.661Z" level=info msg="Starting deadline monitor"
time="2024-04-15T12:59:18.663Z" level=info msg="Main container completed" error="<nil>"
time="2024-04-15T12:59:18.663Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2024-04-15T12:59:18.663Z" level=info msg="No output parameters"
time="2024-04-15T12:59:18.663Z" level=info msg="No output artifacts"
time="2024-04-15T12:59:18.663Z" level=info msg="Alloc=9863 TotalAlloc=15937 Sys=24429 NumGC=4 Goroutines=7"

@shuangkun shuangkun self-assigned this Apr 15, 2024
shuangkun (Member) commented:

This looks like the same root cause as #12895, and will be fixed by #12928.

@shuangkun shuangkun added the area/controller Controller issues, panics label Apr 15, 2024
@McKMarcBruchner
McKMarcBruchner (Author) commented:

Mmh, okay, it does look like it, yes. Somehow I was only able to find two already-closed issues on this and never found the open one on Google. Thank you for the quick response!

@agilgur5 agilgur5 added the P1 High priority. All bugs with >=5 thumbs up that aren’t P0, plus: Any other bugs deemed high priority label Apr 18, 2024
@agilgur5 agilgur5 added solution/workaround There's a workaround, might not be great, but exists solution/suggested A solution to the bug has been suggested. Someone needs to implement it. labels Apr 18, 2024