
No logs accessible in the UI on workflows that went from Pending to Running #10178

Open · fdebuire opened this issue Dec 5, 2022 · 9 comments · May be fixed by #12965
Labels
area/controller (Controller issues, panics) · area/synchronization (`parallelism` configurations and other synchronization) · area/ui · P3 (Low priority) · solution/suggested (A solution to the bug has been suggested. Someone needs to implement it.) · type/bug

Comments

fdebuire commented Dec 5, 2022

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

Argo Workflows controller is configured with parallelism: 10.

Workflows that are created in the Pending phase (because too many workflows are already running) do not get the workflows.argoproj.io/pod-name-format: v2 annotation, so their logs cannot be viewed from the UI.

It can be reproduced using the attached workflow and the command:

for i in {1..20}; do kubectl create -f test.yaml; done
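To confirm that an affected Workflow is missing the annotation (a rough check; <workflow-name> and <namespace> are placeholders, and wf is the Workflow CRD's short name):

# prints the pod-name-format annotation if present; no output means it is missing
kubectl get wf <workflow-name> -n <namespace> -o yaml | grep pod-name-format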

It seems to happen because the annotation is only set here:

setWfPodNamesAnnotation(woc.wf)

but not here:

if phase == wfv1.WorkflowUnknown {
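A rough sketch of the kind of change this points at (the helper name and surrounding structure are illustrative and not taken from the codebase; only the annotation key and the two code references above come from this report):

// Illustrative only: set the same annotation that setWfPodNamesAnnotation applies on the
// normal path, in the branch that transitions a Workflow out of the Unknown phase.
// Assumed import: wfv1 "github.com/argoproj/argo-workflows/v3/pkg/apis/workflow/v1alpha1"
func ensurePodNameFormatAnnotation(wf *wfv1.Workflow) {
	if wf.Annotations == nil {
		wf.Annotations = map[string]string{}
	}
	wf.Annotations["workflows.argoproj.io/pod-name-format"] = "v2" // value reported missing in this issue
}

// ...called from the Unknown-phase branch as well, roughly:
//
//	if phase == wfv1.WorkflowUnknown {
//		ensurePodNameFormatAnnotation(woc.wf)
//		// existing "postponed due to max parallelism limit" handling...
//	}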

Version

latest

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: test-logs-
spec:
  entrypoint: test-dag
  templates:
  - name: echo
    container:
      image: alpine:3.7
      command: [echo, "OK!"]
  - dag:
      tasks:
      - name: test
        template: test
    name: test-dag
  - dag:
      tasks:
      - name: 'echo-1'
        template: echo
      - depends: echo-1
        name: 'echo-2'
        template: echo
    name: test

Logs from the workflow controller

time="2022-12-05T09:14:17.554Z" level=info msg="Workflow processing has been postponed due to max parallelism limit" key=cloud-workflows/test-logs-54tz6
time="2022-12-05T09:14:17.554Z" level=info msg="Updated phase  -> Pending" namespace=cloud-workflows workflow=test-logs-54tz6
time="2022-12-05T09:14:17.554Z" level=info msg="Updated message  -> Workflow processing has been postponed because too many workflows are already running" namespace=cloud-workflows workflow=test-log>
time="2022-12-05T09:14:17.555Z" level=info msg="Workflow to be dehydrated" Workflow Size=652
time="2022-12-05T09:14:17.653Z" level=info msg="Update workflows 200"
time="2022-12-05T09:14:17.654Z" level=info msg="Workflow update successful" namespace=cloud-workflows phase=Pending resourceVersion=276945745 workflow=test-logs-54tz6
...
time="2022-12-05T09:14:58.332Z" level=info msg="Processing workflow" namespace=cloud-workflows workflow=test-logs-54tz6
time="2022-12-05T09:14:58.340Z" level=info msg="Task-result reconciliation" namespace=cloud-workflows numObjs=0 workflow=test-logs-54tz6
time="2022-12-05T09:14:58.340Z" level=info msg="Updated phase Pending -> Running" namespace=cloud-workflows workflow=test-logs-54tz6

Logs from your workflow's wait container

N/A
@sarabala1979 (Member)

@fdebuire this is expected behavior. Even if the workflow is in a Running state, the semaphore/mutex lock is not available to execute the step:
> "Workflow processing has been postponed due to max parallelism limit" key=cloud-workflows/test-logs-54tz6

fdebuire commented Dec 9, 2022

I meant that even after all the workflows are completed, logs are still not accessible from the UI for pods of workflows that went through the phases Unknown -> Pending -> Running; it's fine for workflows that went Unknown -> Running.

@sarabala1979 (Member)

@fdebuire are you able to access the pod's logs using kubectl logs <pod name>? Are you configuring podGC on your workflow?

@fdebuire (Author)

@sarabala1979 yes, the pod's logs are accessible using the kubectl CLI; it's only in the Argo Workflows UI that they are not accessible. No podGC is configured.
I can reproduce the issue every time using the workflow I pasted in the original post.
Also, on a workflow where the issue is happening, if I edit it and add the workflows.argoproj.io/pod-name-format: v2 annotation, the logs become accessible through the UI.
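For reference, that manual workaround can be applied in one command (the workflow name and namespace below are taken from the controller logs above; adjust them for your own run):

# add the missing annotation so the UI can resolve the pod names
kubectl annotate workflow test-logs-54tz6 \
  workflows.argoproj.io/pod-name-format=v2 \
  -n cloud-workflows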

caelan-io added the P3 (Low priority) label and removed the problem/stale label on Feb 23, 2023
agilgur5 added the area/controller and solution/suggested labels on Apr 22, 2024

agilgur5 commented Apr 22, 2024

So the default case of this should have been fixed by #11016. It will be broken if you're using POD_NAMES=v1 though.
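If you're not sure which pod-naming mode your controller is running with, one way to check is to look at the controller Deployment's POD_NAMES environment variable (the argo namespace and workflow-controller Deployment name below are common defaults and may differ in your install):

# prints the POD_NAMES value if it is explicitly set; empty output means the default is in effect
kubectl -n argo get deployment workflow-controller \
  -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="POD_NAMES")].value}'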

> It seems to happen because the annotation is only set here:

This analysis makes sense to me and seems accurate; the other location does indeed seem to have been missed, per #6982 (comment). I also updated the code links above to use permalinks to a commit hash from the time of the initial comment.
Those look like the only two places that transition a Workflow out of the Unknown phase. Problematically, it seems that more than just the annotation isn't set: a PDB also isn't created.

The lock transition logic originates from #6356, which has a very good reason behind it. I'm a little hesitant to completely change it though, as this part of the code is latency-sensitive -- it shouldn't be doing much if the Workflow doesn't have a lock 😕
Will need to think about it a bit more. The annotation shouldn't add much time, but creating the PDB would. Moving that logic to the Pending -> Running transition isn't that straightforward either, as I believe a Running Workflow can be kicked back to Pending 😕

@agilgur5 (Member)

> the other location does indeed seem to have been missed, per #6982 (comment).

> The annotation shouldn't add much time

I wrote up a fix for the annotation in #12965 as that's the simpler case.

> Problematically, it seems that more than just the annotation isn't set: a PDB also isn't created.

> I'm a little hesitant to completely change it though, as this part of the code is latency-sensitive -- it shouldn't be doing much if the Workflow doesn't have a lock 😕
> Will need to think about it a bit more.

I don't quite have an optimal solution for this yet, so I filed #12966 to at least document it.
It was also an incidental finding made while analyzing this issue, and not part of the original report.
