
No logs accessible in the UI on workflows that went from Pending to Running #10178

Open · fdebuire opened this issue Dec 5, 2022 · 9 comments · May be fixed by #12965
Labels
area/controller (Controller issues, panics) · area/synchronization (`parallelism` configurations and other synchronization) · area/ui · P3 (Low priority) · solution/suggested (A solution to the bug has been suggested. Someone needs to implement it.) · type/bug

Comments

fdebuire commented Dec 5, 2022

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

Argo Workflows controller is configured with parallelism: 10.

Workflows that are created in the Pending phase (because too many workflows are already running) do not get the workflows.argoproj.io/pod-name-format: v2 annotation, so their logs cannot be viewed from the UI.

It can be reproduced using the attached workflow and the command:

for i in {1..20}; do kubectl create -f test.yaml; done
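To confirm that an affected Workflow is missing the annotation (a rough check; <workflow-name> and <namespace> are placeholders, and wf is the Workflow CRD's short name):

# prints the pod-name-format annotation if present; no output means it is missing
kubectl get wf <workflow-name> -n <namespace> -o yaml | grep pod-name-format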

It seems to happen because the annotation is only set here:

setWfPodNamesAnnotation(woc.wf)

but not here:

if phase == wfv1.WorkflowUnknown {
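A rough sketch of the kind of change this points at (the helper name and surrounding structure are illustrative and not taken from the codebase; only the annotation key and the two code references above come from this report):

// Illustrative only: set the same annotation that setWfPodNamesAnnotation applies on the
// normal path, in the branch that transitions a Workflow out of the Unknown phase.
// Assumed import: wfv1 "github.com/argoproj/argo-workflows/v3/pkg/apis/workflow/v1alpha1"
func ensurePodNameFormatAnnotation(wf *wfv1.Workflow) {
	if wf.Annotations == nil {
		wf.Annotations = map[string]string{}
	}
	wf.Annotations["workflows.argoproj.io/pod-name-format"] = "v2" // value reported missing in this issue
}

// ...called from the Unknown-phase branch as well, roughly:
//
//	if phase == wfv1.WorkflowUnknown {
//		ensurePodNameFormatAnnotation(woc.wf)
//		// existing "postponed due to max parallelism limit" handling...
//	}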

Version

latest

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: test-logs-
spec:
  entrypoint: test-dag
  templates:
  - name: echo
    container:
      image: alpine:3.7
      command: [echo, "OK!"]
  - dag:
      tasks:
      - name: test
        template: test
    name: test-dag
  - dag:
      tasks:
      - name: 'echo-1'
        template: echo
      - depends: echo-1
        name: 'echo-2'
        template: echo
    name: test

Logs from the workflow controller

time="2022-12-05T09:14:17.554Z" level=info msg="Workflow processing has been postponed due to max parallelism limit" key=cloud-workflows/test-logs-54tz6
time="2022-12-05T09:14:17.554Z" level=info msg="Updated phase  -> Pending" namespace=cloud-workflows workflow=test-logs-54tz6
time="2022-12-05T09:14:17.554Z" level=info msg="Updated message  -> Workflow processing has been postponed because too many workflows are already running" namespace=cloud-workflows workflow=test-log>
time="2022-12-05T09:14:17.555Z" level=info msg="Workflow to be dehydrated" Workflow Size=652
time="2022-12-05T09:14:17.653Z" level=info msg="Update workflows 200"
time="2022-12-05T09:14:17.654Z" level=info msg="Workflow update successful" namespace=cloud-workflows phase=Pending resourceVersion=276945745 workflow=test-logs-54tz6
...
time="2022-12-05T09:14:58.332Z" level=info msg="Processing workflow" namespace=cloud-workflows workflow=test-logs-54tz6
time="2022-12-05T09:14:58.340Z" level=info msg="Task-result reconciliation" namespace=cloud-workflows numObjs=0 workflow=test-logs-54tz6
time="2022-12-05T09:14:58.340Z" level=info msg="Updated phase Pending -> Running" namespace=cloud-workflows workflow=test-logs-54tz6

Logs from your workflow's wait container

N/A
@sarabala1979 (Member)

@fdebuire this is expected behavior. Even if the workflow is in a Running state, the semaphore/mutex lock is not available to execute the step:
> "Workflow processing has been postponed due to max parallelism limit" key=cloud-workflows/test-logs-54tz6

fdebuire commented Dec 9, 2022

I meant that even after all the workflows are completed, logs are still not accessible from the UI for pods of workflows that went through the phases Unknown -> Pending -> Running; it's fine for workflows that went Unknown -> Running.

@sarabala1979 (Member)

@fdebuire are you able to access the pod's logs using kubectl logs <pod name>? Are you configuring podGC on your workflow?

@fdebuire (Author)

@sarabala1979 yes, the pod's logs are accessible using the kubectl CLI; it's only in the Argo Workflows UI that they are not accessible. No podGC is configured.
I can reproduce the issue every time using the workflow I pasted in the original post.
Also, on a workflow where the issue is happening, if I edit it and add the workflows.argoproj.io/pod-name-format: v2 annotation, the logs become accessible through the UI.
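For reference, that manual workaround can be applied in one command (the workflow name and namespace below are taken from the controller logs above; adjust them for your own run):

# add the missing annotation so the UI can resolve the pod names
kubectl annotate workflow test-logs-54tz6 \
  workflows.argoproj.io/pod-name-format=v2 \
  -n cloud-workflows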

caelan-io added the P3 (Low priority) label and removed the problem/stale label on Feb 23, 2023
agilgur5 added the area/controller and solution/suggested labels on Apr 22, 2024

agilgur5 commented Apr 22, 2024

So the default case of this should have been fixed by #11016. It will be broken if you're using POD_NAMES=v1 though.
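If you're not sure which pod-naming mode your controller is running with, one way to check is to look at the controller Deployment's POD_NAMES environment variable (the argo namespace and workflow-controller Deployment name below are common defaults and may differ in your install):

# prints the POD_NAMES value if it is explicitly set; empty output means the default is in effect
kubectl -n argo get deployment workflow-controller \
  -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="POD_NAMES")].value}'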

> It seems to happen because the annotation is only set here:

This analysis makes sense to me and seems accurate; the other location does indeed seem to have been missed, per #6982 (comment). I also updated the code links above to use permalinks to a commit hash from the time of the initial comment.
Those look like the only two places that transition a Workflow out of the Unknown phase. Problematically, it seems that more than just the annotation isn't set: a PDB also isn't created.

The lock transition logic originates from #6356, which has a very good reason behind it. I'm a little hesitant to completely change it though, as this part of the code is latency-sensitive -- it shouldn't be doing much if the Workflow doesn't have a lock 😕
Will need to think about it a bit more. The annotation shouldn't add much time, but creating the PDB would. Moving that logic to the Pending -> Running transition isn't that straightforward either, as I believe a Running Workflow can be kicked back to Pending 😕

@agilgur5 (Member)

> the other location does indeed seem to have been missed, per #6982 (comment).

> The annotation shouldn't add much time

I wrote up a fix for the annotation in #12965 as that's the simpler case.

> Problematically, it seems that more than just the annotation isn't set: a PDB also isn't created.

> I'm a little hesitant to completely change it though, as this part of the code is latency-sensitive -- it shouldn't be doing much if the Workflow doesn't have a lock 😕
> Will need to think about it a bit more.

I don't quite have an optimal solution for this yet, so I filed #12966 to at least document it.
It was also an incidental finding made while analyzing this issue, and not part of the original report.
