Offloading large workflows raises db syntax errors that prevent the offloading #12946

Open
3 of 4 tasks
Yarin-Shitrit opened this issue Apr 16, 2024 · 4 comments
Labels
area/server problem/more information needed Not enough information has been provide to diagnose this issue. problem/stale This has not had a response in some time type/bug

Comments

Yarin-Shitrit commented Apr 16, 2024

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what did you expect to happen?

I enabled nodeStatusOffload for the workflow-controller and configured PostgreSQL under the persistence section of the workflow-controller configuration.
The connection succeeds: the controller logs show the database migrations running.
But when it reaches the offloading step, it crashes with this error:

workflow is longer than maximum allowed size. compressed size 1053022 > maxSize 1048576. Tried to offload but encountered an error: pq: syntax error at or near '(' 

I also tried MySQL instead of PostgreSQL, but got a different syntax error:

workflow is longer than maximum allowed size. compressed size 1053022 > maxSize 1048576. Tried to offload but encountered an error: ERROR 1064: You have an error in your SQL syntax; check the manual that corresponds to your MYSQL server version for the right syntax to use near '('clustername', 'namespace', 'uid', 'nodes', 'version') VALUES (?' at line 2.
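For context, a persistence section with node status offloading and PostgreSQL usually looks roughly like the sketch below. It follows the layout of the example ConfigMap in the Argo Workflows documentation. The host, database, table name, and secret names are placeholder assumptions, not the values used in this report.

```yaml
# Sketch only: placeholder values, not the reporter's configuration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: argo
data:
  persistence: |
    # Store node status in the database instead of in the Workflow object.
    nodeStatusOffLoad: true
    postgresql:
      host: postgres.example.internal   # assumed host
      port: 5432
      database: postgres                # assumed database name
      tableName: argo_workflows
      # The secrets are expected in the controller's namespace.
      userNameSecret:
        name: argo-postgres-config      # assumed secret name
        key: username
      passwordSecret:
        name: argo-postgres-config
        key: password
```

As far as the documented example goes, the MySQL variant replaces the postgresql block with a mysql block of the same shape.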

I am trying to make my Argo Server run very large workflows. To test this, I generated a sample DAG whose tasks simply print a message, and tried to run 700 of them with no dependencies between the tasks.

Version

v3.4.6

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

I simply ran a workflow that prints some output, nothing complex.
I loaded the DAG with 700 steps like this.
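A workflow of roughly that shape can be sketched as follows. This is not the original manifest: it assumes the 700 independent tasks can be fanned out with withSequence instead of being listed by hand, and the template names and the busybox image are placeholders. It may not reach the same compressed size as the workflow in this report.

```yaml
# Sketch only: a DAG that expands into 700 independent print tasks.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: large-dag-        # assumed name prefix
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          # One task definition, expanded into 700 nodes with no
          # dependencies between them.
          - name: print
            template: echo
            arguments:
              parameters:
                - name: index
                  value: "{{item}}"
            withSequence:
              count: "700"
    - name: echo
      inputs:
        parameters:
          - name: index
      container:
        image: busybox            # public image, as the template requires
        command: [echo]
        args: ["step {{inputs.parameters.index}}"]
```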

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

Logs from your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded
@agilgur5
Member

  • I can confirm the issue exists when I tested with :latest

v3.4.6

You're on a pretty old version, so I would suggest trying with :latest, or at least 3.4.16.
Please fill out the issue template accurately; it asks about :latest usage very intentionally.

@agilgur5
Member

Looking at the diff v3.4.6..v3.4.16, #10887 in particular seems related (although the error is different)

@agilgur5 agilgur5 added the problem/more information needed Not enough information has been provide to diagnose this issue. label Apr 19, 2024
@Yarin-Shitrit
Author

I’ll be able to test “:latest” by Sunday, but what other information can I provide you with?

Contributor

github-actions bot commented May 5, 2024

This issue has been automatically marked as stale because it has not had recent activity and needs more information. It will be closed if no further activity occurs.

@github-actions github-actions bot added the problem/stale This has not had a response in some time label May 5, 2024