At scale, some AWs do not enter the Completed state #657

Open
asm582 opened this issue Oct 10, 2023 · 0 comments

asm582 (Member) commented Oct 10, 2023

Describe the Bug

At scale, some AppWrappers (AWs) never reach the Completed state because the controller's informer cache and etcd disagree about the AW's state.

Codeflare Stack Component Versions

Codeflare SDK:
MCAD:

Steps to Reproduce the Bug

Submit 1,000 AWs, each running a very short job (about 10 seconds), and wait for all 1,000 AWs to complete.

What Have You Already Tried to Debug the Issue?

I ran scale tests to reproduce the issue.

Expected Behavior

All AWs should reach the Completed state.

Screenshots, Console Output, Logs, etc.

NA

Affected Releases

The current release (1.35.0) and the main branch.

Additional Context

NA