kubernetes provider does not notice missing pods removed by anything other than itself #705

Open
benclifford opened this issue Feb 28, 2022 · 0 comments
Labels
bug Something isn't working


@benclifford
Contributor

Describe the bug
If I kill a worker pod and then launch a subsequent task, hoping that the system will start up a new worker pod, my task instead progresses only as far as `Task is pending due to waiting-for-nodes`, and no new worker pod is launched. This looks like it's because the fork of the Kubernetes provider in funcx does not check Kubernetes for pod status, so it continues to claim that the deleted worker pod exists. This was fixed in the fork of the Kubernetes provider in parsl in early 2021; see Parsl/parsl#1740.
I thought I'd already opened a funcx GitHub issue on this, but I can't find it.

Restarting the endpoint clears away the list of disappeared pods.

This second issue is somewhat disguised by scaling: once I have blocked the first missing container with sufficient hung tasks, the endpoint scales out a new pod to take on the excess work, and that new pod then executes any new work successfully. So a user experiencing this who accepts that "often funcx doesn't run very well, I should just keep retrying and not report a problem" will trigger that effect without ever reporting the problem.

To Reproduce
Delete a worker pod

Expected behavior
Something more like the parsl fork of the kubernetes provider, Parsl/parsl#1740
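A minimal sketch of the reconciliation step this implies, assuming the provider keeps a map from pod name to last-known state and can obtain the set of pod names Kubernetes currently reports (in a real provider that set would come from the Kubernetes API, for example by listing pods in the endpoint's namespace). `PodState`, `reconcile` and `active_count` are hypothetical names for illustration; this is not the code from Parsl/parsl#1740 or from funcx.

```python
from enum import Enum
from typing import Dict, Iterable


class PodState(Enum):
    """Hypothetical states the provider tracks for each worker pod."""
    PENDING = "pending"
    RUNNING = "running"
    MISSING = "missing"  # no longer reported by Kubernetes (deleted externally)


def reconcile(tracked: Dict[str, PodState],
              live_pod_names: Iterable[str]) -> Dict[str, PodState]:
    """Return a new status map in which every tracked pod that Kubernetes
    no longer reports is marked MISSING instead of being assumed alive."""
    live = set(live_pod_names)
    return {
        name: (state if name in live else PodState.MISSING)
        for name, state in tracked.items()
    }


def active_count(status: Dict[str, PodState]) -> int:
    """Count only pods that still exist, so scaling logic does not treat
    a deleted pod as available capacity."""
    return sum(1 for state in status.values() if state is not PodState.MISSING)


if __name__ == "__main__":
    tracked = {"worker-a": PodState.RUNNING, "worker-b": PodState.RUNNING}
    # Simulate worker-b having been deleted outside the provider.
    status = reconcile(tracked, ["worker-a"])
    print(status["worker-b"].value)  # missing
    print(active_count(status))      # 1
```

Because the missing pod is no longer counted as capacity, the scaling logic can launch a replacement as soon as a task is pending, rather than only after enough hung tasks accumulate to force a scale-out.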

Environment
my kubernetes dev environment, main branches as of 2022-02-28

benclifford added the bug (Something isn't working) label on Feb 28, 2022