Optimizing covalent start/stop time #1933

Open
kessler-frost opened this issue Feb 13, 2024 · 0 comments
@kessler-frost
Member

Average time taken to start: ~8 seconds
Average time taken to stop the server when there is at least 1 dispatch done: ~30 seconds

Most of the start time is spent verifying that the server is ready to accept dispatches, so it is more or less acceptable. The stop time, however, is far too long and we should try to reduce it. Most of the time spent stopping the server goes to the _terminate_child_processes function (can be found here).

Currently we send SIGINT to the leader process and then shut down its children; this works fine, albeit slowly. But when I tried other ways of terminating the processes, such as the terminate and kill methods provided by psutil, none of them worked and the command got stuck waiting forever.
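
Whatever the root cause of the hang turns out to be, one way to keep the stop command from blocking forever is to put a timeout on every wait and escalate only when it expires. The sketch below illustrates that idea with the standard library's `subprocess.Popen` handles rather than psutil `Process` objects; `stop_with_escalation` and the `grace` parameter are hypothetical names, and this is not how `_terminate_child_processes` is currently written.

```python
import subprocess


def stop_with_escalation(proc: subprocess.Popen, grace: float = 5.0) -> str:
    """Stop `proc` with a bounded wait and report which step ended it.

    Hypothetical helper for illustration only; not Covalent's actual code.
    """
    if proc.poll() is not None:
        # Nothing to do: the process has already exited.
        return "already-exited"

    # Ask politely first (SIGTERM on POSIX).
    proc.terminate()
    try:
        # Bounded wait, so the caller can never hang here indefinitely.
        proc.wait(timeout=grace)
        return "terminated"
    except subprocess.TimeoutExpired:
        # The process ignored the request within the grace period: force it.
        proc.kill()
        proc.wait()
        return "killed"
```

Returning the step name makes it easy to log which processes needed a forced kill, which may help narrow down where the ~30 seconds go.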

We need to look further into fixing this, possibly by using the Dask APIs to stop the cluster of workers instead of shutting down their processes directly.
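
A rough sketch of what that could look like, assuming the shutdown code can get hold of a `distributed.Client` and the cluster object it is connected to (whether Covalent's stop path has access to these handles is an assumption). Both objects expose a `close(timeout=...)` method in Dask; `close_dask` itself is a hypothetical helper name.

```python
import time


def close_dask(client, cluster, timeout: float = 10.0) -> float:
    """Close a Dask client and then its cluster; return elapsed seconds.

    `client` is expected to behave like `distributed.Client` and `cluster`
    like `distributed.LocalCluster`. Hypothetical helper, not Covalent code.
    """
    start = time.perf_counter()
    try:
        # Disconnect the client first so it does not try to reconnect
        # to a scheduler that is about to go away.
        client.close(timeout=timeout)
    finally:
        # Let the cluster stop its own scheduler and workers, even if
        # closing the client raised.
        cluster.close(timeout=timeout)
    return time.perf_counter() - start
```

The returned duration can be compared directly against the current ~30 second stop time to see whether this route is actually faster.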

It would also be better to tell the user exactly which stage is in progress when starting/stopping the server, and to be more verbose in general.
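
As a minimal sketch of that kind of reporting, a small context manager could announce each stage and how long it took. The `stage` helper and the stage name in the usage note are hypothetical; the real start/stop code would pick its own stage boundaries and output function.

```python
import time
from contextlib import contextmanager


@contextmanager
def stage(name, echo=print):
    """Announce a named stage, then report its duration or its failure."""
    echo(f"{name}...")
    start = time.perf_counter()
    try:
        yield
    except BaseException:
        # Report the failure with its timing, then let the error propagate.
        echo(f"{name} failed after {time.perf_counter() - start:.1f}s")
        raise
    else:
        echo(f"{name} done in {time.perf_counter() - start:.1f}s")
```

Wrapping a step as `with stage("Terminating child processes"): ...` would then print one line when the step begins and one when it ends, which also shows at a glance which stage dominates the stop time.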

kessler-frost self-assigned this Feb 13, 2024