Better reporting when endpoint fails #579

Open
yadudoc opened this issue Aug 25, 2021 · 3 comments
@yadudoc
Collaborator

yadudoc commented Aug 25, 2021

Is your feature request related to a problem? Please describe.

This came out of a debugging conversation with @ravescovi. When an endpoint fails with a ZMQ error, it first appears to start: a series of log messages announces the connection steps, and funcx-endpoint list reports the endpoint as started. Only later does it fail, and it does so silently. Both the delay before the failure and the fact that funcx-endpoint list then says only "disconnected" rather than "failed" are problems.

Describe the solution you'd like

Ideally the endpoint would fail right away. That may be difficult, since the failure happens in the endpoint interchange, which is a daemonized process. The next best option would be for funcx-endpoint list to be more descriptive about what failed.
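
To make the "more descriptive" option concrete, here is a minimal sketch of how a listing command could tell "failed" apart from "disconnected". Everything in it is an assumption for illustration: the function name `describe_endpoint`, the three status strings, and the idea of passing in whether the daemon PID is alive are hypothetical and are not taken from the funcx-endpoint code.

```python
from pathlib import Path


def describe_endpoint(pid_alive, stderr_path, tail_lines=5):
    """Return (status, detail) for one endpoint.

    Hypothetical helper: `pid_alive` says whether the daemonized
    interchange process is still running, and `stderr_path` points at
    its captured stderr file (e.g. interchange.stderr).
    """
    if pid_alive:
        return ("running", "")

    path = Path(stderr_path)
    tail = ""
    if path.exists():
        # Keep only the last few lines so the listing stays short.
        lines = path.read_text(errors="replace").splitlines()
        tail = "\n".join(lines[-tail_lines:])

    # A dead process that left output on stderr most likely crashed.
    if tail.strip():
        return ("failed", tail)
    return ("disconnected", "")
```

With something along these lines, the listing could print the tail of the stderr output next to a "failed" status instead of a bare "disconnected".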

Describe alternatives you've considered

This failure message shows up in interchange.stderr and isn't reported at the end of EndpointInterchange.log. Having this error go to EndpointInterchange.log would have been ideal. One option would be to squash EndpointInterchange.log, interchange.stderr and interchange.stdout into a single interchange.log. Having three places to check is pretty bad.
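
One way the single-log idea could look, as a rough sketch using only the Python standard library: a small file-like wrapper forwards anything written to stdout or stderr into the same log file the process already logs to. The names `StreamToLogger` and `make_unified_logger` are made up for this example and are not part of funcx. Note the limitation: this only catches writes made through Python's `sys.stdout`/`sys.stderr` (including uncaught-exception tracebacks); output written directly to the file descriptors by C libraries would need a descriptor-level redirect instead.

```python
import logging
import sys


class StreamToLogger:
    """File-like object that forwards written text to a logger, line by line."""

    def __init__(self, logger, level):
        self.logger = logger
        self.level = level
        self._buffer = ""

    def write(self, text):
        self._buffer += text
        # Emit one log record per complete line; keep any partial line buffered.
        while "\n" in self._buffer:
            line, self._buffer = self._buffer.split("\n", 1)
            if line:
                self.logger.log(self.level, line)
        return len(text)

    def flush(self):
        # Emit whatever partial line is still buffered.
        if self._buffer:
            self.logger.log(self.level, self._buffer)
            self._buffer = ""


def make_unified_logger(log_path, name="interchange"):
    """Create a logger that writes every record to a single file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    handler = logging.FileHandler(log_path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

A daemonized process could then do `logger = make_unified_logger("interchange.log")`, followed by `sys.stdout = StreamToLogger(logger, logging.INFO)` and `sys.stderr = StreamToLogger(logger, logging.ERROR)`, so a late traceback lands at the end of the same file as the normal log messages.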

Additional context
Following the instructions in #393 fixed the ZMQ issue.

yadudoc added the enhancement label Aug 25, 2021
yadudoc self-assigned this Aug 25, 2021
@BenGalewsky
Contributor

@joshbryan-globus - I assume that this should be in Clubhouse, yes?

@yadudoc
Collaborator Author

yadudoc commented Aug 25, 2021

Now that the endpoints are starting properly, Raf has got endpoints on Theta and Cooley and will report here if he sees any issue with them disconnecting.

@joshbryan-globus
Contributor

> @joshbryan-globus - I assume that this should be in Clubhouse, yes?

For externally reported and tracked bugs in open source components, we still need to keep them in GitHub. However, it would be good to have a Clubhouse issue to track this on the board. Or, if this relates to other work already in Clubhouse, linking this issue to the Clubhouse issue would be good.
