
In run_process, what should happen if the deliver_cancel function raises an exception? #1532

Open
njsmith opened this issue May 18, 2020 · 7 comments · May be fixed by #1555

Comments


njsmith commented May 18, 2020

What should happen if deliver_cancel raises an exception? In the current implementation, the shielded cancel scope will prevent it from propagating until the process exits, which might take a while if the crash occurred before signaling the process in any way. Maybe on exception from a user-specified deliver_cancel we should call the default deliver_cancel to kill the process? Or just kill() since we don't care much about polish at that point, only about letting the user know what they need to fix.

Originally posted by @oremanj in #1525

@njsmith njsmith changed the title What should happen if deliver_cancel raises an exception? In run_process, what should happen if the deliver_cancel function raises an exception? May 19, 2020

njsmith commented May 19, 2020

Yeah, this is tricky. On the one hand, you kind of have to report the failure, b/c "program hangs silently" is terrible UX. On the other hand, you kind of can't report the failure, b/c run_process hasn't finished yet.

That's why I ended up putting warnings.warn calls in the default deliver_cancel – it gives the user an immediate heads-up, even though we can't propagate the error properly.

Maybe we should pull the try: ... except: warnings.warn(...) out of the default deliver_cancel, and put it into the code that calls deliver_cancel instead?

I guess for exceptions out of arbitrary user code, it would be nice to include the full traceback. That's an awkward fit for the warnings machinery though.

In a few other places where we have this kind of problem (instrument failures, EMFILE in serve_listeners), we use logging to dump the warning onto the console, with the idea being that by default the user sees it, and at least it's hookable if you want to do something else in production. Maybe we should do the same here? try: await deliver_cancel(proc) except: L = logging.getLogger("trio.run_process"); L.exception(...); raise?
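As a rough sketch of that logging idea (using a hypothetical `guarded_deliver_cancel` wrapper name, and plain asyncio in place of trio so the snippet is self-contained and runnable):

```python
import asyncio
import logging

LOGGER = logging.getLogger("trio.run_process")

async def guarded_deliver_cancel(deliver_cancel, proc):
    # Call the (possibly user-supplied) deliver_cancel; on any failure,
    # log the full traceback immediately so the user sees why we would
    # still be waiting on the process, then re-raise so the error can
    # propagate once the process actually exits.
    try:
        await deliver_cancel(proc)
    except BaseException:
        LOGGER.exception(
            "deliver_cancel raised; still waiting for process %r to exit", proc
        )
        raise

async def broken_deliver_cancel(proc):
    # Stand-in for user code that crashes before signaling the process.
    raise OSError("whoops!")

async def demo():
    try:
        await guarded_deliver_cancel(broken_deliver_cancel, None)
    except OSError as exc:
        return str(exc)

logging.basicConfig(level=logging.ERROR)
result = asyncio.run(demo())
print(result)
```

The key point is that the traceback hits the log right away, while the exception itself is still re-raised for later propagation.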


oremanj commented May 19, 2020

Logging the exception, and still waiting for the process to exit, seems like a good compromise to me. The log message should explicitly say that it's still waiting for the process to exit, so that the reason for the apparent deadlock is as obvious as we can make it.


guilledk commented May 21, 2020

Hey guys, so I'm gonna take a crack at this issue.
If I got this right, the idea would be a change like moving the try: .. except to trio._subprocess.run_process.open_process:

try:
    await proc.wait()
except trio.Cancelled:
    with trio.CancelScope(shield=True):
        killer_cscope = trio.CancelScope(shield=True)

        async def killer():
            with killer_cscope:
                try:  # here is the new try except
                    await deliver_cancel(proc)
                except OSError as exc:
                    warnings.warn(
                        RuntimeWarning(
                            f"tried to kill process {proc!r}, but failed with: {exc!r}"
                        )
                    )

        nursery.start_soon(killer)
        await proc.wait()
        killer_cscope.cancel()
        raise

Also, what is a good way to trigger an OSError to test this? Just closing the process manually while trio thinks it is still open would probably work, but maybe there is a cleaner way.


njsmith commented May 21, 2020

If I got this right, the idea would be a change like moving the try: .. except to trio._subprocess.run_process.open_process:

You mean trio._subprocess.run_process.killer, right?

Also what is a good way to trigger an OSError to test this?

You can also just do:

async def custom_deliver_cancel(proc):
    raise OSError("whoops!")

Also:

  • You'll want to catch BaseException and then re-raise
  • We'll want to switch to using logging instead of warnings. For an example of what this might look like, check out trio/_highlevel_serve_listeners.py and how LOGGER is used there.

@guilledk

Sorry, can you expand on the BaseException point?


njsmith commented May 23, 2020

@guilledk In your comment, you had try/except OSError. For the original version where we were just catching errors from the default deliver_cancel, that was fine, because we knew OSError was the only thing that could be raised. But now we'll be catching errors from user code, so we want to instead do try/except BaseException. And then afterwards, we want to re-raise the exception.
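To illustrate the difference (a minimal sketch with hypothetical names, again using plain asyncio rather than trio): except OSError would let a ValueError from user code escape the handler unlogged, while except BaseException plus a bare raise both reports it and still propagates the original exception:

```python
import asyncio

async def user_deliver_cancel(proc):
    # Arbitrary user code: it can raise anything, not just OSError.
    raise ValueError("bug in user code")

async def killer(proc):
    # Catch BaseException so nothing escapes unreported, then re-raise
    # so the caller still sees the original exception.
    try:
        await user_deliver_cancel(proc)
    except BaseException as exc:
        print(f"would log: {exc!r}")
        raise

async def demo():
    try:
        await killer(None)
    except ValueError as exc:
        return repr(exc)

result = asyncio.run(demo())
print(result)
```

With except OSError here, the ValueError would bypass the handler entirely, so nothing would be logged before it unwound the killer task.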

@guilledk

Oh ok got it
