
[discussion] What to do about cancel scopes / nurseries inside generators? #264

Open

njsmith opened this issue Aug 4, 2017 · 25 comments

njsmith (Member) commented Aug 4, 2017

So here's a surprising thing that can happen in trio right now: if you have a generator (or async generator, doesn't matter) that yields inside a cancel scope, then the cancel scope remains in effect until you resume the generator and exit it.

For context managers, this is actually what you want, and we make use of this in lots of places, e.g.:

@contextmanager
def fail_at(deadline):
    with move_on_at(deadline) as scope:
        yield scope
    ...

with fail_at(deadline):
    XX

So here we have: first fail_at runs for a bit, and it starts a cancel scope, and then it yields. And then the XX code runs with the cancel scope in effect, even though fail_at is not on the stack. And then fail_at is resumed, and it exits the cancel scope. A little confusing when you think about the details, but basically this is safe and useful and ultimately does what you'd expect.
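
For reference, trio's real fail_at fills in the elided part by checking whether the scope caught a cancellation; roughly (a sketch, assuming the current public names trio.move_on_at and trio.TooSlowError):

from contextlib import contextmanager
import trio

@contextmanager
def fail_at(deadline):
    with trio.move_on_at(deadline) as scope:
        yield scope
    # If the deadline expired, the scope swallowed the Cancelled; turn that
    # into a loud error for the caller instead.
    if scope.cancelled_caught:
        raise trio.TooSlowError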

However, in general this can cause strange behavior, and also crashes:

import trio

def weird_generator():
    with trio.open_cancel_scope() as cs:
        yield cs

async def main():
    # hold onto a reference to prevent the gc from making this even more confusing:
    gen = weird_generator()
    for cs in gen:
        break
    cs.cancel()
    await trio.sleep(1)

trio.run(main)

Notice that even after we leave the loop and abandon the generator, the cancel scope is still in effect... and then the program crashes because trio detects that the cancel scope stack has become inconsistent (it tries to pop a scope that isn't at the top of the stack).

This all also applies to using nurseries inside async generators, because nurseries have an implicit cancel scope.

There are also some more generic issues with cleaning up (async) generators that call trio APIs after yielding, but that's another issue (specifically, issue #265). The cancel scope leakage causes problems even while the generator is alive, and even if we had PEP 533 then this would still be an issue.

Really the ideal solution here would be to error out if someone tries to yield through a cancel scope, IFF they're not implementing a context manager.

I don't think there's any way to do this though without PEP 521, and even then we'd have to figure out somehow whether this particular generator is implementing a context manager. (I guess some annoying thread-local thing could work -- e.g. make with trio.this_is_a_context_manager: ... which sets some internal flag that disables the PEP 521-based checks.)

I don't currently have any other ideas how to improve this, except to put some prominent warning in the documentation.

njsmith (Member, Author) commented Aug 12, 2017

Some discussion on python-ideas (scroll down to point 9): https://mail.python.org/pipermail/python-ideas/2017-August/046736.html

njsmith (Member, Author) commented Apr 3, 2018

Notes:

oremanj (Member) commented May 23, 2018

One idea for nurseries-in-generators is to provide an ergonomic way to get a context-managed generator. I wrote up an example implementation at https://gist.github.com/oremanj/772241665edea3ceb3183a4924aecc08 (it was originally intended to be a quick-and-dirty demo, but rather grew a life of its own...). I think something like this would likely be useful in a trio utilities library. It doesn't require any special support from trio or async_generator, and it's rather large/magical, so I'm not sure it deserves to make it into trio itself, unless trio wants to use it.

There are still notable limitations; for example, in the azip() example, if one of the inner slow_counters reaches its deadline and is cancelled, the whole azip() will be cancelled, because we had to enter the slow_counter contexts to get async iterables that could be passed to the azip() context. But if the limitations are acceptable, cancel scopes and nurseries that are attached to a generator in this fashion should be guaranteed to nest correctly / not mess up the cancel stack.
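
As a much-simplified sketch of the user-facing shape (not the gist's actual implementation, which also keeps the generator's internal cancel scopes and nurseries nested correctly): the generator is only ever used under an async with, so its finalization happens at a well-defined point inside the enclosing scope. The helper and names below are illustrative only:

from contextlib import asynccontextmanager

@asynccontextmanager
async def managed(agen):
    # Hand the async generator to the caller, and make sure it is closed here,
    # inside the caller's scope, rather than whenever the GC gets around to it.
    try:
        yield agen
    finally:
        await agen.aclose()

# Usage (slow_counter is the example from the gist):
#
#     async with managed(slow_counter()) as counter:
#         async for n in counter:
#             ...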

njsmith (Member, Author) commented Sep 6, 2018

Thinking about this a bit more...

PEP 568, if implemented, would make it possible to assign a reasonable semantics to cancel scopes with yield inside them. Basically, the scope would apply to the code that's statically inside the scope, but not to anything outside the generator; if the scope is cancelled/timeout expires while the generator is suspended, then that just means that when the generator is resumed, it'll be in a cancelled state and the next checkpoint will raise Cancelled. (Or if it exits the cancel scope without executing a checkpoint, that's fine too.)

PEP 568 would not make it possible to assign a reasonable semantics to a nursery with a yield inside it – that's intrinsically impossible if we want to keep the guarantee that exceptions in background tasks are propagated promptly, because if a background task crashes while the generator-holding-the-nursery is suspended, what can you do? There's no live stack frame to propagate it into. If we were willing to relax the guarantee to just, exceptions are propagated eventually, and PEP 533 were implemented, then we could allow yield inside nurseries, but neither of those seems likely.

So, maybe we want to forbid yielding out of nurseries. How can we do that? One idea would be to have PEP 568, use it for cancel scopes, and then for nurseries add an additional feature: the ability to somehow "pin" the Context that's on top of the stack (the one that writes go to), so that if a yield would take us out of that context, then it fails (raises an exception instead of yielding). Then regular cancel scopes would stash their state in the context, and nurseries would additionally pin the context. And inside an @asynccontextmanager function you could still yield out of a nursery, because @asynccontextmanager functions – unlike other async generators – share their context with their caller, so yield there doesn't touch the context stack.

One more thing to consider: what if we decide to disallow yield inside all cancel scopes, not just nurseries? Even the correct semantics for a yield inside a cancel scope are a little confusing; if the timeout expires while the generator is suspended, then nothing happens, until later it's resumed and then boom. Users could conceivably be expecting a more complex semantics, where the timeout clock stops ticking during the yield. Or even if they aren't, then it's pretty weird for the semantics of the generator to depend on how long it was suspended for – that violates encapsulation, and probably isn't very useful.
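
For concreteness, here's the pattern under discussion; the comments describe the PEP 568-style semantics sketched above rather than what trio does today, and slow_ticker is a made-up example:

import trio

async def slow_ticker():
    # Under the proposed semantics, the 10-second deadline keeps ticking even
    # while this generator is suspended at the yield, but it only ever affects
    # code that is textually inside the scope.
    with trio.move_on_after(10):
        while True:
            yield
            # If the deadline expired while we were suspended, then this next
            # checkpoint raises Cancelled as soon as we're resumed.
            await trio.sleep(1)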

If that's what we want to do, then we don't need to tie it to PEP 568-style nested contexts. The reason why tying them together is attractive is that PEP 568-style nested contexts need to have a special case to handle @contextmanager and @asynccontextmanager, and the no-yield-in-nursery rules need the exact same special case, so maybe we can combine the special cases. But we can also imagine adding the no-yield-in-nursery feature more directly, and decoupling the whole thing from PEP 568.

I guess what we'd want is: a way to set a flag on a caller's frame that makes it an error to attempt to yield out of that frame (but await is fine). And by "caller" I really mean "walk up the frame stack until we find the first frame that is not an @(async)contextmanager, or quit early if we encounter a frame that's not an (async) generator". On CPython this would actually be ... pretty easy to implement, and efficient. (Except that we might need some tweaks for the bytecode eval loop to distinguish between the different kinds of yield/await.) The PyPy folks might hate it; should check with them. Generators are a bit special AFAIK (you have to materialize their frames for suspension to work), so maybe it's not so bad?

Summary

Our options are:

  • Allow yield in cancel scopes and nurseries
    • Requires: PEP 568, PEP 533, and relaxing the exception guarantees that nurseries currently provide
  • Allow yield in cancel scopes, but not nurseries
    • Requires: PEP 568, plus some kind of "disallow yielding out of the current context" feature
  • Disallow yield in both cancel scopes and nurseries
    • Requires: some kind of "disallow yielding the current frame" extension

PEP 533 is still how Python ought to work, but may not be the right hill to die on...

njsmith (Member, Author) commented Sep 6, 2018

walk up the frame stack until we find the first frame that is not an @(async)contextmanager, or quit early if we encounter a frame that's not an (async) generator

Actually, that's not quite right...

class CM1:
    def __enter__(self):
        self._cm2 = CM2()
        self._cm2.__enter__()
    def __exit__(self, *exc):
        return self._cm2.__exit__(*exc)

class CM2:
    def __enter__(self):
        disallow_yield()  # the hypothetical "no yielding past this point" call
    def __exit__(self, *exc):
        pass

def gen():
    with CM1():
        yield  # should be an error

So we really would want something like the PEP 568 semantics... like a stack of bools (or rather ints, to allow inc/dec, so disallow_yield calls can be nested), where the top of the stack tracks the allowed-to-yield state, entering a generator pushes to the stack, leaving pops, the push/pop is disabled on context manager generators, and the stack is automatically saved/restored when we suspend/resume coroutines, just like the exception stack. If we decide we want PEP 568 anyway, this could even be a ContextVar. Though the "pinning" idea has the advantage that copy_context wouldn't copy the disallow-yield state.

Eh, we already have other stacks in the interpreter state, like the exception stack; it wouldn't be too weird to add another. And that way of formulating it wouldn't be a problem for PyPy.

njsmith (Member, Author) commented Sep 6, 2018

Example of a case where having the disallow-yield state be stored in the Context would be bad (especially if it's preserved across copy_context): the kluges pytest-trio uses to share a Context between fixtures and the test. See: python-trio/pytest-trio#59

njsmith (Member, Author) commented Sep 6, 2018

Frightening thought: we did a tremendous amount of work in python-trio/pytest-trio#55 to allow pytest-trio fixtures to yield inside nursery blocks. That work in particular is OK, because whatever hack we use to make @contextmanager work can also be applied to @pytest.fixture. (And it'll be a hack, but probably OK; the @pytest.fixture code does have some disturbingly detailed knowledge of why exactly yield is not OK and works around it, so I guess if other libraries started using disallow-yield for other reasons that could be a problem, but probably it's fine.) That's not the frightening thought.

The frightening thought is: well, why did we do all that work? It was because we decided it's not reasonable to expect people to know whether a random async context manager like async with open_websocket actually has a nursery inside it – that should be abstracted away, an internal implementation detail. But...... it can't be, because if you can't use nurseries inside random generators, then you can't use async with open_websocket inside random generators, and that means "does this context manager have a nursery inside it?" is actually a publicly visible feature of your API, and adding a nursery to an existing context manager may be a backwards-compatibility-breaking change :-( :-( :-(

njsmith (Member, Author) commented Sep 6, 2018

A small problem with PEP 568: the way @contextmanager/@asynccontextmanager trigger the special case in the generators they call is by setting an attribute on the generator.

But in the time since I wrote that, I've learned more about all the ways this is an unreliable thing to try to do, in a world where people love their decorators. It's fine of course if @contextmanager directly sees the generator, but it means we'd get weird subtle bugs if someone does:

def noop_wrap(fn):
    def wrapped(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapped

@contextmanager
@noop_wrap
def agen_fn():
    ...

because @contextmanager ends up writing to an attribute on the wrapped object, not the agen_fn object, so it has no effect.

Instead, the special case should be triggered by an optional argument passed to send/throw/close.

njsmith (Member, Author) commented Sep 6, 2018

Instead of a stack of bools or ints, maybe it should be a stack of str objects describing why yield is forbidden, so we can give a good error message and track down bugs. Then instead of having "toggle the top value in the stack", we'd have "push/pop the stack", and entering a generator would (by default) push a sentinel value that indicates yielding is allowed.

njsmith (Member, Author) commented Sep 13, 2018

A few more notes based on further thought:

  • This comment is pretty confused – we'd be setting the attribute on the generator object, not on the function, and wrapping generator objects is very rare; and if someone did wrap a generator object, putting the argument on send wouldn't necessarily help anyway. Yury likes the attribute-on-generator-object approach better.

  • We don't really need a stack at all, just a thread-local flag that generator.send and friends set on entry and restore on exit (a rough illustration of the detection idea is sketched below).
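
Here's a rough pure-Python approximation of that idea, for sync generators and with the check done at the call site rather than inside generator.send itself; forbid_yield, checked, and bad are all made-up names, and the real thing would of course live in the interpreter:

import threading

_state = threading.local()

class forbid_yield:
    # Hypothetical stand-in for a scope (like a nursery's cancel scope) whose
    # body must not yield out to a suspending caller.
    def __enter__(self):
        _state.depth = getattr(_state, "depth", 0) + 1
        return self
    def __exit__(self, *exc):
        _state.depth -= 1

def checked(genfunc):
    # Wrap a generator function so that yielding while a forbid_yield scope
    # opened inside the generator is still active raises an error.
    def wrapper(*args, **kwargs):
        gen = genfunc(*args, **kwargs)
        baseline = getattr(_state, "depth", 0)
        for value in gen:
            if getattr(_state, "depth", 0) != baseline:
                gen.close()  # unwind the generator, running its __exit__s
                raise RuntimeError("yield inside a forbid_yield scope")
            yield value
    return wrapper

@checked
def bad():
    with forbid_yield():
        yield 1  # the wrapper notices the still-open scope and raises

# list(bad())  ->  RuntimeError: yield inside a forbid_yield scope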

vxgmichel (Contributor) commented:

From what I've read in this comment, it seems like the idea of:

  • allowing the yield statement inside a trio nursery
  • AND allowing it to cancel the current task regardless of its current position in the code

is out of the picture. I was curious about that so I've been experimenting a bit and I came up with this write-up explaining why it might conceptually make sense.

I've tried to find and read most of the related posts in issues #264 and #638, but I might have missed the explanation I was looking for. Anyway, I'm curious to hear what you guys think about it and I hope I'm not missing something crucial.

njsmith (Member, Author) commented Apr 4, 2019

There's an initial proposal for making it possible to disallow yield here: https://discuss.python.org/t/preventing-yield-inside-certain-context-managers/1091

njsmith (Member, Author) commented Apr 4, 2019

@vxgmichel The big issue with that approach for me is that I don't understand it :-). Basically, what you say in the "Limitations" section at the end. In code like this:

async with aitercontext(agen()) as safe_agen:
    async for item in safe_agen:
        ...  # block 1
    ...  # block 2

I find it really hard to reason about what happens when one of the two ... code blocks raises an exception. It's pretty surprising that an exception in one of these two places could be caught by the code in agen! It's also surprising that the code in "block 2" could be cancelled by code in agen.

Maybe it could technically work? But I have no idea how to explain it to beginners.

vxgmichel (Contributor) commented Apr 4, 2019

@njsmith

The big issue with that approach for me is that I don't understand it :-)

I haven't thought about it in months, so it was quite a struggle to get back into it too 😅

I find it really hard to reason about what happens with one of the two ... code blocks raises an exception.
It's pretty surprising that an exception in one of these two places could be caught by the code in agen!

I think block 1 is fine; I would just reason about it the same way we reason about async context managers. This way, it doesn't seem absurd (to me at least) that agen might have to deal with an exception raised by its consumer. It acts like the consumer notifying the producing generator that it should stop, clean up, and return. Much like an async context manager, it is not supposed to yield new values after an exception has been raised (or an aclose has been issued).

It's also surprising that the code in "block 2" could be cancelled by code in agen.

Block 2 and block 0 (in between the async with and the async for) are the biggest issue here; they really shouldn't exist at all. The problem is that I couldn't find any trick or syntax to ban those sections without something like PEP 533.

There's an initial proposal for making it possible to disallow yield here

Interesting! That makes me realize that the generators I described in the write-up should be treated as a specific type of async generator. In the proposal there is the following example about async context managers:

@asynccontextmanager
async def open_websocket(...):
    async with open_nursery():
        yield Websocket(...)  # This yield is OK

We could also imagine allowing yield statements for "producers" (I couldn't find a better term):

@asyncproducer
async def read_websocket(...):
    async with open_nursery():
        async for message in Websocket(...):
            yield message  # This yield is OK

Assuming PEP 533 and __aiterclose__ were implemented properly, the producer could be used as follows:

async for message in read_websocket(...):
    print(message)
    break  # or raise
# The producer has been properly shutdown by now

ghost commented Apr 4, 2019

I ran into part of this issue in a simple sync case, and wondered about handling it in a linter by requiring

try:
    yield
finally:
    ...

pylint-dev/pylint#2832

pquentin (Member) commented Apr 4, 2019

I thought I'd link here the latest idea from @njsmith to fix this issue in CPython: Preventing yield inside certain context managers

njsmith (Member, Author) commented Apr 4, 2019

@vxgmichel There's a closely related discussion in #638, starting from this comment: #638 (comment)

There we're talking about running generators as a separate task that pushes values into a channel, and the main task iterates over this channel instead of iterating over the agen directly.

I guess if you really think about it carefully, this is... probably equivalent to your suggestion, or mostly-equivalent, in terms of the final behavior? If the background task crashes, this cancels the looping code, or vice-versa, etc. But I find the generator-as-separate-task version a lot easier to think about, because it reuses normal trio concepts like tasks and channels.
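
For concreteness, the generator-as-background-task pattern might look roughly like this; iterate_in_background and some_agen are illustrative names, not trio or #638 APIs:

import trio
from contextlib import asynccontextmanager

@asynccontextmanager
async def iterate_in_background(agen, max_buffer_size=0):
    # Run `agen` in a background task that pushes its values into a memory
    # channel; the caller iterates over the channel instead of the agen itself.
    send_channel, receive_channel = trio.open_memory_channel(max_buffer_size)

    async def pump():
        async with send_channel:
            async for item in agen:
                await send_channel.send(item)

    async with trio.open_nursery() as nursery:
        nursery.start_soon(pump)
        try:
            yield receive_channel
        finally:
            # If the consumer stops early, make sure the pump task exits too.
            nursery.cancel_scope.cancel()

async def some_agen():
    # Stand-in for whatever async generator you'd otherwise iterate directly.
    for i in range(5):
        await trio.sleep(0.1)
        yield i

async def main():
    async with iterate_in_background(some_agen()) as items:
        async for item in items:
            print(item)
            break  # stopping early is fine; the background task gets cancelled

trio.run(main)

Note that iterate_in_background itself is a yield inside a nursery inside @asynccontextmanager, i.e. exactly the case the proposals above would still allow.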

njsmith (Member, Author) commented Apr 4, 2019

@apnewberry I don't think that issue is the same as this one, though I could be missing something. The issue here is all about yield inside two specific with blocks, and one of the nice things about with blocks is that you can't forget the finally – if you use the with block at all, then the finally comes along automatically :-)

vxgmichel (Contributor) commented:

@njsmith

There's a closely related discussion in #638, starting from this comment: #638 (comment)

Yes, I actually commented on the corresponding gist (that wasn't a very good idea since there's no notification system for gist comments).

I guess if you really think about it carefully, this is... probably equivalent to your suggestion, or mostly-equivalent, in terms of the final behavior?

I would argue that both approaches are useful but not equivalent. The channel-based approach is conceptually similar to go-style generators, with the producer and consumer tasks running concurrently and occasionally handshaking over a channel. The native generator approach, on the other hand, runs a single task, executing either producer code or consumer code at any given moment.

In the first case, an unused item might get lost in the channel when the consumer decides it has had enough. In the second case, an item is only produced when the consumer asks for it, so no item is lost. This might or might not be important depending on the use case.

njsmith (Member, Author) commented Apr 5, 2019

That's true: even with an unbuffered channel, the producer starts producing the next object as soon as the previous one is consumed.

njsmith (Member, Author) commented Apr 5, 2019

I guess there's no technical reason you couldn't have a channel with a buffer size of "-1", where send doesn't just wait for the sent item to be received, but waits for receive to be called on the item after that.

smurfix (Contributor) commented Apr 5, 2019

@njsmith That would still produce a possibly-lost first item. IMHO, for lock-step non-parallelism you don't need a queue; you need a simple procedure call to the producer.

njsmith (Member, Author) commented Apr 5, 2019

@smurfix Yeah, you also need some way to wait for that first receive call, at a minimum. I guess one approach would be: add a wait_send_might_not_block, and then something like @oremanj's @producer decorator could call it before each time it steps the generator.

Of course, I was kind of hoping that we could get rid of Stream.wait_send_all_might_not_block (#371) – and adding Channel.wait_send_might_not_block would be going the other way. But sometimes things don't turn out the way I hope.

This would also have some weird edge cases.

If the consumer cancels their receive, the producer would keep going on calculating the value to send. This makes sense but is different from a classic async generator. (It also means that cancelling a receive is recoverable, while cancelling __anext__ destroys the generator object.)

Weirder: if the producer sets a timeout, it could leak out and cause the harness's call to wait_send_might_not_block to exit early, and start the next iteration early.

All in all it might turn out that it's just not worth going to any heroic lengths to try to make the background-generator style match the inline-generator style. They're not quite the same, but in practice the simple version of background generators might work fine.

oremanj added a commit to oremanj/trio that referenced this issue May 21, 2019
When exiting an outer cancel scope before some inner cancel scope nested inside it, Trio would previously raise a confusing chain of internal errors. One common source of this misbehavior was an async generator that yielded within a cancel scope. Now the mis-nesting condition will be noticed, and in many cases the user will receive a single RuntimeError with a useful traceback. This is a best-effort feature, and it's still possible to fool Trio if you try hard enough, such as by closing a nursery's cancel scope from within some child task of the nursery.

Fixes python-trio#882. Relevant to python-trio#1056, python-trio#264.
basak (Member) commented May 19, 2020

It also means that cancelling a receive is recoverable, while cancelling __anext__ destroys the generator object.

This is happening to me, and is surprising and unwanted. I'm implementing the guts of a channel in an async generator. The generator's control flow is its state, as opposed to using a state machine. If someone calling channel.receive() times out before the generator yields a value, I expect the generator to be suspended until the next call to __aiter__() rather than for the entire generator to be destroyed.

The only way around this I can see so far is to wrap every await inside the generator in a try/except, yield some "I was cancelled" value (I'm using the trio.Cancelled exception object) instead, adjust the control flow to "retry" the await inside the generator, and have a wrapper re-raise the exception from outside. But this still doesn't work when the generator is awaiting a result from some other async function, because the "restart" would be in the wrong place.

Or I can implement a state machine, but that would seem to defeat the point of async.

Any suggestions please? This isn't exactly a cancel scope inside a generator so is perhaps off-topic, but your statement is the closest existing discussion on my issue that I can find. I can create a separate issue if you prefer?

oremanj (Member) commented May 19, 2020

I'm implementing the guts of a channel in an async generator. The generator's control flow is its state, as opposed to using a state machine. If someone calling channel.receive() times out before the generator yields a value, I expect the generator to be suspended until the next call to __aiter__() rather than for the entire generator to be destroyed.

I'm assuming by __aiter__ here you mean __anext__. Your desire is very reasonable, but unfortunately not compatible with Python generator semantics. A generator can only be suspended at a yield. By definition, when you cancel it, it's running but waiting on some async operation, not at a yield. Thus, the only options for responding to a cancellation are to either soldier on until the next yield or unwind entirely by throwing an exception. The latter case is the default behavior. The former case can be obtained by putting a shielded cancel scope around each call to __anext__, but then you lose all the benefits of cancellation.
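
For illustration, the "shield each step" option might look like this minimal sketch (next_shielded is a made-up helper; as noted, it makes each step uninterruptible):

import trio

async def next_shielded(agen):
    # Step the async generator inside a shielded scope, so a cancellation of
    # the *caller* can't unwind the generator mid-operation.  The trade-off is
    # that this step can't be interrupted at all.
    with trio.CancelScope(shield=True):
        return await agen.__anext__()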

By far the easiest approach is to push the async generator logic into a background task which communicates the values to be yielded over a memory channel. That way, you can cancel waiting for the value without cancelling obtaining the value. There's an adapter I wrote in #638 that lets you do this still with async generator syntax: #638 (comment)

The remainder of this comment is for the case where you really don't want to do that, for whatever reason. Be warned, it gets gnarly. :-)

If you want the semantics where cancelling the generator will cause it to resume whatever async operation it was working on when it was cancelled, you have to write it yourself, in a way that contains a yield statement. As you've discovered, your options are not very abstraction-friendly, because Python native async generators don't support yield from.

But... there is an evil brittle hack that you can use to suspend an async generator from within an async function that it calls:

import types, inspect
async def wrapper():
    holder = [None]
    while True:
        holder.append((yield holder.pop()))
co = wrapper.__code__
# on 3.8+:
wrapper.__code__ = co.replace(co_flags=co.co_flags | inspect.CO_COROUTINE)
# equivalent on pre-3.8, doesn't work on 3.8+:
#wrapper.__code__ = types.CodeType(co.co_argcount, co.co_kwonlyargcount, co.co_nlocals, co.co_stacksize, co.co_flags | inspect.CO_COROUTINE, co.co_code, co.co_consts, co.co_names, co.co_varnames, co.co_filename, co.co_name, co.co_firstlineno, co.co_lnotab, co.co_freevars, co.co_cellvars)
wrapper = wrapper()
wrapper.send(None)

@types.coroutine
def yield_(value):
    yield wrapper.send(value)

Then await yield_(foo) works like yield foo, but you can call it from an async function that the async generator calls, not just from the async generator directly. See python-trio/async_generator#16 for more on this monstrosity. It works on current versions of both CPython and PyPy, but it might crash on PyPy if you use debugging tools in certain ways, and it's very tied to implementation details that might change in the future.

If you don't like evil brittle hacks, you can use the @async_generator backport package for your async generator instead of making it a native one. Its await yield_() also supports this sort of thing (but only for its own async generators, not for the native ones).

Whichever approach you take, you can now write an abstraction like you've described:

class MyResilientAsyncGenerator:
    def __init__(self, *args, **kw):
        self._aiter = self._logic(*args, **kw)

    def __aiter__(self):
        return self

    async def __anext__(self):
        with trio.CancelScope(shield=True) as self._scope:
            value = await self._aiter.__anext__()
            if isinstance(value, trio.Cancelled):
                raise value
            else:
                return value

    async def aclose(self):
        await self._aiter.aclose()

    async def _restart_point(self, afn, *args, **kw):
        while True:
            self._scope.shield, prev_shield = False, self._scope.shield
            try:
                return await afn(*args, **kw)
            except trio.Cancelled as exc:
                await yield_(exc)
            finally:
                self._scope.shield = prev_shield


    async def _logic(self):
        # Your previous asyncgen goes here.
        # Use 'await self._restart_point(fn, *args)' instead of 'await fn(*args)' for the
        # cancellable-retry-on-resume operations. All other operations are uncancellable.
        pass

This is untested. If you wind up using it, I'd be curious for feedback!
