100% CPU load after cancel #695
Comments
Infinite call recursion in `anyio/src/anyio/_backends/_asyncio.py` (line 469 at e0529a3).
Without SIGINT:

```python
#!/usr/bin/env python3
from anyio import CancelScope, create_task_group, run, sleep


async def shield_task() -> None:
    with CancelScope(shield=True):
        await sleep(60)


async def task() -> None:
    async with create_task_group() as tg:
        tg.start_soon(shield_task)


async def main() -> None:
    async with create_task_group() as tg:
        tg.start_soon(task)
        tg.cancel_scope.cancel()


run(main)
```
I can repro this too. But infinite call recursion? How do you figure that?
Yeah, not infinite call recursion. The top-level cancel scope continuously retries cancellation because it only sees that its immediate child task (…
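The mechanism can be mimicked in plain asyncio (a hypothetical sketch, not anyio's actual code): a parent that keeps re-delivering cancellation to a child that absorbs it never lets the event loop go idle, which is what pins the CPU. Here the child gives in after three attempts so the demo terminates:

```python
import asyncio


async def stubborn_child(swallow: int) -> None:
    # Absorbs `swallow` cancellations -- like a task stuck behind a
    # shielded scope -- before finally finishing.
    for _ in range(swallow):
        try:
            await asyncio.sleep(3600)
        except asyncio.CancelledError:
            pass  # swallow the cancellation and block again


async def main() -> int:
    child = asyncio.ensure_future(stubborn_child(swallow=3))
    await asyncio.sleep(0)  # let the child reach its first await

    retries = 0
    while not child.done():
        child.cancel()          # the parent re-delivers cancellation...
        retries += 1
        await asyncio.sleep(0)  # ...on every pass: a busy loop
    return retries


retries = asyncio.run(main())
print(f"cancellation retries: {retries}")
```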
You can see the same behavior by modifying `main()` to wait until the tasks are blocked before cancelling:

```python
from anyio.lowlevel import wait_all_tasks_blocked


async def main() -> None:
    async with create_task_group() as tg:
        tg.start_soon(task)
        await wait_all_tasks_blocked()
        tg.cancel_scope.cancel()
```
The trick, I suppose, is figuring out how to make it understand that it shouldn't try to cancel the middle task, which is waiting on a task that sits inside a shielded scope.
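For context, this is what a shield is supposed to guarantee. The stdlib analogue is `asyncio.shield`: cancelling the waiter abandons the wait but leaves the shielded work running to completion. A minimal sketch:

```python
import asyncio


async def inner() -> str:
    await asyncio.sleep(0.1)
    return "inner finished"


async def main() -> str:
    inner_task = asyncio.ensure_future(inner())
    # shield() wraps the task: cancelling the wrapper cancels the
    # *wait*, not the shielded task itself.
    waiter = asyncio.ensure_future(asyncio.shield(inner_task))
    await asyncio.sleep(0)
    waiter.cancel()
    try:
        await waiter
    except asyncio.CancelledError:
        pass  # the wait was cancelled...
    return await inner_task  # ...but the inner task still completes


result = asyncio.run(main())
print(result)
```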
Has anyone found a fix for this? We ran into it and it's killing our services. We're currently looking at moving to Trio to avoid it.
Can you describe your use case where it's doing this?
It's in a fairly complex Starlette app, so it's hard to point to one thing, but it seems to be happening for us with HTTPX cancellations (among potentially other things) inside other cancel scopes. We noticed services in our cluster getting pinned to 100% CPU usage and investigated. After a lot of digging, we realized that even after all open requests had closed, there were still tasks in the event loop that should already have been cancelled. They weren't things we expected to use much, if any, CPU.

In all the cases we reproduced, the lingering tasks were HTTP calls. However, that's the main thing the service was doing in our reproduction, so it's possible other code paths that get cancelled or timed out would have similar issues. It mainly seems to happen when the event loop is overloaded, so it's possibly some kind of race condition with cancellation (in our specific case) that triggers us getting into this state, but the actual end state looks the same as this issue: an orphaned cancelling task that consumes all of the CPU.
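A way to confirm this diagnosis in a live service (a hypothetical helper, not part of anyio or Starlette) is to snapshot the tasks still alive on the event loop and look for ones that should already be gone:

```python
import asyncio


def dump_pending_tasks() -> list[str]:
    # Describe every task still alive on the running loop; orphaned
    # tasks that should already have been cancelled show up here.
    lines = []
    for task in asyncio.all_tasks():
        coro = task.get_coro()
        # Task.cancelling() only exists on Python 3.11+
        cancelling = task.cancelling() if hasattr(task, "cancelling") else "n/a"
        lines.append(f"{coro.__qualname__}: done={task.done()} cancelling={cancelling}")
    return sorted(lines)


async def demo() -> list[str]:
    async def worker() -> None:
        await asyncio.sleep(3600)

    t = asyncio.ensure_future(worker())
    await asyncio.sleep(0)  # let the worker start
    snapshot = dump_pending_tasks()
    t.cancel()  # clean up the demo task
    return snapshot


snapshot = asyncio.run(demo())
for line in snapshot:
    print(line)
```

In a real service you would call something like `dump_pending_tasks()` from a debug endpoint or a signal handler and compare snapshots over time.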
I'll try to get a fix for this into the next release, but I have to say it's a pretty tricky one to fix. |
My best attempt at fixing this involved shielding the part of (…
Things to check first
I have searched the existing issues and didn't find my bug already reported there
I have checked that my bug is still present in the latest release
AnyIO version
4.3.0
Python version
3.9.2, 3.12.1
What happened?
After Ctrl+C the program uses 100% CPU.
Looks like the problem is in `anyio/src/anyio/_backends/_asyncio.py` (line 231 at e0529a3), since `call_soon` works without problems:
https://github.com/python/cpython/blob/72dbea28cd3fce6fc457aaec2107a8e453073297/Lib/asyncio/base_events.py#L871
How can we reproduce the bug?
Ctrl+C