[core] Check if a Ray task has errored without calling ray.get on it
#45229
Labels
core: Issues that should be addressed in Ray Core
enhancement: Request for new feature and/or capability
P0: Issues that should be fixed in short order
Description
Goal: Given a list of Ray remote task futures, I want to check whether each one has errored without calling ray.get individually on each element. This feature is offered by similar async execution APIs:
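As one illustration of what such an API looks like, here is a minimal sketch using Python's standard concurrent.futures module. Choosing this library is an assumption on my part, and the `work` and `failed_futures` names are hypothetical helpers made up for the example. It shows that `Future.done()` and `Future.exception()` let you inspect failures without retrieving any result value.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def work(x):
    # Hypothetical task: fails for one specific input.
    if x == 2:
        raise ValueError(f"bad input: {x}")
    return x * 10


def failed_futures(futures):
    """Return (future, exception) pairs for finished futures that raised."""
    failures = []
    for f in futures:
        # done() never blocks; skip futures that are still running.
        if not f.done() or f.cancelled():
            continue
        # exception() returns the raised exception (or None) without
        # fetching the result value.
        exc = f.exception()
        if exc is not None:
            failures.append((f, exc))
    return failures


with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(work, x) for x in range(4)]
    wait(futures)  # only so that the example is deterministic
    errors = failed_futures(futures)
```

The request in this issue is for an equivalent non-fetching check on Ray object refs.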
Current workaround
We have a "check for failure" function in Ray Train, which may incur some unnecessary overhead to fetch objects:
ray/python/ray/train/_internal/utils.py, lines 49 to 58 in fa61109
Use case
I am implementing a control loop that checks the status of some actor tasks every N seconds. I want to learn that any of these actor tasks has failed as soon as possible so that I can trigger error handling. This means running an "error check" in a loop with a short sleep between iterations:
cc: @jjyao @rkooo567