Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error with dynamically imported functions. #406

Closed
tobiasraabe opened this issue May 26, 2023 · 2 comments
Closed

Error with dynamically imported functions. #406

tobiasraabe opened this issue May 26, 2023 · 2 comments

Comments

@tobiasraabe
Copy link

tobiasraabe commented May 26, 2023

Hi!

I have a package called pytask, which behaves almost like pytest in that it dynamically collects tasks (task and not test functions) from modules and executes them.

Then, there is also pytask-parallel (similar to pytest-xdist) that provides multiple backends to parallelize the execution of tasks, and one is loky. When executing tasks in parallel, I get an error with loky.

Here is a minimal example with a task_example.py which defines the task function and in main.py you find the runner that dynamically imports the function and executes it with loky.

# Content of task_example.py.
def task_example(): pass
# Content of main.py
from pathlib import Path
import sys
from types import ModuleType
import importlib.util

from loky import get_reusable_executor


def import_path(path: Path) -> ModuleType:
    module_name = path.name

    spec = importlib.util.spec_from_file_location(module_name, str(path))

    if spec is None:
        raise ImportError(f"Can't find module {module_name!r} at location {path}.")

    mod = importlib.util.module_from_spec(spec)
    
    # Comment the line out to make the submitted task succeed.
    sys.modules[module_name] = mod
    
    spec.loader.exec_module(mod)
    return mod


if __name__ == "__main__":
    module = import_path(Path("task_example.py"))

    function = module.task_example

    executor = get_reusable_executor(max_workers=2, timeout=2)
    res = executor.submit(function)

    print(res.exception())
    print(res.exception().__cause__)

If you run python main.py, you see the following output.

A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

"""
Traceback (most recent call last):
  File "/home/tobia/mambaforge/envs/pytask-parallel/lib/python3.11/site-packages/loky/process_executor.py", line 426, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tobia/mambaforge/envs/pytask-parallel/lib/python3.11/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'task_example.py'; 'task_example' is not a package
"""

Interestingly, this error does not occur if you comment out the line with sys.modules[module_name] = mod.

But, this line is necessary since you otherwise see errors with dataclasses: pytask-dev/pytask#373.

Using another backend like concurrent.futures.ProcessPoolExecutor does not lead to this error.

I hope you have more insights into why this error is happening. If you need more info, I am happy to give it to you.

Thanks for looking into this issue! 🙏

@ogrisel
Copy link
Collaborator

ogrisel commented Jun 28, 2023

Loky uses cloudpickle under the hood to make it possible to call parallel execution on interactively or locally defined functions / class methods.

So investigating the root cause would probably involve in setting a break point somewhere appropriate in cloudpickle to find out what's wrong here.

@tobiasraabe
Copy link
Author

This example contains an error and does not reproduce the problem I am facing. I will come back if I manage to replicate it. Thanks for your time 🙏.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants