building: force the runtime hook execution order based on filenames #7012

Draft: wants to merge 1 commit into `develop`

rokm (Member) commented Aug 2, 2022

Force the execution order of runtime hooks that are tied to imported modules and packages by sorting them by their filename (basename). This gives us stable and predictable execution order; up until now, the order depended on the order modules appeared in the module graph, i.e., on the order of imports made in the program.

It also gives us the ability to force execution of some hooks before other hooks, and conversely, force execution of some hooks after all other hooks, by giving them appropriate names.

For example, a runtime hook named `pyi_rth_000_00_mymod_early.py` should be executed first (right after custom runtime hooks), while the one named `pyi_rth_zzz_00_mymod_late.py` should be executed last, regardless of the names used by potential third-party runtime hooks, as long as they adhere to the `pyi_rth_` prefix.
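A minimal sketch of the ordering rule, assuming a plain code-point sort on the basename. The helper name `order_runtime_hooks` and the directory paths are hypothetical, and this is not PyInstaller's actual implementation; it only shows why the proposed prefixes land first and last.

```python
import os


def order_runtime_hooks(hook_paths):
    """Return runtime hook paths sorted by basename (hypothetical helper).

    Sorting on the basename alone makes the order independent of the
    directory a hook lives in and of the order in which the corresponding
    modules appear in the module graph.
    """
    return sorted(hook_paths, key=os.path.basename)


# Illustrative paths; the directories are made up for this example.
hooks = [
    "/site-packages/PyInstaller/hooks/rthooks/pyi_rth_pkgutil.py",
    "/site-packages/mymod/__pyinstaller/pyi_rth_zzz_00_mymod_late.py",
    "/site-packages/PyInstaller/hooks/rthooks/pyi_rth_multiprocessing.py",
    "/site-packages/mymod/__pyinstaller/pyi_rth_000_00_mymod_early.py",
]

# Prints the basenames in execution order.
for path in order_runtime_hooks(hooks):
    print(os.path.basename(path))
```

In code-point order, digits sort before uppercase letters, `_`, and lowercase letters, so a `000` prefix lands ahead of typical hook names and a `zzz` prefix behind them, no matter how the input list was ordered.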

rokm force-pushed the `fixed-runtime-hook-order` branch 2 times, most recently from cfcb9bc to 8f757e0 (August 2, 2022 20:37)
rokm (Member Author) commented Aug 3, 2022

A bit of context behind this change: the other day, I was looking into implementing support for the loky process execution framework (and therefore for joblib, which uses loky as its default backend).

The approach is quite similar to that for multiprocessing: in a runtime hook, we need to examine `sys.argv` and divert the program flow if it matches the arguments loky uses for its subprocesses. The difference is that with multiprocessing, this diversion happens when user code calls `multiprocessing.freeze_support` (which also means that unpleasant things happen if the user forgets to call it, or calls it too late); in the case of loky, we do not have such a function to hook into, so the program flow diversion would happen in the runtime hook itself, before the user's script is reached.
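The flow diversion described above could look roughly like the following runtime hook sketch. The marker argument `--hypothetical-loky-worker` and the `_run_worker` placeholder are assumptions for illustration only; they are not the actual arguments or entry points loky uses for its worker processes.

```python
import sys

# Hypothetical marker: a stand-in for whatever arguments loky actually
# passes to its worker processes.
WORKER_MARKER = "--hypothetical-loky-worker"


def _run_worker(args):
    """Placeholder for the code that would run the worker's main loop."""
    print("worker started with", args)
    return 0


def divert_if_worker(argv, run_worker=_run_worker):
    """Divert program flow if argv looks like a worker invocation.

    Returns None for a normal program launch, so execution continues to the
    remaining runtime hooks and the entry-point script; otherwise returns the
    worker's exit code.
    """
    if len(argv) > 1 and argv[1] == WORKER_MARKER:
        return run_worker(argv[2:])
    return None


# In an actual runtime hook, this runs at import time, before the
# entry-point script is reached; a worker process exits here.
_exit_code = divert_if_worker(sys.argv)
if _exit_code is not None:
    sys.exit(_exit_code)
```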

The problem is that loky also triggers spawning of the multiprocessing resource tracker (probably by internally using some of the multiprocessing primitives), so we need a `multiprocessing.freeze_support` call somewhere to catch and divert it. The most reasonable place would be the same loky runtime hook, but due to the non-deterministic ordering of runtime hooks, the function may not yet be monkey-patched at that point. The safe place would be the start of the entry-point script, but people who use loky or joblib are unlikely to even be aware that multiprocessing is also part of the game...

So I would really like the ability to run a runtime hook (`pyi_rth_zzz_01_loky_freeze.py`) after all other runtime hooks have run, just before the entry-point script is started. This way, we can be sure that `multiprocessing.freeze_support` is patched. We can also be sure that whatever the other runtime hooks set up is already in place, because it may be needed in the subprocess as well (for example, a path to data that needs to be overridden via an environment variable).

And then, we could also have an additional multiprocessing runtime hook, `pyi_rth_zzz_00_multiprocessing_freeze.py`, that simply calls `multiprocessing.freeze_support`, thus ensuring that it is called before the entry-point script runs. Users could then get away without calling it on their own (and if they do call it, the call would essentially be a no-op).
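A sketch of what such a hook could contain, assuming the runtime hook that patches `freeze_support` is named `pyi_rth_multiprocessing.py` and therefore sorts ahead of anything with the `zzz` prefix. The file content below is illustrative, not a hook that ships with PyInstaller.

```python
"""Sketch of a late runtime hook, e.g. pyi_rth_zzz_00_multiprocessing_freeze.py.

Sorting on the basename places it after pyi_rth_multiprocessing.py (assumed
to patch freeze_support()) and before the entry-point script runs.
"""
import multiprocessing

# In a multiprocessing child process spawned from the frozen program, the
# patched freeze_support() runs the child's code and exits, so control never
# reaches the entry-point script. In the main process it returns right away,
# which is why a later call from user code is effectively a no-op.
multiprocessing.freeze_support()
```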
