[ENH] make parallelisation default option wherever applicable #6387

Open
yarnabrina opened this issue May 4, 2024 · 3 comments
Labels
enhancement Adding new functionality

Comments

@yarnabrina
Collaborator

Currently, all parallelisation options (as far as I know) are disabled by default and need to be enabled using set_config. But end users will almost always prefer a parallel version over a sequential one.

  • This issue proposes making parallelisation via joblib with the loky backend the default option.

  • If users prefer to disable it, for debugging or other purposes, they should be able to do so using set_config.

  • If users prefer parallelisation by other tools, e.g. dask, they should be given the option as well, but default should be based on a method that's always available through core dependencies.
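
The three points above can be sketched as a small config-resolution helper. This is only an illustration of the proposed behaviour, not the library's implementation: the helper `resolve_parallel_config`, the key names `backend:parallel` and `backend:parallel:params`, and the default `n_jobs=-1` are all assumptions made for the example.

```python
from typing import Optional

# Assumed defaults for illustration only; the issue does not fix n_jobs.
DEFAULT_BACKEND = "loky"
DEFAULT_PARAMS = {"n_jobs": -1}


def resolve_parallel_config(user_config: Optional[dict] = None) -> dict:
    """Return the effective parallelisation config (hypothetical helper).

    - no user config: parallel by default, using the assumed loky defaults
    - backend set to None: parallelisation disabled, e.g. for debugging
    - another backend (e.g. "dask"): used as requested, without the
      loky-specific default parameters
    """
    user_config = dict(user_config or {})
    backend = user_config.get("backend:parallel", DEFAULT_BACKEND)

    if "backend:parallel:params" in user_config:
        # explicit user parameters always win
        params = dict(user_config["backend:parallel:params"])
    elif backend == DEFAULT_BACKEND:
        params = dict(DEFAULT_PARAMS)
    else:
        # default params are specific to the default backend
        params = {}

    return {"backend:parallel": backend, "backend:parallel:params": params}
```

Under these assumptions, calling the helper with no arguments yields the loky default, `{"backend:parallel": None}` switches parallelisation off, and `{"backend:parallel": "dask"}` selects an optional backend without inheriting the loky parameters.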

yarnabrina added the enhancement (Adding new functionality) label May 4, 2024

@fkiraly
Collaborator

fkiraly commented May 4, 2024

> Currently, all parallelisation options (as far as I know) are disabled by default and need to be enabled using set_config. But end users will almost always prefer a parallel version over a sequential one.

That is not universally true: in cases where joblib is hard-coded, the default is joblib, and the n_jobs default also varies across the code base. The backend is always loky though, afaik.

Another question: is parallelization with joblib / loky the most sensible default?
The obvious argument in favour is better performance out of the box.

Two arguments against:

  • lower compatibility range, higher breakage risk: depending on the environment, the estimator may simply crash uninformatively.
  • a default of joblib would enable nested joblib as a default in pipelines and composites! This will in general break or result in unexpected behaviour. For instance, in parallelized transformations on hierarchical data, two layers of parallelization would be enabled by default: one over the hierarchical index, and one internal to the transformation.
    • There was an issue on exactly this recently, I think @ninedigits spotted the nested parallelization (if you did, could you link? I cannot find it) - it is highly non-obvious, especially if it happens by default.

For me, the second argument makes the "contra" side outweigh the "pro" side at the moment. Of course, I'm happy to listen to arguments and weightings by others, as I do not think my own assessment is fully consolidated or complete.
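
To make the nested-parallelization concern concrete, here is a rough sketch that counts how many workers two stacked layers could request. It assumes each layer spawns its own pool independently and that nothing caps or serialises the inner layer, which real backends may well do; the function names are hypothetical, and only the `n_jobs` convention (-1 means all CPUs, values below -1 mean `cpu_count + 1 + n_jobs`) is taken from joblib's documentation.

```python
import os
from typing import Optional


def effective_n_jobs(n_jobs: int, cpu_count: int) -> int:
    """Resolve a joblib-style n_jobs value to a worker count.

    -1 means all CPUs; for values below -1, (cpu_count + 1 + n_jobs)
    workers are used, with a floor of one worker.
    """
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        return max(cpu_count + 1 + n_jobs, 1)
    return n_jobs


def worst_case_workers(
    outer_n_jobs: int, inner_n_jobs: int, cpu_count: Optional[int] = None
) -> int:
    """Upper bound on workers requested by two nested parallel layers.

    Assumption: every outer worker starts its own inner pool and the
    backend applies no mitigation for nesting.
    """
    cpu_count = cpu_count or os.cpu_count() or 1
    outer = effective_n_jobs(outer_n_jobs, cpu_count)
    inner = effective_n_jobs(inner_n_jobs, cpu_count)
    return outer * inner
```

Under these assumptions, on an 8-core machine `worst_case_workers(-1, -1, cpu_count=8)` gives 64 requested workers, against 8 when only one of the two layers is parallel, e.g. `worst_case_workers(-1, 1, cpu_count=8)`.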

@ninedigits
Contributor

@fkiraly
#6216 (reply in thread)
The issue was that AutoETS has a parallelization option that was enabled.

@fkiraly
Collaborator

fkiraly commented May 4, 2024

Yes, that one. It is precisely an example of the case in the second bullet point, with AutoETS as the inner estimator and hierarchical parallelization as the outer layer.
