Currently, as far as I know, all parallelisation options are disabled by default and need to be enabled using set_config. But end users will almost always prefer a parallel version over a sequential one.
This issue proposes making parallelisation with joblib, using the loky backend, the default option.
Users who prefer to disable it, for debugging or other purposes, should be able to do so using set_config.
Users who prefer parallelisation by other tools, e.g. dask, should be given that option as well, but the default should be based on a method that is always available through core dependencies.
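To make the proposal concrete, here is a minimal sketch of how a default backend with an explicit override and a safe fallback could be resolved. Everything in it is an assumption for illustration: `resolve_backend`, `BACKEND_REQUIRES`, and the backend-to-module mapping are hypothetical names, not the library's actual configuration code.

```python
from importlib.util import find_spec

# Hypothetical names throughout: an illustration, not the library's config code.
SEQUENTIAL = "None"  # plain loop, no parallelisation

# Assumed mapping from backend name to the module it needs to be importable.
BACKEND_REQUIRES = {
    "None": None,
    "loky": "joblib",
    "dask": "dask",
}


def resolve_backend(requested=None, default="loky"):
    """Return the backend to use.

    An explicit request is honoured or fails loudly; the default silently
    falls back to sequential execution if its dependency is missing.
    """
    backend = default if requested is None else requested
    if backend not in BACKEND_REQUIRES:
        raise ValueError(f"unknown backend: {backend!r}")

    module = BACKEND_REQUIRES[backend]
    if module is not None and find_spec(module) is None:
        if requested is not None:
            # The user asked for this backend explicitly, so do not hide the problem.
            raise ImportError(f"backend {backend!r} requires the {module!r} package")
        return SEQUENTIAL
    return backend


print(resolve_backend())        # the default, or "None" if joblib is absent
print(resolve_backend("None"))  # explicit opt-out, e.g. for debugging
```

The design choice sketched here is that only the default degrades quietly; an explicitly requested backend that cannot be imported raises an error instead.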
> Currently all (as far as I know) parallelisation options are disabled by default and needs to be enabled using set_config. But almost always end users will prefer a parallel version over sequential one.
That is not universally true: in cases where joblib is hard-coded, the default is joblib, and the n_jobs default also varies across the code base. The backend is always loky though, afaik.
Another question: is parallelization with joblib / loky the most sensible default?
The obvious argument in favour is better default performance.
Two arguments against:
- lower compatibility range and higher breakage risk: depending on the environment, the estimator may simply crash uninformatively.
- a joblib default would enable nested joblib parallelization by default in pipelines and composites! This will in general break or result in unexpected behaviour. For instance, in parallelized transformations on hierarchical data, two layers would be enabled by default: one over the hierarchical index, and one internal to the transformation.
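The nesting concern can be illustrated with a small standard-library sketch. It uses threads from `concurrent.futures` rather than joblib / loky processes, so it only demonstrates how worker counts multiply, not the breakage itself; `OUTER_JOBS`, `INNER_JOBS`, and the function names are assumptions chosen for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

OUTER_JOBS = 2  # stands in for workers over the hierarchical index
INNER_JOBS = 2  # stands in for workers inside each estimator

# The barrier only releases once OUTER_JOBS * INNER_JOBS tasks wait on it
# at the same time, so passing it shows that many workers were live at once.
barrier = threading.Barrier(OUTER_JOBS * INNER_JOBS)


def inner_task(_):
    barrier.wait(timeout=10)  # raises BrokenBarrierError if too few workers arrive
    return 1


def outer_task(_):
    # Each outer worker opens its own inner pool: this is the nested layer.
    with ThreadPoolExecutor(max_workers=INNER_JOBS) as inner_pool:
        return sum(inner_pool.map(inner_task, range(INNER_JOBS)))


def count_simultaneous_workers():
    with ThreadPoolExecutor(max_workers=OUTER_JOBS) as outer_pool:
        return sum(outer_pool.map(outer_task, range(OUTER_JOBS)))


simultaneous = count_simultaneous_workers()
print(simultaneous)
```

With both layers on by default, a user who never configured anything would get the product of the two worker counts rather than either one alone.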
There was an issue on exactly this recently; I think @ninedigits spotted the nested parallelization (if you did, could you link it? I cannot find it). It is highly non-obvious, especially if it happens by default.
For me, the second argument makes the "contra" side outweigh the "pro" side at the moment. Of course, I'm happy to listen to arguments and weightings by others, as I do not think my own assessment here is fully consolidated or complete.
Yes, that one - it is precisely an example of the case in the 2nd bullet point, with the inner estimator being AutoETS, and hierarchical parallelization.