Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MNT] isolate joblib #6385

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

[MNT] isolate joblib #6385

wants to merge 1 commit into from

Conversation

fkiraly
Copy link
Collaborator

@fkiraly fkiraly commented May 3, 2024

This PR isolates joblib, turning it from a de-facto direct dependency in a soft dependency that happens to be an indirect core dependency.

Currently:

  • joblib is an indirect core dependency, as core dependency of core dependency scikit-learn.
  • there are direct, unconditional imports of joblib at module level

Therefore, if scikit-learn were to drop joblib as a dependency, sktime would break - as a package.

For this reason, it is best practice to have direct imports as a direct core dependency, which is done in #6384.

However, the joblib dependencies are entirely optional, as not coupled to the framework, only to specific estimators (mostly/morally). So we can turn it into a silent soft dependency, which is the approach taken by this PR.

Some exceptions concern the tests, but in all such cases there are existing isolations in scikit-base which can be followed, pending de-duplication of test code.

@fkiraly fkiraly added the maintenance Continuous integration, unit testing & package distribution label May 3, 2024
@yarnabrina
Copy link
Collaborator

the joblib dependencies are entirely optional, as not coupled to the framework, only to specific estimators

Is there any reason to not have parallelisation using joblib as default setting and part of framework?

(I'm not sure if this thread is appropriate for this, so please let me know if it's not.)

@fkiraly
Copy link
Collaborator Author

fkiraly commented May 4, 2024

Is there any reason to not have parallelisation using joblib as default setting and part of framework?

I think it makes sense as a default for many estimators, especially since joblib can have custom backends, but there are alternatives, e.g., dask, which is preferred by some.

Another reason to have it optional is, plain loops are slower but more resilient with respect to properties of the environment.

I also want to note, this PR does not remove or change the posture towards joblib in terms of its usage or being a default, it just makes sktime robust against (unlikely) removal of joblib as a core dep from any of sktime's core deps.
I.e., this is the more defensive version of your PR #6384. It is of course up for discussion which one we prefer.

@yarnabrina
Copy link
Collaborator

yarnabrina commented May 4, 2024

Understood. I'll open a feature request to make parallelisation by joblib as default in a new issue.

Update: created #6387

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Continuous integration, unit testing & package distribution
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants