Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] resolving partial duplications between sktime and skpro #6396

Open
fkiraly opened this issue May 7, 2024 · 6 comments
Open

[ENH] resolving partial duplications between sktime and skpro #6396

fkiraly opened this issue May 7, 2024 · 6 comments
Labels
enhancement Adding new functionality maintenance Continuous integration, unit testing & package distribution module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting module:probability&simulation probability distributions and simulators module:regression regression module: time series regression

Comments

@fkiraly
Copy link
Collaborator

fkiraly commented May 7, 2024

We should think about how best to resolve the partial duplications between sktime and skpro related to probability distributions and metrics, in terms of end state and transition strategy. Quickly summarized:

  • skpro has a distributions module, which is at the moment an improvement and evolution above sktime's proba module.
  • in sktime, distributions are used in probabilistic forecasting currently, but could soon also be used in time series anomaly and event modelling, or probabilistic time series regression.
  • skpro has some features and specific distributions which do not exist in sktime, but can be directly useful. It has evolved, in terms of functionality and maturity, beyond what is available in sktime.
  • skpro and sktime have a similar module for probabilistic regression metrics, used in evaluation and tuning in both packages.
  • the interfaces in sktime probabilistic forecasting and skpro probabilistic regression share a number of logic elements, especially method interfaces, boilerplate, and dispatch defaults. The same logic would also be attache to probabilistic time series regressors, if created.

For end state, one could imagine:

  • sktime takes a dependency on skpro and uses distributions and/or metrics from the latter. This is directly possible since the improved skpro distributions are fully API compliant.
  • skpro takes a dependency on sktime - in this case, the modules would have to be moved to sktime and replace the current state of modules related to distributions and/or metrics
  • the status quo, although that would not resolve duplications

For transition, deprecation will be necessary, as an abrupt switch to imports of objects while removing the original import locations of distributions etc will break user code.

  • an option is replacing actual objects with imports of the same name in the same locations. [MNT] replace proba module by skpro - diagnostic to ensure API consistency #6393 shows that this would work. The question is whether this could break user code. I cannot see this, but it does not mean it's true.
  • another option is dispatching for a transition period - leaving objects as they are, but replacing them with imports if skpro is present. This is done for a transition period. Might be safer than the first option.
@fkiraly fkiraly added maintenance Continuous integration, unit testing & package distribution module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting module:regression regression module: time series regression enhancement Adding new functionality module:probability&simulation probability distributions and simulators labels May 7, 2024
@frthjf
Copy link

frthjf commented May 7, 2024

How about sktime taking an optional dependency on skpro? As far as I can tell, this is already quite common in sktime (e.g. require Tensorflow for MLP models etc.). As a user, it would make sense to me to see an error asking me to install skpro if I wanted to use probabilistic features in sktime. The import redirection may be useful in a transition period but I would aim to remove it longer term to avoid confusion (e.g. import sktime.tensorflow as tf would make we wonder what I am looking at).

@fkiraly
Copy link
Collaborator Author

fkiraly commented May 7, 2024

How about sktime taking an optional dependency on skpro?

That is a valid target state, too.

It already does depend optionally, for reduction using probabilistic regressors to create probabilistic forecasters.

However, currently the probabilistic forecasting functionality in forecasters is a core features of sktime. Would we make skpro optional and deduplicate, then that would imply that probabilistic forecasting is an optional feature and available only if skpro is installed.
This is technically feasible, my worry about this is that very common functionality would no longer be onboard by default - it is currently available whenever sktime with minimal dependencies is present. So I excluded this mentally, not with any ulterior intention of course. I agree it is a significant option to consider pros/cons.

@fkiraly
Copy link
Collaborator Author

fkiraly commented May 9, 2024

Any opinions on pros/cons, @frthjf?

@frthjf
Copy link

frthjf commented May 9, 2024

I think the option of sktime taking skpro as required dependency as implemented in #6393 is reasonable given that sktime arguably relies on it for "very common functionality". What might be a problem, is that any dependency added to skpro becomes a dependency in sktime. One way around that would be to move the distributions code into a separate repo and include it as a submodule in both sktime and skpro, but I am unsure if that's worth the trouble.

@fkiraly
Copy link
Collaborator Author

fkiraly commented May 9, 2024

What might be a problem, is that any dependency added to skpro becomes a dependency in sktime.

Yes - although not as large as it could be, as currently the skpro dependencies are exactly those of sktime.
(soft dependencies wildly differ, but these are optional and not required by sktime)

It might become an issue at release though, when bounds are bumped, although I would think that less of a problem.

@frthjf
Copy link

frthjf commented May 10, 2024

Yes, given that both projects are under the sktime organization, the dependency management should be workable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Adding new functionality maintenance Continuous integration, unit testing & package distribution module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting module:probability&simulation probability distributions and simulators module:regression regression module: time series regression
Projects
None yet
Development

No branches or pull requests

2 participants