API comparison: sktime vs HCrystalball

Markus Löning edited this page Aug 4, 2020 · 1 revision

Contributors: @fkiraly, @mloning

A comparison of the sktime and HCrystalball API designs for forecasting, and a proposed way forward.

Design comparison

Both sktime and HCrystalball adopt a sklearn-like fit/predict design, and a unified interface.

High-level differences

The table below summarizes the main differences:

| Area | sktime | HCrystalball |
|---|---|---|
| data container | pandas Series | pandas DataFrame |
| supports multivariate | no | yes |
| supports exogenous | experimental | yes |
| supports iloc use | yes | no |
| supports loc use | no | yes |
| type consistent composition | yes | no |
| task interoperability | yes | no |

Explanation of terms:

  • type consistent composition means: composites inherit from, and follow the same interface as a class type ancestor. For example, GridSearchCV in sklearn behaves as a classifier, when constructed with a classifier. The compositor itself is an estimator class.
  • task interoperability means: the interface is designed to allow reduction to other time series related tasks
  • iloc and loc usage implies support for integer and date/time indices, with the forecasting horizon specified as relative steps ahead and as absolute time points, respectively
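The difference between the two ways of specifying a horizon can be illustrated with plain pandas. This is only a sketch: the monthly series and the variable names (`fh_relative`, `fh_absolute`) are made up for illustration and are not part of either library.

```python
import pandas as pd

# Hypothetical monthly series, used only for illustration.
index = pd.date_range("2020-01-01", periods=6, freq="MS")
y_past = pd.Series([10.0, 12.0, 11.0, 13.0, 14.0, 15.0], index=index)

# iloc-style horizon: relative steps ahead of the last observed point.
fh_relative = [1, 2, 3]

# loc-style horizon: absolute time points. Here they are derived from the
# relative steps by extending the observed index past its last entry.
future_index = pd.date_range(index[-1], periods=max(fh_relative) + 1, freq="MS")[1:]
fh_absolute = future_index[[step - 1 for step in fh_relative]]

print(list(fh_absolute.strftime("%Y-%m-%d")))
# ['2020-07-01', '2020-08-01', '2020-09-01']
```

Both horizons describe the same three future months; they differ only in whether the caller counts steps or names time points.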

At a high level, HCrystalball's interface seems inspired by Facebook's prophet. sktime's interface is closer to statsmodels and the Hyndman interfaces in R (e.g. forecast, fable).

Advantages and disadvantages

This section highlights advantages, disadvantages, and problems, in our opinion.

Advantages of sktime:

  • "natural" interface in univariate case
  • higher-order operations, including composition and reduction, are well-handled

Problems of sktime:

  • lack of loc support
  • no good multivariate support

Advantages of HCrystalball:

  • support for multivariate series and exogenous variables
  • uses abc (Python abstract base classes)

Problems of HCrystalball:

  • higher-order operations are not well-designed or consistent
  • lack of iloc support
  • interface is unintuitive in the univariate case

Problems of both:

  • neither covers both univariate and multivariate use consistently well, which frustrates users in at least one of the two cases
  • users cannot choose freely between Series and DataFrame inputs
  • neither supports both iloc (integer position) and loc (label-based, e.g., datetime) indexing

Fit/predict API signatures

Up to naming of variables, both sktime and HCrystalball adopt a fit/predict API, of the type

fit(y_past, [x_past], horizon)
predict([x_future], horizon)

where:

  • y_past is the time series in the past,
  • horizon is the indices (loc or iloc) to predict at - note that some methods already require this in fit
  • x_past is the exogenous time series in the past
  • x_future is the exogenous time series in the future
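The shared signature can be sketched as a minimal class. Everything here is an assumption made for illustration: `LastValueForecaster` and its attributes are hypothetical and do not exist in sktime or HCrystalball, and the horizon is taken to be an iloc-style integer sequence.

```python
import pandas as pd


class LastValueForecaster:
    """Hypothetical forecaster that repeats the last observed value."""

    def fit(self, y_past, x_past=None, horizon=None):
        # x_past is accepted for signature compatibility but ignored here.
        self.last_value_ = y_past.iloc[-1]
        self.n_observed_ = len(y_past)
        # Some methods need the horizon at fit time; this one only stores it.
        self.horizon_ = horizon
        return self

    def predict(self, x_future=None, horizon=None):
        horizon = self.horizon_ if horizon is None else horizon
        if horizon is None:
            raise ValueError("horizon must be given in fit or in predict")
        # iloc-style horizon: integer steps ahead of the last observation.
        positions = [self.n_observed_ - 1 + step for step in horizon]
        return pd.Series(self.last_value_, index=positions)


y_past = pd.Series([1.0, 2.0, 3.0])
forecaster = LastValueForecaster().fit(y_past, horizon=[1, 2])
y_pred = forecaster.predict()
```

Here the horizon is passed in fit and reused in predict; passing it to predict instead would override the stored value.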

The differences are mainly in expected type:

| variable | sktime | HCrystalball |
|---|---|---|
| y_past | pandas series | pandas DataFrame |
| horizon in fit | integer sequence | not supported (instead fitting is moved to predict in cases where horizon is required for fitting) |
| horizon in predict | integer sequence | empty DataFrame with loc indices |
| x_past | pandas DataFrame (experimental) | pandas DataFrame |
| x_future | pandas DataFrame (experimental) | pandas DataFrame |

Proposed way forward

The interface differences suggest:

  • different signature and type choices cover different use cases well (e.g., univariate vs multivariate) - a joint/merged interface may therefore be desirable.
  • the interfaces are currently incompatible; making them compatible will require support for both series and DataFrames, and for both loc and iloc indexing.
  • the sktime interface has an advantage in composition and other higher-order operations. A joint interface should perhaps adopt this.

Requirements for a unified interface

More precisely, a "good" consensus interface should satisfy the following requirements:

  • support for both series and DataFrames as inputs/outputs
  • support for both loc and iloc indexing
  • support for exogenous variables
  • horizon can be passed in fit
  • consistent typing in higher-order motifs including composition, wrappers, reduction (inherits from resultant type class, components passed in constructor)
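The last requirement, consistent typing in composition, can be sketched in plain Python. All class names below (`BaseForecaster`, `MeanForecaster`, `NaiveLast`, `AverageEnsemble`) are hypothetical and serve only to show the pattern: the composite inherits from the same ancestor as its components and receives them in its constructor.

```python
class BaseForecaster:
    """Hypothetical common ancestor of all forecasters."""

    def fit(self, y_past, x_past=None, horizon=None):
        raise NotImplementedError

    def predict(self, x_future=None, horizon=None):
        raise NotImplementedError


class MeanForecaster(BaseForecaster):
    def fit(self, y_past, x_past=None, horizon=None):
        self.value_ = sum(y_past) / len(y_past)
        return self

    def predict(self, x_future=None, horizon=None):
        return [self.value_ for _ in horizon]


class NaiveLast(BaseForecaster):
    def fit(self, y_past, x_past=None, horizon=None):
        self.value_ = y_past[-1]
        return self

    def predict(self, x_future=None, horizon=None):
        return [self.value_ for _ in horizon]


class AverageEnsemble(BaseForecaster):
    """Composite forecaster: components are passed in the constructor."""

    def __init__(self, forecasters):
        self.forecasters = forecasters

    def fit(self, y_past, x_past=None, horizon=None):
        for forecaster in self.forecasters:
            forecaster.fit(y_past, x_past, horizon)
        return self

    def predict(self, x_future=None, horizon=None):
        # Average the component forecasts step by step.
        predictions = [f.predict(x_future, horizon) for f in self.forecasters]
        return [sum(values) / len(values) for values in zip(*predictions)]


ensemble = AverageEnsemble([MeanForecaster(), NaiveLast()]).fit([1.0, 2.0, 6.0])
y_pred = ensemble.predict(horizon=[1, 2])
```

Because `AverageEnsemble` is itself a `BaseForecaster`, it can be nested inside another composite or passed anywhere a plain forecaster is expected.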

Way of working, forward

We therefore suggest:

  • sktime and HCrystalball work together towards a unified forecasting interface in the next release.
  • This unified interface should satisfy the requirements outlined above
  • HCrystalball becomes an affiliated package of sktime (means: compatible interface) - displayed on the landing page with other affiliated and coordinated packages
  • HCrystalball specifies a scope and roadmaps, e.g., adapters to advanced forecasters with major package dependencies?
  • individual HCrystalball team members are acknowledged as contributors to sktime, insofar as they contribute to the re-factor
  • optionally, Heidelberg Cement is acknowledged as a contributing organisation to sktime post-refactor, pending approval of Heidelberg Cement comms

Proposed API re-design principles

The proposed re-design is based on two work items:

  • HCrystalball adopts sktime's higher-order composition/reduction interface (correct class inheritance structure)
  • a re-factor of the fit/predict signatures towards a consensus based on type unions

The consensus could be as follows:

| variable | consensus type |
|---|---|
| y_past | pandas series or DataFrame |
| return of predict | same as type of y_past |
| horizon | integer sequence (iloc) or sequence of loc indices or empty DataFrame with loc indices |
| x_past | pandas series or DataFrame |
| x_future | pandas series or DataFrame, needs same type and variables as x_past |

There may be an additional flag for whether loc or iloc indices are used.
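One way the horizon type union and the optional flag could be handled is sketched below. The helper `horizon_to_loc` and its `use_iloc` parameter are hypothetical, not part of either library, and the sketch assumes the observed index is a pandas DatetimeIndex with a set frequency.

```python
import numbers

import pandas as pd


def horizon_to_loc(horizon, y_index, use_iloc=None):
    """Map any of the three horizon forms to absolute (loc) time points.

    Hypothetical helper; assumes y_index is a DatetimeIndex with a freq.
    """
    if isinstance(horizon, pd.DataFrame):
        # Empty DataFrame whose index carries the loc indices.
        return pd.DatetimeIndex(horizon.index)
    horizon = list(horizon)
    if use_iloc is None:
        # Without an explicit flag, infer iloc use from integer entries.
        use_iloc = all(isinstance(step, numbers.Integral) for step in horizon)
    if use_iloc:
        # Integer steps ahead: shift the last observed time point.
        last = y_index[-1]
        return pd.DatetimeIndex([last + int(step) * y_index.freq for step in horizon])
    # Otherwise the entries are treated as loc indices already.
    return pd.DatetimeIndex(horizon)


y_index = pd.date_range("2020-08-01", periods=4, freq="D")
from_steps = horizon_to_loc([1, 2], y_index)
from_labels = horizon_to_loc(["2020-08-05", "2020-08-06"], y_index)
from_frame = horizon_to_loc(
    pd.DataFrame(index=pd.to_datetime(["2020-08-05", "2020-08-06"])), y_index
)
```

All three calls resolve to the same two dates. Type inference breaks down when the series itself has an integer index, since integers could then mean either steps or labels; that is the case an explicit flag would settle.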

The low-level design could look similar to this, though the linked proposal is mainly concerned with support for datetime.