sktime dev days 2021 forecasting work stream notes

Franz Király edited this page Jun 24, 2021 · 1 revision

sktime 2021 dev days - break-out session notes

Session: Forecasting/Annotation

Reporter: Taiwo Owoseni

👋 Roll call

💡 Define the workstream scope (10min)

What is the area to work on during the dev days, as a group? Be as specific as you can, without going too much into details.

Univariate forecasting

  • finish refactoring of forecaster interface - done = all forecasters in sktime are interface compliant

Multivariate forecasting

  • "multivariate forecasting" - what's the goal?
  • interface design for multivariate
  • "automatic" handling of different input/output types
  • interfacing concrete multivariate forecasters like VARIMAX
  • Multivariate pipelining (already work in progress), most important compositors (tuning etc) too

Annotation

  • annotation design and prototype finished, should include multivariate and panel (!)
  • example annotators implemented in prototype
  • implement pyOD wrapper

docs as we go along!

🚧 Collect related issues (15min)

Only issues that already exist.

issues and prs relating to refactoring univariate forecasters:

enhancement proposals:

doc related:

missing/need:

  • issue for composition patterns in multivariate forecasting
  • issue related to wrappers/composers that turn univariate to multivariate
  • issue related to annotation
  • issue related to concrete annotators (pyOD already exists)
  • annotation sub-case specific: segmentation
  • annotation sub-case specific: change point detection
  • annotation sub-case specific: outlier detection
  • annotation conditional compositors, e.g., conditional removal or series-to-panel (e.g., epoching)

🔍 High-level work plan (20min)

Identify the most important work items, in bullet points. Identify which are crucial dependencies, which are optional. Estimate how much time the work will roughly take. Tentatively put names against the work items.

Think carefully about:

  • what is realistic to achieve during the dev sprint (3 days)
  • what should go on longer roadmap
  • which items are "good first issues", which ones are expert issues

Create a work plan for the week. Prioritize so crucial items are covered. Ensure there are a number of "good first issues" for new community members.

Univariate forecasting

| Work item | Coordinator |
|---|---|
| list of forecasters to work on: https://github.com/alan-turing-institute/sktime/issues/955 | Taiwo |
| reduction module refactoring sktime.forecasting.compose._reduce.py | Taiwo, Lovkush, Markus |
| refactor fbprophet, https://github.com/alan-turing-institute/sktime/pull/1005 | Help needed |
| forecasting tutorial, advanced composition and tuning https://github.com/alan-turing-institute/sktime/issues/988 | Martin |
| Forecasting Refactoring and Progress https://github.com/alan-turing-institute/sktime/issues/1007 | Taiwo |

Exemplary refactoring: https://github.com/alan-turing-institute/sktime/pull/953

Annotation

| Work item | Coordinator |
|---|---|
| finish PR for annotation framework (base annotator and unit testing) | Satya |
| finish PR for PyOD wrapper | Satya |
| add unit test for PyOD wrapper | Satya |
| handle PyOD as soft dependency | Satya |
| annotation data container designs | Franz |
| unsupervised segmentation | Franz |
| interfacing basic segmenters: hmmlearn etc | Franz |
| supervised segmentation/annotation | Franz |
| alignment & distances? | Franz |

need to ensure compatibility between outlier detection, different tasks, and new forecasting interface

Multivariate forecasting

  • conditional on univariate forecasting refactoring

| Work item | Coordinator |
|---|---|
| API design: multivariate interface with base functionality for multiple input/output types https://github.com/alan-turing-institute/sktime/pull/980 | Franz |
| group existing forecasting functionality into whether it can be easily extended to handle multivariate series or not | Lovkush (or Markus?) |
| "obvious" conversion wrappers like "apply-per-variable" | |
| Multivariate Pipelining (ForecastingPipeline) | Martin |
| interfaces for new multivariate forecasters | Lovkush (or Markus?) |

📝 Prepare the report-out (10min)

The reporter should prepare a quick summary of the above.

Markdown is perfectly fine here, but can also be PowerPoint or Paint.

🔧 Create issues (15min, can also be done later & iteratively)

Turn the high-level work plan into issues!

Write descriptive issue descriptions, with a clear definition of "done".

Consider using "checkbox items" to create sub-tasks - i.e., use - [ ] in the issue description.

Consider using a project board (but don't overcomplicate it) or linking the issue to an existing board.

Add issue tags.

Tracking for refactoring of forecasters

Tracking which forecasters have been started; there should be a PR for the ones that are ticked

  • NaiveForecasters #953
  • EnsembleForecaster #977
  • MultiplexForecaster #977
  • TransformedTargetForecaster #977
  • _Reducer and related code #1031
  • StackingForecaster #977
  • ForecastingGridSearchCV, RandomizedGridSearchCV, and BaseGridSearch
  • OnlineEnsembleForecaster and descendants #1015
  • _PmdArimaAdapter and descendants ARIMA, AutoARIMA #1016 - adapter only
  • BATS, TBATS, and _Tbatsadapter #1017 adapter only
  • _StatsModelAdapter and descendants AutoETS, ExponentialSmoothing, ThetaForecaster
  • HCrystalBallForecaster #1004
  • Prophet and _ProphetAdapter #1005
  • PolynomialTrendForecaster #1003

Tracking which forecasters have been finished; there should be a closed PR for the ones that are ticked

  • NaiveForecasters #953
  • EnsembleForecaster
  • MultiplexForecaster
  • TransformedTargetForecaster
  • _Reducer and related code
  • StackingForecaster
  • ForecastingGridSearchCV, RandomizedGridSearchCV, and BaseGridSearch
  • OnlineEnsembleForecaster and descendants
  • _PmdArimaAdapter and descendants ARIMA, AutoARIMA
  • BATS, TBATS, and _Tbatsadapter
  • _StatsModelAdapter and descendants AutoETS, ExponentialSmoothing, ThetaForecaster
  • HCrystalBallForecaster
  • Prophet and _ProphetAdapter
  • PolynomialTrendForecaster

:::info

Example: (sklearn pipeline) Create issues on GitHub about:

  • Implementing ColumnTransformer
  • Implementing FeatureUnion
  • Updating existing pipeline class
  • Update docs

:::

2021-06-23 discussion points

  • tags discussion - Franz, Markus, Martin, Taiwo, Tony
    • use of tags - semantic/indexing lookup for user, or only internal/testing?
      • importance in checks and conversions (e.g., what to)
      • input validity & related properties
      • algorithmic features/properties, availability of interface points e.g., can produce performance estimates
      • writing common tests for estimators with the same tags
      • user guidance
    • inheritance and default values
    • if inheritance: child classes to specify all tags, or only tags deviating from default? (FK: perhaps all, more robust against changes to defaults)
      • inheritance yes, and override only non-defaults
      • tests for robustness
    • object or class level
      • both for long-term; short/mid-term, focus on class level
      • object and class level should both be inspectable
      • users typically want to be shown the object-level tags
    • documentation of tags - where, how?
      • fixed description of tags, but where?
      • ALL_TAGS to be factored out and supplemented by plain English descriptions; docs auto-generated from that
    • how do we agree on which tags?
      • that file has codeowners, it's us
    • "meta-tags", e.g., which tags apply to which scitypes?
      • dataframe with three columns? tag name, list of scitypes, plain English description?
      • maybe structured strings in tags, like "forecaster:supports_exogeneous_X"; some might be generic, like "handles_missing_data" or "multivariate" (?)
    • forecaster tags - boolean?
      • can have non-boolean, but we need to carefully watch testing
    • which tags should we have? for forecasters?
    • lookup of tags, all_tags like all_estimators? by scitype?
      • yes, and is easy with the table above
    • lookup of estimators based on tags?
      • ML: yes should be easy to implement via an additional filtering of the list of collected estimators
    • display of estimators with tags, #995, #996
      • great idea, should be autogenerated from above
    • tag refactor? #1013
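
The tag ideas discussed above (a tag table with scitypes and descriptions, class-level inheritance with overrides of non-defaults only, and lookup of estimators by tag) could be sketched roughly as follows. This is only an illustration under assumptions: `TagSpec`, `all_tags`, `tags_doc`, `ToyBase`, `get_class_tags`, `filter_by_tags`, and the toy estimator classes are hypothetical names invented for this sketch, not the sktime API, and the tag descriptions are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagSpec:
    """One row of the hypothetical tag table: name, scitypes, description."""
    name: str
    scitypes: tuple
    description: str


# Hypothetical stand-in for a factored-out ALL_TAGS table.
ALL_TAGS = [
    TagSpec("handles_missing_data", ("forecaster", "transformer"),
            "can handle missing values in the input"),
    TagSpec("multivariate", ("forecaster",),
            "can handle series with more than one variable"),
]


def all_tags(scitype=None):
    """Return tag specs, optionally restricted to one scitype."""
    return [t for t in ALL_TAGS if scitype is None or scitype in t.scitypes]


def tags_doc(scitype=None):
    """Generate a plain-text description of tags from the table."""
    return "\n".join(f"{t.name}: {t.description}" for t in all_tags(scitype))


class ToyBase:
    # Class-level defaults; children list only the tags that deviate.
    _tags = {"handles_missing_data": False, "multivariate": False}

    @classmethod
    def get_class_tags(cls):
        """Collect tags along the inheritance chain; children override parents."""
        tags = {}
        for klass in reversed(cls.__mro__):
            tags.update(vars(klass).get("_tags", {}))
        return tags


class ToyNaive(ToyBase):
    _tags = {"handles_missing_data": True}


class ToyVAR(ToyBase):
    _tags = {"multivariate": True}


def filter_by_tags(estimators, **required):
    """Keep only estimator classes whose tags match all required values."""
    return [
        est for est in estimators
        if all(est.get_class_tags().get(k) == v for k, v in required.items())
    ]
```

For example, `filter_by_tags([ToyNaive, ToyVAR], multivariate=True)` keeps only `ToyVAR`, and `all_tags("transformer")` returns just the `handles_missing_data` row. An object-level variant would need a per-instance override on top of `get_class_tags`, which this sketch leaves out.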

2021-06-24 discussion points on type-conversions and multivariate forecasting

(written by LA, and so represents their perspective/their biases) Discussion focussed on this PR https://github.com/alan-turing-institute/sktime/pull/980.

Points of agreement:

  • IO type conversion is fundamentally separate from multivariate forecasting (though the two are related)
  • We should have some sort of wrapper that converts univariate forecasters into multivariate forecasters. How precisely this is done still needs to be determined
    • (Later thought by LA: this sounds like a reduction of multivariate forecasting to univariate forecasting. This suggests a reduction-style interface, where the user creates a multivariate forecaster from a reduction function and a univariate forecaster object. Separately, there can be reductions from multivariate forecasting to tabular regression)
  • I think there is agreement that it is a bad idea to rewrite all forecasters so that they only use methods that can deal with both pandas dataframes and pandas series.
    • Note that even in the example Markus showed, there was an if statement to distinguish between dataframe and series in one of the imputations
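
A minimal sketch of the "apply-per-variable" reduction mentioned above, assuming a wrapper that fits one independent copy of a univariate forecaster per column. Everything here is hypothetical: `ColumnwiseForecaster` and `ToyLastValueForecaster` are invented for illustration, and plain dicts of lists stand in for pandas dataframes so the example has no dependencies. The actual design was still open at the time of the discussion.

```python
import copy


class ToyLastValueForecaster:
    """Toy univariate forecaster: always predicts the last observed value."""

    def fit(self, y):
        self.last_ = y[-1]
        return self

    def predict(self, fh):
        # One prediction per step in the forecasting horizon.
        return [self.last_ for _ in fh]


class ColumnwiseForecaster:
    """Hypothetical reduction: one copy of a univariate forecaster per column."""

    def __init__(self, forecaster):
        self.forecaster = forecaster

    def fit(self, Y):
        # Y maps column name -> list of observations (stand-in for a dataframe).
        # Each column gets its own deep copy, so fitted state is not shared.
        self.forecasters_ = {
            name: copy.deepcopy(self.forecaster).fit(values)
            for name, values in Y.items()
        }
        return self

    def predict(self, fh):
        # Forecast each column independently and reassemble by column name.
        return {name: f.predict(fh) for name, f in self.forecasters_.items()}


Y = {"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0]}
pred = ColumnwiseForecaster(ToyLastValueForecaster()).fit(Y).predict([1, 2])
```

Because each column has its own fitted copy, per-column tuning could in principle be layered on top. Whether the wrapper should be a class, a factory function, or part of the base class is one of the design decisions left open in the notes.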

ML's concerns:

  • IO type conversion is a significant change; ML wants more time for himself (and others) to think through the consequences fully.
    • E.g. how large is the cost of the various conversions?
      • FK's counter to the cost concerns: IO type conversion only creates extra cost in cases where there would otherwise be an error message.
    • E.g. the philosophy of allowing multiple internal types, rather than sticking to a single internal type
  • We want to create some sort of multivariate functionality quickly, more quickly than the time needed to consider the IO type conversion question
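
To make the cost argument concrete, here is a rough sketch of a conversion step that is a no-op when the input already has the forecaster's internal type, so conversion work only happens for inputs that would otherwise be rejected. This does not reproduce the code in the PR under discussion: `infer_type`, `convert_to`, and `CONVERTERS` are hypothetical names, and lists and dicts of lists stand in for pandas series and dataframes.

```python
def infer_type(obj):
    """Classify input: a list is a 'series', a dict of lists is a 'frame'."""
    if isinstance(obj, dict):
        return "frame"
    if isinstance(obj, list):
        return "series"
    raise TypeError(f"unsupported input type: {type(obj).__name__}")


def _series_to_frame(obj):
    # Wrap the single variable in a one-column frame.
    return {"y": list(obj)}


def _frame_to_series(obj):
    # Only a one-column frame has an unambiguous series representation.
    if len(obj) != 1:
        raise ValueError("cannot convert a frame with several columns to a series")
    return list(next(iter(obj.values())))


# Registry of converters keyed by (from_type, to_type).
CONVERTERS = {
    ("series", "frame"): _series_to_frame,
    ("frame", "series"): _frame_to_series,
}


def convert_to(obj, to_type):
    """Convert obj to to_type; return obj unchanged if it already matches."""
    from_type = infer_type(obj)
    if from_type == to_type:
        return obj  # no copy and no conversion cost
    return CONVERTERS[(from_type, to_type)](obj)
```

With this shape, a forecaster whose internal type is "series" pays nothing for series input, and only pays for a conversion when handed a one-column frame that it would otherwise have had to reject. How expensive real pandas conversions are is exactly the open question raised above, and this sketch does not answer it.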

FK's concerns:

  • There is a risk of delaying the decision on IO type conversion if we do not make a decision now, or if we do not have an explicit plan for when the decision will be made
  • Dev days week is the best time for FK to spend time on sktime and on big tasks, so a quicker decision will allow FK to do more

Ryan's thoughts:

  • IO type conversion is elegant, but needs time to think through consequences
  • Not sure how big the intersection of multivariate and univariate algorithms is:
    • Believes that the majority of univariate forecasters will become multivariate by doing things column-by-column (i.e. via the reduction wrapper), because optimal hyper-parameters will vary by series (column).
    • The multivariate algorithms that can be applied to univariate data typically do so when the algorithm can simplify to a univariate model.
      • Are there a lot of models where we wouldn't have a univariate model already?
      • What is the cost of raising an informative message to point users to the correct univariate implementation?

LA's thoughts:

  • Should pause on IO type conversion to give ML (and others) time to consider the consequences
  • Believes that IO type conversion is a good idea, and should be implemented at some point.
  • (Variable names can be improved in the PR)
  • Didn't say this during discussion, but I don't think writing the multivariate wrapper is a quick task - lots of design decisions need to be made for it.

Guzal's thoughts:

  • Likes the logic of FK's IO conversion, but doesn't know enough to judge if ML's concerns are valid or not

Taiwo:

  • No thoughts on these issues