[BUG] MSTL.inverse_transform fails if return_components=False #6397

Open
eangius opened this issue May 7, 2024 · 8 comments
Labels
bug Something isn't working module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing
Projects

Comments

@eangius

eangius commented May 7, 2024

Describe the bug
Sktime is a great library, thanks. Perhaps this is user error, but using MSTL to remove multiple seasonalities from the endogenous variable works as expected as a standalone component or within a regular pipeline. However, it fails to generate predictions when used inside a TransformedTargetForecaster.

To Reproduce

from sktime.datasets import load_airline
from sktime.forecasting.compose import ForecastingPipeline, TransformedTargetForecaster
from sktime.forecasting.naive import NaiveForecaster
from sktime.transformations.series.detrend.mstl import MSTL

pipe = ForecastingPipeline(steps=[
    ('y', TransformedTargetForecaster(steps=[
        ('decompose', MSTL(periods=12, return_components=False)),
        ('forecaster', NaiveForecaster()),
    ]))
]).fit(y=load_airline(), fh=[1, 2, 3])
y_pred = pipe.predict()  # raises ValueError here

Expected behavior
No exception at predict time. It seems that something returns a pd.Series while something downstream expects a pd.DataFrame.

Additional context

    Traceback (most recent call last):
      File ".venv/lib/python3.9/site-packages/pandas/core/generic.py", line 575, in _get_axis_number
        return cls._AXIS_TO_AXIS_NUMBER[axis]
    KeyError: 1
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/pydevconsole.py", line 364, in runcode
        coro = func()
      File "<input>", line 12, in <module>
      File ".venv/lib/python3.9/site-packages/sktime/forecasting/base/_base.py", line 448, in predict
        y_pred = self._predict(fh=fh, X=X_inner)
      File ".venv/lib/python3.9/site-packages/sktime/forecasting/compose/_pipeline.py", line 531, in _predict
        return self.forecaster_.predict(fh, X)
      File ".venv/lib/python3.9/site-packages/sktime/forecasting/base/_base.py", line 448, in predict
        y_pred = self._predict(fh=fh, X=X_inner)
      File ".venv/lib/python3.9/site-packages/sktime/forecasting/compose/_pipeline.py", line 1053, in _predict
        y_pred = self._get_inverse_transform(self.transformers_pre_, y_pred, X)
      File ".venv/lib/python3.9/site-packages/sktime/forecasting/compose/_pipeline.py", line 149, in _get_inverse_transform
        y = transformer.inverse_transform(y, X)
      File ".venv/lib/python3.9/site-packages/sktime/transformations/base.py", line 738, in inverse_transform
        Xt = self._inverse_transform(X=X_inner, y=y_inner)
      File ".venv/lib/python3.9/site-packages/sktime/transformations/series/detrend/mstl.py", line 203, in _inverse_transform
        row_sums = X.sum(axis=1)
      File ".venv/lib/python3.9/site-packages/pandas/core/series.py", line 6519, in sum
        return NDFrame.sum(self, axis, skipna, numeric_only, min_count, **kwargs)
      File ".venv/lib/python3.9/site-packages/pandas/core/generic.py", line 12503, in sum
        return self._min_count_stat_function(
      File ".venv/lib/python3.9/site-packages/pandas/core/generic.py", line 12486, in _min_count_stat_function
        return self._reduce(
      File ".venv/lib/python3.9/site-packages/pandas/core/series.py", line 6430, in _reduce
        self._get_axis_number(axis)
      File ".venv/lib/python3.9/site-packages/pandas/core/generic.py", line 577, in _get_axis_number
        raise ValueError(f"No axis named {axis} for object type {cls.__name__}")
    ValueError: No axis named 1 for object type Series

Versions

Python dependencies: pip: 23.2.1 sktime: 0.28.0 sklearn: 1.4.2 skbase: 0.7.5 numpy: 1.26.4 scipy: 1.13.0 pandas: 2.2.1 matplotlib: 3.8.4 joblib: 1.4.0 numba: 0.59.1 statsmodels: 0.14.1 pmdarima: 2.0.4 statsforecast: 1.7.4 tsfresh: None tslearn: None torch: None tensorflow: None
@eangius eangius added the bug Something isn't working label May 7, 2024
@fkiraly
Collaborator

fkiraly commented May 7, 2024

How odd.

Strictly speaking, it is superfluous to wrap the TransformedTargetForecaster in a ForecastingPipeline, but that should not impact behaviour (it should just be ignored, because it is a single-element pipeline).

I can see the issue: it is the axis=1 argument in the sum call in _inverse_transform. If return_components=True, the result will be a pd.DataFrame because it is multivariate. If it is False, the result is a pd.Series, which has no axis 1.
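The failure can be reproduced with pandas alone. A minimal sketch: the traceback shows _inverse_transform calling X.sum(axis=1), which is a row-wise sum over component columns when X is a pd.DataFrame, but raises when X is a pd.Series, since a Series only has axis 0. The column names below are illustrative and not necessarily the ones MSTL produces.

```python
import pandas as pd

# Components as a multivariate frame, as with return_components=True
# (illustrative column names, one column per component).
components = pd.DataFrame(
    {
        "trend": [10.0, 11.0, 12.0],
        "seasonal_12": [1.0, -1.0, 0.5],
        "resid": [0.1, 0.0, -0.1],
    }
)
# Row-wise sum adds the components back together: this works on a DataFrame.
reconstructed = components.sum(axis=1)

# With return_components=False the transformed output is a single pd.Series.
deseasonalized = pd.Series([11.1, 10.0, 12.4])
try:
    deseasonalized.sum(axis=1)  # a Series has no axis 1
    raised = False
    message = ""
except ValueError as err:
    raised = True
    message = str(err)
```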

From a methodological standpoint, the inverse works correctly only when return_components=True, and only if the seasonal components are also forecast.

I see that one would expect the seasonal components to be continued periodically and added back.

So, returning X if it is a pd.Series would remove the exception, but would lead to unexpected behaviour, as the seasonal components would not be added back.

Do you perchance know, @eangius, whether there is an easy way to get an extrapolated form of all seasonal components in statsmodels MSTL? There should be one.

Also FYI @luca-miniati who is the author and maintainer.

@fkiraly
Collaborator

fkiraly commented May 7, 2024

To clarify, I think inverse_transform should do:

  • if return_components=True, exactly what it does currently; in this case, when used in a pipeline, it needs forecasters for all seasonal components and for the residual
  • if return_components=False, a naive periodic continuation should be made for seasonal components, and it should be added to the transformed values. This might be slightly challenging, given that the index seen in _inverse_transform need not be contiguous with, or could intersect with, the index seen in fit.
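A minimal sketch of the naive periodic continuation described in the second bullet, for a single seasonal component. It assumes an integer, RangeIndex-like time index and uses the last full fitted cycle as the pattern, looked up by phase (time index modulo period). periodic_continuation is a hypothetical helper, not part of sktime.

```python
import numpy as np
import pandas as pd


def periodic_continuation(seasonal: pd.Series, period: int, index) -> pd.Series:
    """Hypothetical helper: continue a fitted seasonal component periodically.

    Assumes ``seasonal`` has an integer index of consecutive time points
    covering at least one full period. The last full cycle is the pattern,
    keyed by phase = time index modulo period.
    """
    if len(seasonal) < period:
        raise ValueError("need at least one full period of fitted values")
    last_cycle = seasonal.iloc[-period:]
    # Map each phase (0 .. period-1) to its value in the last observed cycle.
    pattern = {
        int(t) % period: v
        for t, v in zip(last_cycle.index, last_cycle.to_numpy())
    }
    idx = pd.Index(index)
    values = np.array([pattern[int(t) % period] for t in idx])
    return pd.Series(values, index=idx, name=seasonal.name)


# Fitted on time points 0..11 with period 3; continue to a non-contiguous index.
fitted = pd.Series([1.0, -1.0, 0.0] * 4)
future = periodic_continuation(fitted, period=3, index=[12, 13, 14, 20])
```

Because the lookup is by phase rather than by position, the target index may be non-contiguous with, or overlap, the fitted index. For indices inside the fitted range one might prefer the fitted seasonal values themselves; this sketch does not do that.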

@fkiraly fkiraly added the module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing label May 7, 2024
@fkiraly fkiraly added this to Needs triage & validation in Bugfixing via automation May 7, 2024
@fkiraly fkiraly changed the title [BUG] predict MSTL within TransformedTargetForecaster fails [BUG] MSTL.inverse_transform fails if return_components=False May 7, 2024
@fkiraly
Collaborator

fkiraly commented May 7, 2024

Updated the issue title: in my opinion, the root cause is that MSTL.inverse_transform fails whenever return_components=False.

@fkiraly
Collaborator

fkiraly commented May 7, 2024

There is also a wider issue, namely that inverse_transform test coverage seems insufficient to detect this; this should be investigated.

@fkiraly fkiraly moved this from Needs triage & validation to Reproduced/confirmed in Bugfixing May 7, 2024
@luca-miniati
Contributor

Hi Franz, long time no see! I'd like to implement the functionality for return_components=False.

Let me know if I understand the solution correctly:

  • make predictions of the seasonal time series, using the provided fh
  • add up all the values, and return as pd.Series
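The two steps above can be sketched as follows, assuming each seasonal component has already been reduced to one value per phase and that the index seen in _inverse_transform consists of integer time points. add_back_seasonality and the patterns format are hypothetical, not sktime's API.

```python
import numpy as np
import pandas as pd


def add_back_seasonality(deseasonalized: pd.Series, patterns: dict) -> pd.Series:
    """Hypothetical sketch of an inverse for return_components=False.

    ``patterns`` maps each period p to an array of length p holding the
    seasonal value for each phase (time index modulo p).
    """
    positions = np.asarray(deseasonalized.index, dtype=int)
    total = deseasonalized.to_numpy(dtype=float).copy()
    for period, pattern in patterns.items():
        pattern = np.asarray(pattern, dtype=float)
        if len(pattern) != period:
            raise ValueError(f"pattern for period {period} must have length {period}")
        # Look up each time point's phase and add that component's value.
        total += pattern[positions % period]
    return pd.Series(total, index=deseasonalized.index, name=deseasonalized.name)


# Deseasonalized forecasts at positions 101..103, with periods 3 and 5.
y_hat = pd.Series([100.0, 100.0, 100.0], index=[101, 102, 103])
patterns = {3: [1.0, 2.0, 3.0], 5: [0.0, 10.0, 20.0, 30.0, 40.0]}
y_inv = add_back_seasonality(y_hat, patterns)
```

The result stays a univariate pd.Series on the same index, which is what a downstream forecasting pipeline would expect from the inverse.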

And a clarifying question: why would the index seen in fit potentially not match the index of _inverse_transform?

@eangius
Author

eangius commented May 7, 2024

Thanks for the quick diagnosis, @fkiraly. Unfortunately, I'm still too much of a noob at statsmodels to tell how to extract all seasonal components.

For context, we are wrapping MSTL in a TransformedTargetForecaster because we have a previous processing step for the exogenous variables, but that was not relevant to reproducing the problem.

As an extra bit of context, we tried return_components=True and filtering out the other columns with a FunctionTransformer to keep things univariate, but that gave us a different exception.

@fkiraly
Collaborator

fkiraly commented May 7, 2024

@eangius, as possible workarounds for de/re-trending in a pipeline:

  • you can pipeline multiple Deseasonalizer-s, like Deseasonalizer(sp=24) * Deseasonalizer(sp=24*7) * my_forecaster, for daily and weekly seasonality (if your data is hourly)
  • you can try StatsForecastMSTL, an optimized forecaster with integrated MSTL, though with a heavier dependency footprint

@fkiraly
Collaborator

fkiraly commented May 7, 2024

> Hi Franz, long time no see!

Nice to hear from you again, as well!

> I'd like to implement the functionality for return_components=False.

Great, let me know if I can help.

> Let me know if I understand the solution correctly:
>
>   • make predictions of the seasonal time series, using the provided fh
>   • add up all the values, and return as pd.Series

Yes, this is what should happen when it is pipelined with a forecaster.

However, MSTL is a transformer, so it needs to carry out the transformation steps only.

So, we need to take the indices seen in _inverse_transform and determine the periodic pattern implied by what was fitted in fit.

> And a clarifying question: why would the index seen in fit potentially not match the index of _inverse_transform?

If you work out what happens in a forecasting pipeline, the transformer gets the historic indices in fit, e.g., 0, 1, 2, ..., 100, and the indices corresponding to the fh in predict: for an fh of 1, 2, 3, the X in _inverse_transform would have index 101, 102, 103.

If we have patterns of periodicities 3, 5, 7, denoting the indices of the periodic pattern by 3-0, 3-1, 3-2; 5-0, 5-1, ..., 5-4; 7-0, ..., 7-6 (dashes just for notation, not "minus"), then for indices 101, 102, 103 we should forecast, for the components, the indices 3-2, 3-0, 3-1; 5-1, 5-2, 5-3; 7-3, 7-4, 7-5.
(In Python we start counting at 0, so X-0 maps onto any index divisible by X without remainder.)
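The mapping above is the time index modulo each period. A quick check, assuming integer time indices starting at 0 as in the example (for instance, 101 = 14 * 7 + 3, so index 101 has phase 3 for period 7); phases is a hypothetical helper name:

```python
def phases(indices, periods):
    """Phase of each time index within each period: index modulo period."""
    return {p: [i % p for i in indices] for p in periods}


# Indices seen in _inverse_transform for fh = 1, 2, 3 after fitting on 0..100.
mapping = phases([101, 102, 103], [3, 5, 7])
```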

I think this must already be done somewhere in transform if return_components=True?

Projects
Bugfixing
Reproduced/confirmed
Development

No branches or pull requests

3 participants