-
**Describe the bug**

Questions to clarify:

Please let me know if you need further details or context.

**To Reproduce**

```python
from sktime.datasets import load_airline
from sktime.transformations.series.date import DateTimeFeatures
from sktime.transformations.series.summarize import WindowSummarizer
from sktime.forecasting.naive import NaiveForecaster
from sktime.forecasting.trend import STLForecaster
from sktime.forecasting.compose import MultiplexForecaster
from sktime.forecasting.model_selection import SlidingWindowSplitter
from sktime.forecasting.model_selection import ForecastingGridSearchCV
from sktime.performance_metrics.forecasting import MeanSquaredError
from sktime.utils.plotting import plot_series
from sktime.utils import mlflow_sktime
import numpy as np
import pandas as pd

# data preparation
df = load_airline().to_frame()
target = df.columns

# simple feature engineering
datetime_fe = DateTimeFeatures(ts_freq='M', keep_original_columns=True)
kwargs = {'lag_feature': {'mean': [[1, 2], [1, 3], [1, 4]]}}
lags_fe = WindowSummarizer(target_cols=target, truncate='bfill', **kwargs)
tranfo_pipe = datetime_fe * lags_fe
df_transfo = tranfo_pipe.fit_transform(df)
temp_df = df.copy()
df_transfo[target] = temp_df[target]
del temp_df
```
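For readers unfamiliar with the `lag_feature` argument: my reading of the `WindowSummarizer` docs is that each `[lag, window_length]` pair requests the named statistic over a window of `window_length` values starting `lag` steps in the past. A pandas-only sketch of what `{'mean': [[1, 2]]}` would compute under that assumption:

```python
import pandas as pd

# toy series standing in for the airline data
y = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# assumed semantics of lag_feature {'mean': [[1, 2]]}: at each time t,
# the mean over a window of length 2 starting 1 step back,
# i.e. mean(y[t-1], y[t-2]) -- in plain pandas:
lag, window = 1, 2
feat = y.shift(lag).rolling(window).mean()

print(feat.tolist())  # [nan, nan, 15.0, 25.0, 35.0]
```

The leading NaNs are why the snippet above passes `truncate='bfill'` to fill the warm-up period.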
```python
# keep last 12 time points as unseen future data
fh = 12
train = df_transfo.iloc[:-fh]
unseen = df_transfo.iloc[-fh:]

# build multiplex forecaster
multiplex_frctr = MultiplexForecaster(
    forecasters=[
        ('naive', NaiveForecaster()),
        ('stl', STLForecaster()),
    ]
)

# models' hyperparameter grids for the model-selection forecaster
multiplex_params_grid = [
    {'selected_forecaster': ['naive', 'stl']},
    {'naive__sp': [4, 12]},
    {'stl__seasonal': [7, 13]},
]

# create a splitter
splitter = SlidingWindowSplitter(fh=np.arange(1, 12 + 1), window_length=48, step_length=21)
```
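For intuition about the splitter settings: a sliding-window splitter moves a fixed-length training window along the series, stepping by `step_length`, and evaluates forecasts over the horizon after each window. A minimal pure-Python sketch of the resulting fold boundaries (an illustration of the idea, not sktime's actual implementation):

```python
def sliding_window_folds(n_timepoints, window_length, step_length, fh_max):
    """Yield (train_start, train_end, test_end) index triples, end-exclusive."""
    folds = []
    start = 0
    # a fold is valid while the training window plus the horizon fits in the data
    while start + window_length + fh_max <= n_timepoints:
        train_end = start + window_length
        folds.append((start, train_end, train_end + fh_max))
        start += step_length
    return folds

# 132 training points (144 months minus the 12 held out), as in the snippet above
folds = sliding_window_folds(n_timepoints=132, window_length=48, step_length=21, fh_max=12)
print(len(folds))           # 4 folds under this sketch's counting
print(folds[0], folds[-1])  # first and last fold boundaries
```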
```python
# search for the best forecaster
gscv_multiplex = ForecastingGridSearchCV(
    forecaster=multiplex_frctr,
    cv=splitter,
    param_grid=multiplex_params_grid,
    scoring=MeanSquaredError(square_root=True),
    n_jobs=-1,
    error_score='raise',
)

# fit and predict with exog variables
best_forecaster = gscv_multiplex.fit(y=train[target], X=train.drop(columns=target), fh=splitter.get_fh())
backtest = best_forecaster.predict()  # backtest results are logged in a database
plot_series(train[target], unseen[target], backtest, labels=['train', 'unseen', 'backtest'])
```

Step 2: Assuming I'm satisfied with the hyperparameters and backtest results, I update:
```python
# update model with unseen data and predict
pred = best_forecaster.update_predict_single(y=unseen[target], fh=splitter.get_fh(), X=unseen.drop(columns=target))
plot_series(train[target], unseen[target], backtest, pred, labels=['train', 'updated values (unseen)', 'backtest', 'pred'])
```

Step 3: I save the best forecaster:

```python
# save best forecaster
save_model_path = 'multiplex_forecaster'
mlflow_sktime.save_model(sktime_model=best_forecaster, path=save_model_path)
```

Step 4: Incremental learning. Update `best_forecaster` from the previous step with a new data point.
```python
# another month arrives, load the model
loaded_model = mlflow_sktime.load_model(model_uri=save_model_path)
```

Simulate new data for the new month:

```python
# simulate new data for 1961-01
last_period = df.index[-1]
new_period = last_period + 1
new_data = {target[0]: 500.0}
new_row_df = pd.DataFrame(new_data, index=[new_period])
df = pd.concat([df, new_row_df], axis=0)

# rebuild df_transfo with new data
df_transfo = tranfo_pipe.fit_transform(df)
temp_df = df.copy()
df_transfo[target] = temp_df[target]
del temp_df
```

With the new data, I want to update the loaded model:

```python
# prepare new data and exog for loaded_model
new_y = df_transfo[target].iloc[[-1]]
new_X = df_transfo.drop(columns=target).iloc[[-1]]
```
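A side note on the `last_period + 1` step above: `load_airline` ships with a monthly `PeriodIndex`, and `Period` arithmetic rolls over year boundaries correctly, which is what makes the simulation of 1961-01 work. A pandas-only check:

```python
import pandas as pd

# the airline data ends at 1960-12 on a monthly PeriodIndex
last_period = pd.Period('1960-12', freq='M')
new_period = last_period + 1  # Period arithmetic steps by one month

print(new_period)  # 1961-01
```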
```python
# update model with new month's data and predict
new_pred = loaded_model.update_predict_single(y=new_y, fh=splitter.get_fh(), X=new_X)  # log new_pred to database
plot_series(df[target].iloc[:-1], new_y, new_pred, labels=['past', 'updated', 'pred'])
```

So now I thought: OK, this is working, no errors so far. But most predictions made in steps 2 and 3 are the same for the periods in which they coincide with each other. Am I doing the 'update' correctly? From 1961-02 to 1961-12, the two prediction sets largely coincide.

**Expected behavior**

**Additional context**

**Versions**
```
System:
    python: 3.10.9 | packaged by Anaconda, Inc. | (main, Mar 1 2023, 18:18:15) [MSC v.1916 64 bit (AMD64)]
executable: c:\path\to\venv\Scripts\python.exe
   machine: Windows-10-10.0.19042-SP0

Python dependencies:
```
Replies: 6 comments
-
Some comments ahead:

(this has 4 elements, while your grid has 8, out of which 4 are redundant)
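To make the grid-size point concrete: with a list of dicts, each dict expands independently into the cartesian product of its value lists, and the candidate lists are concatenated. Pairing `selected_forecaster` with the matching parameters in each dict keeps every candidate meaningful. A small sketch of that expansion, assuming sklearn-`ParameterGrid`-style semantics:

```python
from itertools import product

def expand_grid(grid):
    """Expand a list of param dicts into the list of concrete candidates."""
    candidates = []
    for d in grid:
        keys = sorted(d)
        # cartesian product of the value lists within one dict
        for values in product(*(d[k] for k in keys)):
            candidates.append(dict(zip(keys, values)))
    return candidates

# each forecaster paired with its own hyperparameters: 2 + 2 = 4 candidates
paired_grid = [
    {'selected_forecaster': ['naive'], 'naive__sp': [4, 12]},
    {'selected_forecaster': ['stl'], 'stl__seasonal': [7, 13]},
]
print(len(expand_grid(paired_grid)))  # 4
```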
-
I think so - I am guessing your grid search selects the `NaiveForecaster` with `sp=12`. You would expect the predictions to coincide, as they are simply replaying the value 12 months prior.

yes.

possibly, seems plausible.
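The "replaying" effect is easy to see in isolation: a seasonal naive forecaster with `sp=12` predicts `y[t] = y[t-12]`, so as long as the last 12 observations are unchanged, updating with newer data leaves the overlapping predictions identical. A minimal pure-Python sketch (an illustration, not sktime's `NaiveForecaster`):

```python
def seasonal_naive_predict(history, fh, sp=12):
    """Predict each horizon point by replaying the value sp steps before it."""
    preds = []
    for h in fh:
        # index of the value one season before the forecasted point
        idx = len(history) + h - 1 - sp
        preds.append(history[idx])
    return preds

history = list(range(1, 25))  # 24 monthly values: 1..24
first = seasonal_naive_predict(history, fh=range(1, 13))

# update with one new observation, then forecast the remaining overlap
second = seasonal_naive_predict(history + [99], fh=range(1, 12))

# horizons 2..12 of the first forecast coincide with horizons 1..11 of the second
print(first[1:] == second)  # True
```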
-
Hello @fkiraly, thanks for your reply, I appreciate it.
Thanks for pointing this out. I had to tweak it a little, so it is fine now. I also changed the models, and now:

```python
from sktime.forecasting.compose import make_reduction
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neighbors import KNeighborsRegressor

# build multiplex forecaster
multiplex_frctr = MultiplexForecaster(
    forecasters=[
        ('gbr', make_reduction(GradientBoostingRegressor(random_state=42))),
        ('knnr', make_reduction(KNeighborsRegressor(n_jobs=-1))),
    ]
)

# models' hyperparameter grids for the model-selection forecaster
multiplex_params_grid = [
    {
        'selected_forecaster': ['gbr'],
        'gbr__estimator__n_estimators': np.arange(50, 200, 50),
    },
    {
        'selected_forecaster': ['knnr'],
        'knnr__estimator__n_neighbors': np.arange(5, 10, 1),
    },
]
```
After changing the models, the values aren't replaying themselves.

```python
# compare the 2 sets of predictions made
plot_series(df[target].iloc[:-1], pred, new_y, new_pred, labels=['past', 'pred', 'updated', 'new_pred'])
```
-
so, all is fine? Or do you still think there is an issue?

PS: typical ML tabular regressors (especially tree-based ensembles) will not be able to extrapolate; that's why you see the forecasts do not go "above" the values observed in the past. If you want that, you ought to pipeline with sth like a
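The extrapolation point can be demonstrated without sktime: any regressor that predicts by averaging observed targets (nearest neighbours, tree leaves) is bounded by the training targets, while a linear fit is not. A pure-Python sketch:

```python
def nn1_predict(xs, ys, x):
    """1-nearest-neighbour regression: return the target of the closest x."""
    best = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return ys[best]

def linear_fit_predict(xs, ys, x):
    """Ordinary least-squares line through (xs, ys), evaluated at x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    return my + slope * (x - mx)

xs = list(range(10))       # upward trend: y = 2x
ys = [2 * x for x in xs]

# beyond the training range, the neighbour-based model is clamped at max(ys),
# while the linear fit continues the trend
print(nn1_predict(xs, ys, 15))         # 18 (the largest observed target)
print(linear_fit_predict(xs, ys, 15))  # 30.0 (extrapolates the trend)
```

This is why pipelining a detrending step in front of such a regressor helps: the regressor then only has to model the bounded residuals.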
-
All should be fine now.

I have already implemented that in the pipeline for my data together with
-
alright - since there was no bug, I've converted this into a discussion.