Plot of Forecasted vs Actual misrepresenting the fit by not inserting None at the right index? #47

philbeliveau commented Jul 31, 2023

Hello Petronio, I noticed that Chen's conventional result looks “suspiciously” good, as @wangtieqiao pointed out in #25. It appears that the plot compares the predicted t+1 values with the current values of the time series at time t.
Interestingly, I ran into a similar issue while working with the library and a pwfts model of order 1. My validation plot (see image below) shows a great fit of the predicted values on the actual values (too great, I'd say), yet the RMSE calculated with Measures.get_point_statistics turned out to be unexpectedly high at 2.32 compared to my other fits.
[Screenshot: validation plot, 2023-07-31]
Here is my code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from pyFTS.partitioners import Grid
from pyFTS.models import pwfts
from pyFTS.benchmarks import Measures
from pyFTS.common import Transformations

diff = Transformations.Differential(1)  # first-order differencing used below

def Cash_in(train_set, valid_set):
    rows = []
    fig, ax = plt.subplots(nrows=1, ncols=1, figsize=[12, 8])
    y_val = pd.Series(valid_set['Scale_Montant'])
    y_train = pd.Series(train_set['Scale_Montant'])
    ax.plot(y_val.values, label='Validation', color='black')
    for method in [pwfts.ProbabilisticWeightedFTS]:
        for partitions in [Grid.GridPartitioner]:
            for npart in [4]:
                for order in [1]:
                    part = partitions(data=y_train.values, npart=npart, transformation=diff)
                    model = method(order=order, partitioner=part)
                    model.append_transformation(diff)
                    model.name = model.shortname + str(partitions).replace('>', '').replace('<', '').replace('class', '') + str(npart) + str(order)
                    model.fit(y_train.values)

                    # Validation forecast
                    forecasted_values_valid = model.predict(y_val.values)

                    # Plot the forecasted values against the actual validation values
                    ax.plot(np.array(forecasted_values_valid),
                            label=str(model.shortname) + str(partitions) + str(npart) + ' partitions ' + str(order) + ' order',
                            color='blue')
                    ax.set_title('Validation')

                    # Performance measures on the validation set
                    rmse_v, mape_v, u_v = Measures.get_point_statistics(y_val.values, model)
                    rows.append([model.shortname, str(partitions).replace('>', '').replace('<', '').replace('class', ''), npart, order, rmse_v, mape_v, u_v])

                    handles, labels = ax.get_legend_handles_labels()
                    lgd = ax.legend(handles, labels, loc=1, bbox_to_anchor=(1, 1))

                    plt.show()

    result_cash_in = pd.DataFrame(rows, columns=['Model', 'partitions_techniques', '#_partitions', 'order', 'RMSE_Valid', 'MAPE_Valid', 'U_Valid'])
    pd.set_option('display.max_colwidth', None)
    return result_cash_in, forecasted_values_valid

Weekly_Cash_in_models, forecasts_df_valid = Cash_in(train_set, valid_set)
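For what it's worth, the workaround I'm currently using for the plot itself is to shift the forecast right by order before plotting, padding the front with NaN so that matplotlib simply skips those points. This is only a sketch of my understanding, assuming predict() returns one forecast per lag window and that the i-th forecast is the prediction of the observation at position i + order:

import numpy as np

def shift_forecast(forecasted, n_actual, order):
    # Build an array the same length as the actuals, with the first `order`
    # positions (which cannot be forecast) left as NaN and the forecasts
    # shifted right so that position i holds the forecast of observation i.
    aligned = np.full(n_actual, np.nan)
    usable = min(len(forecasted), n_actual - order)  # drops any trailing out-of-sample forecast
    aligned[order:order + usable] = forecasted[:usable]
    return aligned

# Inside Cash_in(), instead of plotting forecasted_values_valid directly:
# ax.plot(shift_forecast(forecasted_values_valid, len(y_val), order), color='blue')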

To investigate further, I manually computed the RMSE on my validation set using this formula from #25:

mse = np.mean((Forecast_valid[:-1] - y_valid[model.order:])**2)
rmse = np.sqrt(mse)

Surprisingly, computing it by hand on my forecasted array as is gave me a different RMSE of 1.04. Trying to figure out what was going on, I decided to add a None at the start of my forecast using:

for k in np.arange(order):
    forecasted_values_valid.insert(0, None)

which effectively shifted the forecast array one position to the right. After doing this, I recalculated the RMSE and got 2.34, much closer to the 2.32 reported by get_point_statistics.
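To convince myself that the index pairing alone can produce a gap of this size, I made a tiny toy example (the numbers are made up and have nothing to do with my data; it just mimics an order-1, persistence-like forecast):

import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
# forecast[i] is the prediction OF y[i + 1], but its value stays close to y[i],
# the lag it was built from (typical persistence-like behaviour).
forecast = np.array([1.05, 1.95, 3.10, 3.90])

# Pairing forecast[i] with y[i], which is what an unshifted plot implicitly does:
rmse_unshifted = np.sqrt(np.mean((forecast - y[:-1]) ** 2))  # ~0.08, looks excellent

# Pairing forecast[i] with y[i + 1], the value it actually predicts:
rmse_shifted = np.sqrt(np.mean((forecast - y[1:]) ** 2))     # ~1.00, the honest error

print(rmse_unshifted, rmse_shifted)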
It turns out that the issue was caused by me not padding the forecast array with None for the first model.order positions (in my case a single None, so that the forecasts occupy positions [model.order:], i.e. start at index 1). I didn't insert None at order 1 because I was following an example from one of your notebooks: [Link to the notebook].

for order in np.arange(1,4):
  part = Grid.GridPartitioner(data=y, npart=10)
  model = hofts.HighOrderFTS(order=order, partitioner=part)
  model.fit(y)
  forecasts = model.predict(y)
  if order > 1:
    for k in np.arange(order):
      forecasts.insert(0,None)
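What I ended up doing instead is the same padding but applied for every order, including 1; this reflects my own understanding of the alignment (which is exactly the point I would like confirmed):

for order in np.arange(1, 4):
  part = Grid.GridPartitioner(data=y, npart=10)
  model = hofts.HighOrderFTS(order=order, partitioner=part)
  model.fit(y)
  forecasts = model.predict(y)
  # pad the first `order` positions for every order, not just order > 1,
  # so that forecasts[i] lines up with the observation it predicts
  for k in np.arange(order):
    forecasts.insert(0, None)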

I've noticed some "contradictory" information while going through various notebooks and the pyFTS tutorial ([Link to the tutorial](https://sbic.org.br/lnlm/wp-content/uploads/2021/12/vol19-no2-art3.pdf)). Some sources suggest that we need to pad with None so the forecast occupies positions [model.order:] even at order 1.
As I understand it, the parameter "order" is the number of lags used to predict the next observation. With an order of 1, there cannot be a predicted value for the first observation of the array, since the first observation is only ever used to predict the second one.
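Spelling out the index bookkeeping as I understand it (this assumes one forecast per lag window, with the very last one being out of sample):

order = 1
N = 5  # length of the series in this small illustration

for i in range(N - order + 1):
    lags = list(range(i, i + order))  # indices of the lags fed to the model
    target = i + order                # index of the observation being predicted
    note = ' (out of sample)' if target >= N else ''
    print(f'forecast[{i}]: uses y{lags} -> predicts y[{target}]{note}')

# There is no forecast for y[0], so plotting the raw forecast from index 0
# implicitly compares each forecast with the lag it was built from.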
[Screenshot, 2023-07-28 12:16:11]
[Screenshot, 2023-07-28 12:14:38]

When examining one of the notebooks (picture below), it seems that not padding the fitted values with None for the first model.order positions shifts their plot to the left by the order (2 in that case), rendering the graph misleading. I believe this could be what is happening in the Chen Conventional notebook.

[Screenshot, 2023-07-28 12:27:44]

I would greatly appreciate some clarification on why some notebooks recommend padding with None so that the forecast starts at position model.order, while others do not. It's a bit confusing, especially when different examples apply different manipulations to the same model.

Thank you for your time, and I look forward to resolving this confusion.
