
how to use hyperparams #30

Open

ramdhan1989 opened this issue Sep 12, 2020 · 6 comments

@ramdhan1989

I am struggling to find guidance on how to use the hyperparam module, such as grid search or evolutionary optimization. Can anyone share an example?

thank you


petroniocandido commented Sep 13, 2020

Hi @ramdhan1989

Thanks for your interest in our tool, and forgive me for the long delay.

First of all, before hyperparameter optimization (hereafter called hyperopt), you should perform a time series analysis (ACF/PACF plots, tests of stationarity and heteroscedasticity, etc.). Hyperopt is not a substitute for knowing how your time-series data behaves.
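For instance, a quick sketch of that preliminary analysis using statsmodels (not part of pyFTS; the lag count of 40 is an arbitrary choice):

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from pyFTS.data import TAIEX

dataset = TAIEX.get_data()   #A 1d series

#The ACF/PACF plots help to choose the 'order' and 'lags' search spaces used below
fig, axes = plt.subplots(2, 1, figsize=(10, 6))
plot_acf(dataset, lags=40, ax=axes[0])
plot_pacf(dataset, lags=40, ax=axes[1])
plt.show()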

The hyperparameter optimization of FTS is described here and is called DEHO (Distributed Evolutionary Hyperparameter Optimization), but the library also contains methods other than the evolutionary one. Each method returns a dictionary with the best parameters found for forecasting the dataset using the selected FTS method (given in the fts_method parameter).

Below is a list of the implemented methods:

  • Grid Search (GS) is very accurate but also very computationally expensive.
from pyFTS.hyperparam import GridSearch
from pyFTS.models import hofts
from pyFTS.data import TAIEX

import numpy as np

datasetname = 'TAIEX'
dataset = TAIEX.get_data()

#The search space of each hyperparameter
hyperparams = {
    'order': [1, 2, 3],
    'partitions': np.arange(10, 100, 3),
    'partitioner': [1, 2],           #1 = Grid partitioner, 2 = Entropy partitioner
    'mf': [1, 2, 3],                 #1 = Triangular, 2 = Trapezoidal, 3 = Gaussian membership functions
    'lags': np.arange(2, 7, 1),      #The lag indexes
    'alpha': np.arange(.0, .5, .05)  #The alpha-cut
}

GridSearch.execute(
        hyperparams,              #A dictionary containing the search space of each hyperparameter
        datasetname,              #Just the name of your dataset
        dataset,                  #Your time series data (list or np.ndarray 1d)
        fts_method=hofts.WeightedHighOrderFTS,  #The FTS method you want to optimize [only univariate methods]
        window_size=10000,        #The length of the data window for the Sliding Window Cross Validation method
        train_rate=.9,            #The proportion of the data window used for training; the remainder is used for testing
        increment_rate=.3,        #The sliding increment of the Sliding Window Cross Validation method
        database_file='hyperopt.db'  #A sqlite database that will log the hyperopt process (see the inspection sketch below)
)

There is no GridSearch implementation yet for multivariate methods.

  • Random Search (RS) is computationally cheap but may not converge reliably, so it is not the most accurate method. Currently RS is implemented only for MVFTS.

from pyFTS.hyperparam import mvfts as deho_mv
from pyFTS.models.multivariate import mvfts, wmvfts
from pyFTS.models.seasonal.common import DateTime
from pyFTS.data import Malaysia

import pandas as pd

datasetname = 'Malaysia'
dataset = Malaysia.get_dataframe()
dataset['time'] = pd.to_datetime(dataset['time'], format='%m/%d/%y %I:%M %p')

explanatory_variables = [
    {'name': 'Temperature', 'data_label': 'temperature', 'type': 'common'},
    {'name': 'Daily', 'data_label': 'time', 'type': 'seasonal', 'seasonality': DateTime.minute_of_day, 'npart': 24 },
    {'name': 'Weekly', 'data_label': 'time', 'type': 'seasonal', 'seasonality': DateTime.day_of_week, 'npart': 7 },
    {'name': 'Monthly', 'data_label': 'time', 'type': 'seasonal', 'seasonality': DateTime.day_of_month, 'npart': 4 },
    {'name': 'Yearly', 'data_label': 'time', 'type': 'seasonal', 'seasonality': DateTime.day_of_year, 'npart': 12 }
]

target_variable = {'name': 'Load', 'data_label': 'load', 'type': 'common'}

deho_mv.random_search(
        datasetname,              #Just the name of your dataset
        dataset,                  #Your time series data (pd.DataFrame)
        npop=200,                 #The population size of the RS
        mgen=70,                  #The number of iterations of the RS
        fts_method=wmvfts.WeightedMVFTS,  #The multivariate FTS method to optimize
        variables=explanatory_variables,  #The list of exogenous/explanatory variables
        target_variable=target_variable,  #The endogenous/target variable
        window_size=10000,        #The length of the data window for the Sliding Window Cross Validation method
        train_rate=.9,            #The proportion of the data window used for training; the remainder is used for testing
        increment_rate=.3         #The sliding increment of the Sliding Window Cross Validation method
)
  • Genetic Algorithm (GA) is between GS and RS, both in accuracy and computational cost.
from pyFTS.hyperparam import Evolutionary
from pyFTS.models import hofts
from pyFTS.data import TAIEX

datasetname = 'TAIEX'
dataset = TAIEX.get_data()

ret = Evolutionary.execute(
        datasetname,              #Just the name of your dataset
        dataset,                  #Your time series data (list or np.ndarray 1d)
        fts_method=hofts.WeightedHighOrderFTS,  #The FTS method you want to optimize [only univariate methods]
        ngen=30,                  #The number of generations (iterations) of the GA
        npop=20,                  #The population size of the GA
        psel=0.6,                 #The selection probability of the GA
        pcross=.5,                #The crossover probability of the GA
        pmut=.3,                  #The mutation probability of the GA
        window_size=10000,        #The length of the data window for the Sliding Window Cross Validation method
        train_rate=.9,            #The proportion of the data window used for training; the remainder is used for testing
        increment_rate=.3,        #The sliding increment of the Sliding Window Cross Validation method
        experiments=1,            #The number of hyperopt experiments to perform
        database_file='hyperopt.db'  #A sqlite database that will log the hyperopt process (see the inspection sketch below)
)
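Both GridSearch and Evolutionary log every evaluated configuration to the sqlite file given in database_file, so the log can be browsed with Python's standard library. A minimal inspection sketch (the table layout is not documented here, so the first step is just to list whatever tables exist):

import sqlite3

con = sqlite3.connect('hyperopt.db')
#List the tables created by the hyperopt process, then query them as needed
for (table_name,) in con.execute("SELECT name FROM sqlite_master WHERE type='table'"):
    print(table_name)
con.close()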

Please, do not hesitate to get in touch if you have any questions.

Best regards


ramdhan1989 commented Sep 14, 2020

Thanks, all three methods work!
After executing the hyperparameter optimization, is the model fitted automatically using the best params, or do we need to take the values from the output dict and fit the model ourselves?
Would you mind elaborating on the dict? I am confused about which values belong to which parameter. From your code using the GA:
Experiment 0
Evaluating initial population 1600098526.9596627
GENERATION 0 1600098526.9596627
WITHOUT IMPROVEMENT 1
GENERATION 1 1600098526.9606583
WITHOUT IMPROVEMENT 2
GENERATION 2 1600098526.9626496
WITHOUT IMPROVEMENT 3
GENERATION 3 1600098526.963645
WITHOUT IMPROVEMENT 4
GENERATION 4 1600098526.9656367
WITHOUT IMPROVEMENT 5
GENERATION 5 1600098526.9666321
WITHOUT IMPROVEMENT 6
GENERATION 6 1600098526.9686234
WITHOUT IMPROVEMENT 7
('TAIEX', 'Evolutive', 'hofts', None, 1, 3, 2, 40, 0.5, '[2, 6, 7]', 'rmse', inf)
('TAIEX', 'Evolutive', 'hofts', None, 1, 3, 2, 40, 0.5, '[2, 6, 7]', 'size', inf)
('TAIEX', 'Evolutive', 'hofts', None, 1, 3, 2, 40, 0.5, '[2, 6, 7]', 'time', 0.010952949523925781)

Below is the returned dict:
{'alpha': 0.5, 'f1': inf, 'f2': inf, 'lags': [2, 6, 7], 'mf': 1, 'npart': 40, 'order': 3, 'partitioner': 2, 'rmse': inf, 'size': inf, 'time': 0.010952949523925781}

@petroniocandido

Hi @ramdhan1989

Using this dictionary you can build a model with this code:

from pyFTS.hyperparam import Evolutionary

model = Evolutionary.phenotype(
     dictionary,   #The dictionary returned by the hyperparameter optimization method
     train,        #The training dataset
     fts_method    #The FTS method
)
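A usage sketch of the resulting model (assuming dictionary is the dict returned above, train/test are hypothetical in-sample and out-of-sample slices of the series, and predict is the generic pyFTS forecasting entry point):

from pyFTS.hyperparam import Evolutionary
from pyFTS.models import hofts
from pyFTS.data import TAIEX

dataset = TAIEX.get_data()
train = dataset[:8000]        #Hypothetical split; any train/test partition works
test = dataset[8000:9000]

model = Evolutionary.phenotype(dictionary, train, hofts.WeightedHighOrderFTS)
forecasts = model.predict(test)   #One-step-ahead forecasts over the test slice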

Best regards

@ramdhan1989

Well, thanks a lot @petroniocandido. Does the hyperparameter optimization also search for the best data transformation, such as how many lags to difference, or perhaps which kind of transformation is best for the problem?

thank you

@ramdhan1989

Hi @petroniocandido, how can I get stable predictions using the GA? Every run produces different values. Do you have a suggestion?
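(A minimal reproducibility sketch, under the assumption that the GA draws only from Python's random module and NumPy's global generator; seeding both before each run should make the search repeatable, provided there is no other source of nondeterminism such as distributed execution:)

import random
import numpy as np

#Assumption: pyFTS's GA uses the global random/NumPy generators internally
random.seed(42)
np.random.seed(42)
#...then call Evolutionary.execute(...) exactly as above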

@ramdhan1989

Hi @petroniocandido, I have come back to try this package again. I just want to clarify several things:

  1. How do I use the differential Transformation in the hyperparam optimization? (see the standalone sketch after this list)

  2. Using the evolutionary method, I got an RMSE of "nan". Is that good?
    [screenshot of the output omitted]

  3. Is it possible to use another evaluation metric, such as RMSLE (root mean squared log error)?
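(For item 1, a standalone sketch of how pyFTS attaches a differential transformation to a single model; this shows the Transformations API only, and does not by itself answer whether the hyperopt module can search over transformations:)

from pyFTS.common import Transformations
from pyFTS.models import hofts
from pyFTS.data import TAIEX

dataset = TAIEX.get_data()
train = dataset[:8000]   #Hypothetical training slice

tdiff = Transformations.Differential(1)   #First-order differencing

model = hofts.WeightedHighOrderFTS(order=2)
model.append_transformation(tdiff)   #Differences the data on fit, inverts on forecast
model.fit(train)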

I appreciate your answers.

thank you
