Skip to content

manuparra/DFST

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 

Repository files navigation

DFST algorithm

This algorithm implements DFST (Distributed FastShapelet Transform). DFST is the first time series classification algorithm developed for distributed environments (Spark). This algorithm performs a shapelet transform on a data set, trains a Random Forest model on that data, and uses this model to classify a test set that has also been transformed. DFST leverages the low computational complexity of the FastShapelet algorithm to extract these features in distributed environments efficiently.

Example (ml)

import org.apache.spark.mllib.classification._

var a= new DFST(train_file_name, // hdfs route to training file
                min_len, // minimum length for FastShapelet search algorithm
                max_len, // maximum length for FastShapelet search algorithm
                step, // amount of increase in the length of subsequences processed between iterations
                Num_partitions, // number of partitions in the training dataset (RDD).
                seed, // random seed
                tree_file_name, // output tree file of FastShapelet search algorithm
                time_file_name, // output time file  FastShapelet search algorithm
                R, // number of random projections
                top_k, // number of best candidate shapelets evaluated by iteration
                Classificator, // 0 for shapelet search only / 1 for DFST
                shapelet_file_name, //  output file with shapelets selected for shapelet transformation
                test_file_name, // hdfs route to test file
                model_file_name, // output file with Random Forest Model
                sc) // input SparkContext

val output = DFST.run() // generates the file f_scala_results. txt with the runtime records and results on the test dataset.

Running from spark-submit

Run DFST from spark-submit:

spark-submit --class org.apache.spark.mllib.classification.DFST DFST-1.1.jar \ 
       trainfile.txt \ 
       10 \ 
       100 \ 
       10 \ 
       1840 \ 
       26 \ 
       treefile.txt \ 
       timefiletxt \ 
       1 \ 
       10 \ 
       1 \ 
       outputfile_shapelets.txt \ 
       testfile.txt \ 
       outputfile_model.dat 

Releases

No releases published

Packages

No packages published

Languages

  • Scala 100.0%