Skip to content

This package contains the code for executing clustering validity indices in Java by using K-means from Weka. The package includes the following clustering validity indices: Silhouette, Dunn, BD-Silhouette, BD-Dunn, Davies-Bouldin, Calinski-Harabasz, MaximumDiameter, SquaredDistance, AverageDistance, AverageBetweenClusterDistance, MinimumDistance.

Notifications You must be signed in to change notification settings

josemarialuna/smallDataIndex

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 

Repository files navigation

smallDataIndex

This package contains Weka Cluster algorithm with a complete list of indices that will help you to decide the optimal number of clusters that the dataset could have. The package includes the followings indices:

  • Silhouette
  • Dunn
  • BD-Silhouette [1]
  • BD-Dunn [1]
  • Davies-Bouldin
  • Calinski-Harabasz
  • MaximumDiameter
  • SquaredDistance
  • AverageDistance
  • AverageBetweenClusterDistance
  • MinimumDistance.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

This is a Maven Project with OpenCSV 2.4 and Weka 3.8.0 dependencies. Both of them are included in the pom.xml file in the repository.

<dependency>
    <groupId>au.com.bytecode</groupId>
    <artifactId>opencsv</artifactId>
    <version>2.4</version>
</dependency>

        
<dependency>
    <groupId>nz.ac.waikato.cms.weka</groupId>
    <artifactId>weka-stable</artifactId>
    <version>3.8.0</version>
</dependency>

Running

WekaCluster is the main class, and it includes 4 arguments that can be set from the code or directly when you execute the jar file:

  • Argument 0: minNumCluster: it is the minimum cluster number that the dataset is going to be tested.
  • Argument 1: maxNumCluster: it is the maximum cluster number that the dataset is going to be tested.
  • Argument 2: pathFile: it is the path of the input dataset. It must includes the complete pathfile.
  • Argument 3: outFile: it is the path of the output result file. It must includes the complete pathfile.
  • Argument 4: selector: you can choose between SMALLDATA, BIGDATA and ALL. SMALLDATA only execute smalldata indices. BIGDATA just execute BD-Silhouette and BD-Dunn indices from [1]. And ALL execute SMALLDATA and BIGDATA indices.
int minNumCluster = 2;
int maxNumCluster = 10;
int selector = SMALLDATA;

String fileName = "SmallDataset.csv";
String folderFile = "C:\\datasets\\";
String pathFile = folderFile + fileName;
String outFile = getFileNameOutput(selector, fileName);

For this configuration the application load a file called SmallDataset.csv in "C:/datasets" and the result file will be saved as "Results-SmallDataset.csv"in the application folder.

Execution example

If we preffer executing in a terminal using java we just have to:

java -jar smallDataIndices.jar 2 10 C:/datasets/SmallDataset.csv Results.csv ALL
java -jar smallDataIndices.jar 10 20 datasets/dataset.csv results.csv SMALLDATA

Built With

Authors

Acknowledgments

  • University of Waikato

References

[1] Luna-Romera, J.M., García-Gutiérrez, J., Martínez-Ballesteros, M. et al. Prog Artif Intell (2017). https://doi.org/10.1007/s13748-017-0135-3

Please, cite as: Luna-Romera, J.M., García-Gutiérrez, J., Martínez-Ballesteros, M. et al. Prog Artif Intell (2017). https://doi.org/10.1007/s13748-017-0135-3 (https://link.springer.com/article/10.1007%2Fs13748-017-0135-3)

About

This package contains the code for executing clustering validity indices in Java by using K-means from Weka. The package includes the following clustering validity indices: Silhouette, Dunn, BD-Silhouette, BD-Dunn, Davies-Bouldin, Calinski-Harabasz, MaximumDiameter, SquaredDistance, AverageDistance, AverageBetweenClusterDistance, MinimumDistance.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages