fschur/Missing-Data-Imputation-Methods-Performance-Comparison

Missing-Data-Imputation-Methods-Performance-Comparison

The data imputation methods MissForest, GAIN, MICE, MICE-NN and MIWAE are tested on two UCI datasets (Dataset for Sensorless Drive Diagnosis Data Set, Page Blocks Classification Dataset). MICE-NN is a modified version of MICE in which fully connected neural networks are used instead of linear regression. Each test takes the complete dataset (without missing values), introduces either MAR or MCAR missingness at the desired missing rate, and then uses the imputation methods to impute the missing values. Since the correct values are known, the true MSE can be computed. To test other datasets, save the dataset as a 2-dimensional numpy array in the folder data. Then set dataset = "name" when calling the imputation method, where the dataset files in the folder data are named "name_x" and "name_y".
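Because the ground truth is available, the error can be measured directly on the values that were removed. The following is a minimal sketch of such an evaluation, not the repository's own code: the function name `imputation_mse` and the choice to average only over the artificially removed entries are assumptions made for illustration.

```python
import numpy as np

def imputation_mse(X_true, X_imputed, missing_mask):
    """Mean squared error over the entries that were artificially removed.

    missing_mask is a boolean array with True where a value was dropped.
    (Hypothetical helper; the repository may compute the error differently.)
    """
    diff = (X_true - X_imputed)[missing_mask]
    return float(np.mean(diff ** 2))

# Tiny worked example: two entries were dropped and then imputed.
X_true = np.array([[1.0, 2.0], [3.0, 4.0]])
missing_mask = np.array([[False, True], [True, False]])
X_imputed = np.array([[1.0, 2.5], [2.0, 4.0]])

mse = imputation_mse(X_true, X_imputed, missing_mask)
print(mse)  # 0.625 = ((2.0 - 2.5)**2 + (3.0 - 2.0)**2) / 2
```

Restricting the average to the masked entries keeps the observed values, which every method reproduces exactly, from diluting the error.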

MCAR missing values are introduced by dropping each value in the data independently with probability "p_miss". MAR missing values are introduced by summing over one third of the values of each observation and then dropping each value in the rest of that observation independently, with a probability proportional to the computed sum. The variable "para" is used for this (for details, see load_data in utils.py).
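The two mechanisms can be sketched in a few lines of NumPy. This is an illustrative approximation under stated assumptions, not a copy of load_data in utils.py: the function names `mcar_mask` and `mar_mask` are hypothetical, the first third of the columns is assumed to be the always-observed part, the features are assumed to be non-negative, and the proportionality constant is chosen here so that the average drop probability is roughly p_miss. The role of "para" in the original code is not reproduced.

```python
import numpy as np

def mcar_mask(shape, p_miss, rng):
    """MCAR: every entry is dropped independently with probability p_miss."""
    return rng.random(shape) < p_miss

def mar_mask(X, p_miss, rng):
    """MAR sketch: drop probability depends only on always-observed columns.

    Assumptions (not taken from the repository): the first third of the
    columns stays observed, features are non-negative, and probabilities
    are scaled so that their mean over rows is about p_miss.
    """
    n, d = X.shape
    k = d // 3
    row_sum = X[:, :k].sum(axis=1)            # sum over one third of each row
    prob = np.clip(p_miss * row_sum / row_sum.mean(), 0.0, 1.0)
    mask = np.zeros((n, d), dtype=bool)       # True marks a dropped value
    mask[:, k:] = rng.random((n, d - k)) < prob[:, None]
    return mask

rng = np.random.default_rng(0)
X = rng.random((1000, 9))                     # toy data in [0, 1)

X_mcar = np.where(mcar_mask(X.shape, 0.2, rng), np.nan, X)
X_mar = np.where(mar_mask(X, 0.2, rng), np.nan, X)
```

In the MAR sketch the missingness of a value never depends on the value itself, only on columns that remain fully observed, which is what distinguishes MAR from data that are missing not at random.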

Requirements

The code requires Python 3.6 or later. Required packages are:

  • fancyimpute >= 0.5.3
  • matplotlib >= 2.2.2
  • missingpy >= 0.2.0
  • numpy >= 1.16.2
  • pathlib >= 2.3.3
  • pickle
  • Pillow >= 5.4.1
  • pylab
  • scipy >= 1.2.1
  • sklearn
  • tensorflow >= 1.14
  • tensorflow_probability >= 0.7.0
  • torch >= 1.0.1
  • torchvision >= 0.2.2
  • tqdm >= 4.31.1
