smason/mdipp

MDI++ — Multiple Dataset Integration, reimplemented

MDI is a method for integrative clustering of genomic datasets. By "integrative" we mean that it shares clustering information across multiple related datasets. The clustering model is probabilistic, so it can find a "natural" number of clusters to represent your data. Furthermore, a Bayesian framework is used, so the output of an analysis is a Monte Carlo chain targeting the posterior distribution of the model parameters given the data.

To see the simple case of clustering a single normally distributed dataset, run:

$ mdi++ N demo/normgam.csv > mcmc_01.csv

As this command line shows, each input file is listed after its "data type" (here N, for the normally distributed dataset), and standard output is redirected to a file where it can be used for plotting or subsequent analysis. More details are available in a document on using MDI and an example analysis.
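Before plotting, it can be useful to confirm that the redirected output file was actually written and has a sensible shape. The following is a minimal sketch, not part of MDI++: csv_summary is a hypothetical helper, and it assumes the file is plain comma-separated text whose first line is a header and whose fields contain no quoted commas. The actual layout of the MDI++ output is described in the project documentation, not here.

```shell
# Hypothetical helper (not shipped with MDI++): report the shape of a CSV file.
# Assumes the first line is a header and that no field contains a quoted comma.
csv_summary() {
  file="$1"
  # Count the comma-separated fields on the first line.
  cols=$(head -n 1 "$file" | awk -F',' '{print NF}')
  # Count the lines after the first one.
  rows=$(( $(wc -l < "$file") - 1 ))
  echo "columns=$cols rows=$rows"
}
```

For example, after the command above has finished, running csv_summary mcmc_01.csv prints the number of columns and the number of lines following the first.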

Most of MDI++ is implemented in C++, with compute-intensive portions employing CUDA. A number of R scripts are provided for plotting and for extracting a canonical clustering. As with most Unix programs, a synopsis of command line options can be displayed by running:

$ mdi++ --help

The scripts subdirectory contains R scripts that can be used to load the CSV files MDI++ generates. The demo subdirectory contains example data and analysis scripts. All further documentation is included in the docs subdirectory.

Building

This software has been developed primarily under Mac OS X and Ubuntu, so these platforms are currently the best supported. The software dependencies are a C++11 compiler (such as GCC or Clang) and the Boost and Eigen libraries. These can be installed under OS X, assuming you are not using CUDA features, by running:

$ brew install --c++11 boost eigen pkg-config

or under Ubuntu by running:

$ sudo apt-get install libboost-all-dev libeigen3-dev

Once these dependencies have been installed, running make should build the software. Some system-specific settings are defined in config.mk and may need to be adjusted for your system. For more details, see docs/install.md.

About

MDI++ Bayesian Data Integration of Biological Data
