Gaussian Mixture Modeling Software

Introduction:

Gaussian Mixture Modeling Software is a C implementation of the Expectation-Maximization (EM) algorithm for fitting Gaussian mixture (GM) models to multivariate data. In addition to the basic algorithm, the code automatically performs multiple random restarts to reduce the risk of settling on parameters that are only locally optimal. A cross-validation (CV) option is available for selecting the model structure: the data are split into two disjoint subsets, a training set and a test set, and each candidate model is fitted on the training set and evaluated on the test set by its out-of-sample log-likelihood. The sizes of the training and test sets and the number of CV iterations can be specified.

How to use the code:

The settings for the algorithm are given in a file called input.txt, e.g., the number of restarts for EM, the values of k (number of clusters) to fit, the convergence criteria, etc. The data are provided in a plain ASCII file with n rows and p columns, one row per p-dimensional observation; the values on each row are assumed to be real-valued and may be separated by any number of blank spaces. The algorithm saves its results in a text file whose name is specified in input.txt.
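As an illustration of the data file layout described above, the following sketch reads n rows of p whitespace-separated real values into a row-major array. The function name read_data_matrix is hypothetical and is not part of the distributed code; the sketch also assumes that n and p are known in advance. Because fscanf with "%lf" skips any run of whitespace, including newlines, it does not check that each row holds exactly p values.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads n*p whitespace-separated reals from fp into a newly allocated
 * row-major array. Returns NULL on allocation failure, on a malformed
 * value, or if the file ends early. The caller frees the result.
 * Hypothetical helper, not the parser used by the original software. */
double *read_data_matrix(FILE *fp, size_t n, size_t p)
{
    assert(fp != NULL);
    if (n == 0 || p == 0)
        return NULL;

    double *data = malloc(n * p * sizeof *data);
    if (data == NULL)
        return NULL;

    for (size_t i = 0; i < n * p; ++i) {
        if (fscanf(fp, "%lf", &data[i]) != 1) {
            free(data);
            return NULL;
        }
    }
    return data;
}
```

Observation i, dimension j is then available as data[i * p + j].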

Limitations:

The code currently published on the web has several limitations:

  • Very limited documentation!
  • Initialization is performed by k-means, and there is currently no way to use an alternative initialization.
  • The minimum value of the covariance matrices (a parameter for preventing singularities) is the same for all dimensions*.
  • The "covariance shape" cannot be specified. Specifying it would allow certain dimensions to be decoupled*.
  • There might be bugs and/or other unexpected problems in versions that involve parsing and basic file I/O*.

* Works in the Matlab for Windows 95 version (i.e., it is not a limitation in the Matlab version).
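To make the covariance-minimum limitation concrete, here is a minimal sketch of a single floor applied uniformly to every dimension. How the distributed code actually enforces its minimum is not documented here, so the function name floor_covariance_diagonal, the row-major layout, and the choice to clamp only the diagonal are assumptions. Note that clamping the diagonal alone does not by itself guarantee a positive-definite matrix when off-diagonal terms are large.

```c
#include <assert.h>
#include <stddef.h>

/* Raises every diagonal entry of a p x p row-major covariance matrix to
 * at least min_var, the SAME floor for all dimensions. Returns the number
 * of entries that were raised. Hypothetical helper for illustration. */
size_t floor_covariance_diagonal(double *cov, size_t p, double min_var)
{
    assert(cov != NULL && min_var > 0.0);
    size_t clamped = 0;
    for (size_t j = 0; j < p; ++j) {
        double *diag = &cov[j * p + j];
        if (*diag < min_var) {
            *diag = min_var;
            ++clamped;
        }
    }
    return clamped;
}
```

A per-dimension variant would take an array of p floors instead of the single min_var, which is useful when the dimensions are measured on very different scales.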

Disclosure:

Use the code at your own risk. It is free for (and only for) research and educational use. If you intend to use the code, please send us an e-mail at icadez@ics.uci.edu or smyth@ics.uci.edu. Also, please report any bugs and/or problems that you might have with the code.

Compatibility:

There are three versions of the code (last updated 07/08/99):

Author:

Igor Cadez,
Department of Information and Computer Science,
University of California,
Irvine, CA 92697

Information and Computer Science
University of California, Irvine CA 92717-3425
Last modified: 07/08/99, by Igor Cadez