Gaussian Mixture Modeling Software is a C implementation of the Expectation-Maximization (EM) algorithm for fitting mixtures of Gaussians (GM) to multivariate data. In addition to the basic algorithm, the code automatically performs multiple random starts to reduce the chance of returning parameters that are only locally optimal. A Cross-Validation (CV) option is available for finding the optimal model structure. The data are split into two disjoint subsets: a training subset and a test subset. Each of the models is fitted on the training set and evaluated on the test set via its out-of-sample log-likelihood. The sizes of the training and test sets and the number of CV iterations can be specified.
How to use the code: The settings for the algorithm are given in a file called input.txt, e.g., the number of restarts for EM, the values of k (number of clusters) to fit, the convergence criteria, etc. The data are provided in a simple ASCII file with n rows and p columns, one row per p-dimensional observation. The data values on each row are assumed to be real-valued and may be separated by any number of blank spaces. The algorithm saves its results in a text file, the name of which is specified in input.txt.
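As an illustration of the data file layout described above, the following sketch reads n rows of p whitespace-separated real values into a row-major array. It is not the reader used by the distributed code: the function name read_data is hypothetical, and n and p are assumed to be known in advance and positive. Because the fscanf conversion "%lf" skips any leading whitespace, including newlines, any number of blanks between values is accepted.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads n*p real values from fp into a newly allocated row-major array,
   so that observation i, dimension j is stored at data[i * p + j].
   Returns NULL on allocation failure or if fewer than n*p values can be
   parsed. The caller owns the returned array and must free() it.
   Hypothetical helper, not part of the original software. */
double *read_data(FILE *fp, size_t n, size_t p)
{
    assert(fp != NULL);
    double *data = malloc(n * p * sizeof *data);
    if (data == NULL)
        return NULL;
    for (size_t i = 0; i < n * p; i++) {
        /* "%lf" skips blanks, tabs and newlines before each number. */
        if (fscanf(fp, "%lf", &data[i]) != 1) {
            free(data);
            return NULL;
        }
    }
    return data;
}
```

A caller would open the data file with fopen, pass the stream to read_data together with n and p, and treat a NULL return as a malformed or truncated file.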
There are several limitations of the current code published on the web.
* works in Matlab for Windows95 version (i.e. it is not a limitation in the Matlab version).
Use the code at your own risk. It is free for (and only for) research and educational use. If you intend to use the code, please send us an e-mail at firstname.lastname@example.org or email@example.com. Also, please report any bugs and/or problems that you might have with the code.
There are three versions of the code (last updated 07/08/99):
and Computer Science
University of California, Irvine CA 92717-3425
Last modified: 07/08/99, by Igor Cadez