The NIPS Data Set

The NIPS data set contains papers from the NIPS conferences held between 1987 and 1999. The conference draws contributions from several research communities working broadly on learning algorithms.

The full text of the NIPS papers, in Matlab format, is available online at http://www.cs.toronto.edu/~roweis/data.html

Our collection of NIPS papers contains D = 1,740 papers written by K = 2,037 authors, with a total of 2,301,375 word tokens and a vocabulary of V = 13,649 unique words. We divided the D = 1,740 papers into a training set of 1,557 papers containing 2,057,729 word tokens and a test set of 183 papers, 102 of which are single-authored. We chose the test documents so that each of the 2,037 authors in the collection authored at least one training document.
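The text states the constraint on the split (every author keeps at least one training paper) but not how the 183 test papers were picked. The sketch below is one hypothetical way to enforce that constraint, not the authors' procedure: it assumes each paper is represented as a set of author ids, visits papers in a random order, and moves a paper to the test set only if none of its authors would be left without a training paper. The helper name `split_keep_authors_in_train` and the greedy random rule are assumptions for illustration; the sketch does not control how many test papers are single-authored.

```python
import random
from collections import Counter


def split_keep_authors_in_train(doc_authors, n_test, seed=0):
    """Hypothetical greedy split: choose up to n_test test documents while
    guaranteeing every author still has at least one training document.

    doc_authors: list where doc_authors[d] is the set of author ids of paper d.
    Returns (train_ids, test_ids), both sorted lists of document indices.
    """
    # All documents start in the training set, so count each author's papers.
    train_count = Counter(a for authors in doc_authors for a in authors)

    # Visit documents in a reproducible random order.
    order = list(range(len(doc_authors)))
    random.Random(seed).shuffle(order)

    test = []
    for d in order:
        if len(test) == n_test:
            break
        # Only move d to the test set if each of its authors has
        # another paper that stays in the training set.
        if all(train_count[a] > 1 for a in doc_authors[d]):
            for a in doc_authors[d]:
                train_count[a] -= 1
            test.append(d)

    test_set = set(test)
    train = [d for d in range(len(doc_authors)) if d not in test_set]
    return train, sorted(test)


# Toy usage: six papers by three authors (ids 0, 1, 2).
docs = [{0}, {0, 1}, {1}, {2}, {2, 0}, {1, 2}]
train_ids, test_ids = split_keep_authors_in_train(docs, n_test=3, seed=1)
print("train:", train_ids, "test:", test_ids)
```

Because the rule is greedy, it may return fewer than `n_test` test papers when the author constraint is tight; a paper whose only author has no other paper can never be moved to the test set.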