


The NIPS data set contains papers from the NIPS conferences held between 1987 and 1999. The conference is characterized by contributions from a number of different research communities in the general area of learning algorithms. Full papers from the NIPS conference are available online in Matlab format at http://www.cs.toronto.edu/~roweis/data.html. Our collection of NIPS papers contains D=1,740 papers with K=2,037 authors, a total of 2,301,375 word tokens, and a vocabulary of V=13,649 unique words. We divided the D=1,740 papers into a training set of 1,557 papers with a total of 2,057,729 words and a test set of 183 papers, of which 102 are single-authored. We chose the test documents such that each of the 2,037 authors in the NIPS collection authored at least one of the training documents.
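The text does not say how the test documents were selected, only that every author still appears in the training set. As an illustration, the following minimal sketch shows one way such a split could be produced. It is an assumption, not the procedure actually used: the function name `split_with_author_coverage`, the greedy strategy, and the input format (one list of author ids per document) are all hypothetical.

```python
import random
from collections import Counter


def split_with_author_coverage(doc_authors, n_test, seed=0):
    """Greedy train/test split (hypothetical helper, not the authors' code).

    doc_authors: list where entry d is the list of author ids of document d.
    n_test: desired number of test documents.
    Returns (train_ids, test_ids). Every author keeps at least one training
    document; fewer than n_test test documents are returned if the
    constraint cannot be met for more.
    """
    rng = random.Random(seed)

    # Number of training documents each author currently has.
    train_count = Counter(a for authors in doc_authors for a in set(authors))

    candidates = list(range(len(doc_authors)))
    rng.shuffle(candidates)

    test_ids = []
    for d in candidates:
        if len(test_ids) == n_test:
            break
        authors = set(doc_authors[d])
        # Move d to the test set only if each of its authors would still
        # have at least one other document left in the training set.
        if all(train_count[a] >= 2 for a in authors):
            for a in authors:
                train_count[a] -= 1
            test_ids.append(d)

    held_out = set(test_ids)
    train_ids = [d for d in range(len(doc_authors)) if d not in held_out]
    return train_ids, sorted(test_ids)
```

For the collection described above, this would be called with the 1,740 author lists and `n_test=183`. A greedy pass of this kind is not guaranteed to reach the requested test-set size on every corpus, which is why the sketch returns whatever it was able to hold out.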