The Author-Topic Model

The author-topic model is a generative model for authors and documents that reduces the generation of documents to a simple series of probabilistic steps. Each author is associated with a mixture over topics, and the words of a multi-author paper are assumed to be drawn from a mixture of its authors' topic mixtures. The model has been applied to a collection of about 1,700 NIPS conference papers and 160,000 CiteSeer abstracts.
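
The probabilistic steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code behind this site: the function name, the toy vocabulary, and the parameter values (theta for author-topic mixtures, phi for topic-word distributions) are all made up for the example, and it assumes each word's author is picked uniformly from the paper's authors.

```python
import random

def generate_document(authors, theta, phi, vocab, n_words, seed=0):
    """Sample one document under an author-topic style generative process.

    authors: list of author ids for this document
    theta:   dict mapping author id -> list of topic probabilities
    phi:     list (one entry per topic) of word probabilities over vocab
    """
    rng = random.Random(seed)
    n_topics = len(phi)
    tokens = []
    for _ in range(n_words):
        # 1. Pick one of the document's authors uniformly at random.
        author = rng.choice(authors)
        # 2. Pick a topic from that author's topic mixture.
        topic = rng.choices(range(n_topics), weights=theta[author])[0]
        # 3. Pick a word from that topic's word distribution.
        word = rng.choices(vocab, weights=phi[topic])[0]
        tokens.append((author, topic, word))
    return tokens

# Toy, made-up parameters: two topics over a four-word vocabulary.
vocab = ["neuron", "spike", "kernel", "margin"]
phi = [[0.5, 0.5, 0.0, 0.0],   # topic 0 only emits "neuron"/"spike"
       [0.0, 0.0, 0.5, 0.5]]   # topic 1 only emits "kernel"/"margin"
theta = {"author_a": [0.9, 0.1], "author_b": [0.2, 0.8]}

doc = generate_document(["author_a", "author_b"], theta, phi, vocab, n_words=8)
for author, topic, word in doc:
    print(author, topic, word)
```

Fitting the model reverses this process: given the observed words and author lists, the hidden author and topic assignments are inferred, here by MCMC sampling.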

This webpage provides an online query interface to the model for interactive exploration, for example finding which topics a given author writes about, along with other applications.
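
As a rough illustration of the "which topics does an author write about" query, the sketch below ranks topics by their probability in one author's topic mixture. The function name, topic labels, and probabilities are hypothetical; the actual query interface may work differently.

```python
def top_topics(theta_author, topic_labels, k=3):
    """Return the k most probable topics for one author as (label, probability)."""
    ranked = sorted(zip(theta_author, topic_labels), reverse=True)
    return [(label, prob) for prob, label in ranked[:k]]

# Made-up topic mixture for a single author over four labelled topics.
labels = ["vision", "kernels", "speech", "bayes"]
theta_author = [0.05, 0.60, 0.10, 0.25]

print(top_topics(theta_author, labels, k=2))
# → [('kernels', 0.6), ('bayes', 0.25)]
```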

Most of the data currently presented on this webpage is extracted from a single MCMC sample per data set: one solution with 300 topics from the CiteSeer data set and one solution with 100 topics from the NIPS data set. Both samples are available for queries in the browser.

*      The Data Sets: CiteSeer, NIPS

*      100 Topics – one sample result from the NIPS data set

*      300 Topics – one sample result from the CiteSeer data set

 

Applications of the Author-Topic Model to the CiteSeer Data Set:

*      Topic Trends Over Time

*      Assigning Topics and Authors to New Documents

*      Scoring Papers for Authors (Detecting the Most Surprising Papers for an Author)
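
One plausible way to score a paper for an author, shown here only as a sketch, is the perplexity of the paper's words under that author's topic mixture: the higher the perplexity, the more surprising the paper is for that author. The function name and all parameter values below are made up, and the scoring rule itself is an assumption rather than a description of the code behind this site.

```python
import math

def perplexity_for_author(word_ids, theta_author, phi):
    """Perplexity of a document's words under a single author's topic mixture.

    word_ids:     list of vocabulary indices for the document's words
    theta_author: list of topic probabilities for the author
    phi:          list (one entry per topic) of word probabilities
    """
    log_lik = 0.0
    for w in word_ids:
        # p(word | author) = sum over topics of p(topic | author) * p(word | topic)
        p = sum(t_prob * phi[t][w] for t, t_prob in enumerate(theta_author))
        log_lik += math.log(p)
    return math.exp(-log_lik / len(word_ids))

# Toy, made-up parameters: two topics over a four-word vocabulary.
# Every word has nonzero probability so the logarithm is always defined.
phi = [[0.4, 0.4, 0.1, 0.1],
       [0.1, 0.1, 0.4, 0.4]]
theta_author = [0.9, 0.1]        # this author writes mostly about topic 0

typical_paper = [0, 1, 0, 1]     # words favoured by topic 0
unusual_paper = [2, 3, 2, 3]     # words favoured by topic 1

print(perplexity_for_author(typical_paper, theta_author, phi))
print(perplexity_for_author(unusual_paper, theta_author, phi))
```

With these toy numbers the second paper gets the higher perplexity, so it would be flagged as the more surprising one for this author.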

 

 

References

Finding Scientific Topics
T. Griffiths and M. Steyvers (2004)
Proceedings of the National Academy of Sciences

The Author-Topic Model for Authors and Documents
M. Rosen-Zvi, T. Griffiths, M. Steyvers, P. Smyth (2004)
Proceedings of the 20th Annual Conference on Uncertainty in Artificial Intelligence (UAI-04)

Probabilistic Author-Topic Models for Information Discovery
M. Steyvers, P. Smyth, M. Rosen-Zvi, T. Griffiths (2004)
Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Credits

This is a joint research project by Mark Steyvers, Padhraic Smyth, Michal Rosen-Zvi, and Thomas Griffiths.
Graduate Student: Chaitanya Chemudugunta
Programmers: Amnon Meyers, Momo Alhazzazi
Funding: This material is based upon work supported by the National Science
Foundation under Grant No. IIS-0083489 and by the Knowledge
Discovery and Dissemination (KD-D) Program. NSF KDD project.
Any opinions, findings, and conclusions or recommendations expressed in
this material are those of the author(s) and do not necessarily reflect
the views of the National Science Foundation (NSF).

We would like to thank Steve Lawrence and C. Lee Giles for kindly providing us with the CiteSeer data used.

Last Updated: 2008-09-28. For comments and questions, contact Michal Rosen-Zvi (email: michal at il.ibm.com).