Latent Dirichlet Allocation and Its Variants
Updated: March 2012
This page collects links to papers, software, and data sets on Latent Dirichlet Allocation (LDA) and its variants. It is part of what I learned by participating in the reading group at Lehigh's Web Understanding, Modeling, and Evaluation Lab. I am grateful to the WUME group, in particular Liangjie Hong, Zaihan Yang, Dawei Yin, and Professor Brian Davison, who provided invaluable help.
Papers
 E. M. Airoldi, D. M. Blei, E. P. Xing, and S. E. Fienberg.
A latent mixed-membership model
for relational data.
In ACM SIGKDD Workshop on Link Discovery: Issues, Approaches and
Applications, 2005.
Accessed March 31, 2012 at
http://www.cs.cmu.edu/~epxing/papers/linkkdd0512.pdf.
 Adam Gyenge, Janne Sinkkonen, and András A. Benczúr. (2010).
An efficient block model for clustering sparse graphs.
In Proceedings of the Eighth Workshop on Mining and Learning with Graphs (MLG '10). ACM, New York, NY, USA, 62-69.
DOI=10.1145/1830252.1830261
http://doi.acm.org/10.1145/1830252.1830261
Accessed March 29, 2012 at
http://users.cis.fiu.edu/~lzhen001/activities/KDD_USB_key_2010/workshops/W01%20MLG2010/p62gyenge.pdf.
 Unsupervised Learning by Probabilistic Latent Semantic Analysis. Thomas Hofmann. Machine Learning (42) pp. 177-196, 2001.
 Latent Dirichlet Allocation. D. Blei, A. Ng, and M. Jordan. Journal of Machine Learning Research (3) pp. 993-1022, 2003.

The Doubly Correlated Nonparametric Topic Model and [supplemental].
Dae Il Kim, Erik B. Sudderth.
In Proceedings of the 25th Annual Conference on Neural Information Processing Systems, 2011.

Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices and [supplemental].
Xianxing Zhang, David Dunson, Lawrence Carin.
In Proceedings of the 25th Annual Conference on Neural Information Processing Systems, 2011.

Complexity of Inference in Latent Dirichlet Allocation and
[supplemental].
David Sontag, Dan Roy.
In Proceedings of the 25th Annual Conference on Neural Information Processing Systems, 2011.

Improving Topic Coherence with Regularized Topic Models
David Newman, Edwin V. Bonilla, Wray Buntine.
In Proceedings of the 25th Annual Conference on Neural Information Processing Systems, 2011.
 Hierarchically Supervised Latent Dirichlet Allocation
Adler J. Perotte, Frank Wood, Noemie Elhadad, Nicholas Bartlett.
In Proceedings of the 25th Annual Conference on Neural Information Processing Systems, 2011.
 Janne Sinkkonen, Janne Aukia, and Samuel Kaski.
Inferring vertex properties from topology
in large networks. In Working Notes of the 5th International Workshop
on Mining and Learning with Graphs (MLG'07), Florence, Italy, 2007.
Accessed March 30, 2012 at
http://eprints.pascalnetwork.org/archive/00003559/01/mlg0713sinkkonenfinal.pdf.

Topic-Link LDA: Joint Models of Topic and Author Community. Y. Liu, A. Niculescu-Mizil, W. Gryc. In Proceedings of the International Conference on Machine Learning (ICML), 2009.
 Topic and role discovery in social networks with experiments on Enron and academic email. A. McCallum, X. Wang, and A. Corrada-Emmanuel. In Journal of Artificial Intelligence Research, 30:249-272, 2007.
 Limin Yao, David Mimno, and Andrew McCallum. (2009).
Efficient methods for topic model inference on streaming document collections.
In Proceedings of the 15th ACM SIGKDD international conference on
Knowledge discovery and data mining (KDD '09). ACM, New York, NY,
USA, 937-946.
DOI=10.1145/1557019.1557121 http://doi.acm.org/10.1145/1557019.1557121
Accessed March 29, 2012 at
http://people.cs.umass.edu/~mimno/papers/fasttopicmodel.pdf.
Tutorial and Introduction Materials
 Parameter Estimation for Text Analysis (version 2.4, 2008-08) by Gregor Heinrich of the University of Leipzig, Germany. Accessed 2012-03-07.
 Parameter Estimation for Text Analysis (version 2.9, 2009) by Gregor Heinrich of the University of Leipzig, Germany. Accessed 2012-03-07.

Distributed Gibbs Sampling of Latent Topic Models: The Gritty Details by Yi Wang. Dated August 2008, accessed 2012-03-08.
 Integrating Out Multinomial Parameters in Latent Dirichlet Allocation and Naive Bayes
for Collapsed Gibbs Sampling by Bob Carpenter of LingPipe, Inc. Accessed 2012-03-07.
 The Expectation Maximization Algorithm: A Short Tutorial by Sean Borman, dated 2009, accessed 2012-03-10.
 Notes on Expectation Maximization Algorithm by Liangjie Hong, dated February 2012, accessed 2012-03-10.
 Gibbs Sampling in the generative model of Latent Dirichlet Allocation by Tom Griffiths, dated 2002, accessed 2012-03-07. (The original page at Stanford's Psychology Department is no longer accessible.)
 David Blei's page on Topic Modeling that includes some introductory materials and a list of software by Blei and his group.
Software
 MALLET: MAchine Learning for LanguagE Toolkit at UMass. http://mallet.cs.umass.edu/index.php
 Tom Haines: Assorted Python machine learning stuff. http://code.google.com/p/haines/. The list includes a few LDA implementations.
Datasets