Latent Dirichlet Allocation and Its Variants

Updated: March 2012

This page collects links to papers, software, and data sets on Latent Dirichlet Allocation (LDA) and its variants. It is part of what I learned by participating in the reading group at Lehigh's Web Understanding, Modeling, and Evaluation (WUME) Lab. I am grateful to the WUME group, in particular Liangjie Hong, Zaihan Yang, Dawei Yin, and Professor Brian Davison, who provided invaluable help.

Papers

  1. E. M. Airoldi, D. M. Blei, E. P. Xing, and S. E. Fienberg. A latent mixed-membership model for relational data. In ACM SIGKDD Workshop on Link Discovery: Issues, Approaches and Applications, 2005.
    Accessed March 31, 2012 at http://www.cs.cmu.edu/~epxing/papers/linkkdd05-12.pdf.

  2. Adam Gyenge, Janne Sinkkonen, and András A. Benczúr. (2010). An efficient block model for clustering sparse graphs. In Proceedings of the Eighth Workshop on Mining and Learning with Graphs (MLG '10). ACM, New York, NY, USA, 62-69. DOI=10.1145/1830252.1830261 http://doi.acm.org/10.1145/1830252.1830261
    Accessed March 29, 2012 at http://users.cis.fiu.edu/~lzhen001/activities/KDD_USB_key_2010/workshops/W01%20MLG2010/p62-gyenge.pdf.

  3. Unsupervised Learning by Probabilistic Latent Semantic Analysis. Thomas Hofmann. Machine Learning (42) pp. 177-196, 2001.

  4. Latent Dirichlet Allocation. D. Blei, A. Ng, and M. Jordan. Journal of Machine Learning Research (3) pp. 993-1022, 2003.

  5. The Doubly Correlated Nonparametric Topic Model (with supplemental material). Dae Il Kim, Erik B. Sudderth. In Proceedings of the 25th Annual Conference on Neural Information Processing Systems, 2011.

  6. Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices (with supplemental material). Xianxing Zhang, David Dunson, Lawrence Carin. In Proceedings of the 25th Annual Conference on Neural Information Processing Systems, 2011.

  7. Complexity of Inference in Latent Dirichlet Allocation (with supplemental material). David Sontag, Dan Roy. In Proceedings of the 25th Annual Conference on Neural Information Processing Systems, 2011.

  8. Improving Topic Coherence with Regularized Topic Models. David Newman, Edwin V. Bonilla, Wray Buntine. In Proceedings of the 25th Annual Conference on Neural Information Processing Systems, 2011.

  9. Hierarchically Supervised Latent Dirichlet Allocation. Adler J. Perotte, Frank Wood, Noemie Elhadad, Nicholas Bartlett. In Proceedings of the 25th Annual Conference on Neural Information Processing Systems, 2011.

  10. Janne Sinkkonen, Janne Aukia, and Samuel Kaski. Inferring vertex properties from topology in large networks. In Working Notes of the 5th International Workshop on Mining and Learning with Graphs (MLG'07), Florence, Italy, 2007.
    Accessed March 30, 2012 at http://eprints.pascal-network.org/archive/00003559/01/mlg07-13-sinkkonen-final.pdf.

  11. Topic-Link LDA: Joint Models of Topic and Author Community. Y. Liu, A. Niculescu-Mizil, W. Gryc. In Proceedings of the International Conference on Machine Learning (ICML), 2009.

  12. Topic and role discovery in social networks with experiments on Enron and academic email. A. McCallum, X. Wang, and A. Corrada-Emmanuel. Journal of Artificial Intelligence Research, 30:249-272, 2007.

  13. Limin Yao, David Mimno, and Andrew McCallum. (2009). Efficient methods for topic model inference on streaming document collections. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '09). ACM, New York, NY, USA, 937-946. DOI=10.1145/1557019.1557121 http://doi.acm.org/10.1145/1557019.1557121
    Accessed March 29, 2012 at http://people.cs.umass.edu/~mimno/papers/fast-topic-model.pdf.

Tutorial and Introduction Materials

  1. Parameter Estimation for Text Analysis (version 2.4, 2008-08) by Gregor Heinrich of the University of Leipzig, Germany. Accessed 2012-03-07.

  2. Parameter Estimation for Text Analysis (version 2.9, 2009) by Gregor Heinrich of the University of Leipzig, Germany. Accessed 2012-03-07.

  3. Distributed Gibbs Sampling of Latent Topic Models: The Gritty Details by Yi Wang. Dated August 2008, accessed 2012-03-08.

  4. Integrating Out Multinomial Parameters in Latent Dirichlet Allocation and Naive Bayes for Collapsed Gibbs Sampling by Bob Carpenter of LingPipe, Inc. Accessed 2012-03-07.

  5. The Expectation Maximization Algorithm -- A Short Tutorial by Sean Borman, dated 2009, accessed 2012-03-10.

  6. Notes on Expectation Maximization Algorithm by Liangjie Hong, dated February 2012, accessed 2012-03-10.

  7. Gibbs Sampling in the generative model of Latent Dirichlet Allocation by Tom Griffiths, dated 2002, accessed 2012-03-07. (The original page at Stanford's Psychology Department is no longer accessible.)

  8. David Blei's page on Topic Modeling, which includes introductory materials and a list of software by Blei and his group.

Software

  1. MALLET: MAchine Learning for LanguagE Toolkit at UMass. http://mallet.cs.umass.edu/index.php

  2. Tom Haines: Assorted python machine learning stuff. http://code.google.com/p/haines/. The list includes a few LDA implementations.

Datasets