View the Project on GitHub msaxton/topic-model-best-practices

Best Practices for Topic Modeling

by Micah D. Saxton

About

This project builds, tests, and analyses several topic models based on the Journal of Biblical Literature in order to recommend to the non-expert how a useful topic model should be built. It is submitted to the faculty of the Library and Information Science Program at the University of Denver in partial fulfillment of the requirements for the degree of Master of Library and Information Science.

Contents

  1. Project Overview
  2. Literature Review
  3. Analysis: Studying the properties of topic models with different numbers of topics
  4. Analysis: Studying the properties of topic models with different alpha values
  5. Analysis: Comparing a regular corpus with a noun-only corpus
  6. Discussion
  7. Conclusions
  8. Appendix 1: Documentation of corpus preparation
  9. Appendix 2: Documentation of model training code
  10. References

Acknowledgments

I would like to thank Peter Organisciak, Ph.D., and Krystyna Matusiak, Ph.D., for their willingness to oversee and comment upon this project. I am also grateful to JSTOR Labs, which generously provided the dataset upon which this project is based. Many thanks also to the folks of experimentalhumanities@iliff, who have inspired in me a love of coding and all things digital. Finally, special thanks to Amelia Stubblefield, who has been a constant source of encouragement throughout this project.