- Description:
Non-parametric topic models implemented using efficient Gibbs sampling. The early theory is in the cited ECML-PKDD 2011 paper.
Coded in C with no other dependencies. It uses neither Chinese restaurant processes nor stick breaking, so it is fast: the non-parametric methods are 1-3 times slower than regular LDA with Gibbs sampling, with a marginal increase in memory. Input can be in LdaC format, docword format, or various Matlab-style formats. Implements HDP-LDA à la Teh, Jordan, Beal and Blei (2006), HPYP-LDA, and symmetric-symmetric, symmetric-asymmetric, asymmetric-symmetric, and asymmetric-asymmetric priors à la Wallach, Mimno and McCallum (2009), with Pitman-Yor or Dirichlet processes. Burstiness modelling à la Doyle and Elkan (2009) can be combined with any of the models above for even better performance. Hyper-parameters can be fully fitted or set initially.
Estimation of various vectors (document and topic vectors). Diagnostics, control, restarts, test likelihood via document completion. Coherence calculations on results using PMI and normalised PMI.
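To make the coherence measure concrete, here is a minimal sketch of normalised PMI averaged over the pairs of a topic's top words. It is not hca's own code: the function name `npmi_coherence`, the use of whole-document co-occurrence counts (rather than, say, sliding windows), and the conventions for word pairs that never or always co-occur are all assumptions made for illustration.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Average NPMI over all pairs of top_words.

    Probabilities are estimated as the fraction of documents containing
    a word (or both words of a pair). This is an illustrative choice,
    not necessarily what hca does.
    """
    if len(top_words) < 2:
        raise ValueError("need at least two words to form a pair")

    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            scores.append(-1.0)  # assumed convention: the pair never co-occurs
        elif p12 == 1.0:
            scores.append(1.0)   # assumed convention: avoids dividing by log(1) = 0
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))  # normalise PMI into [-1, 1]
    return sum(scores) / len(scores)


# Toy corpus: "a" and "b" always appear together, "c" never appears with them.
docs = [["a", "b"], ["a", "b"], ["c"], ["c"]]
perfect = npmi_coherence(["a", "b"], docs)
disjoint = npmi_coherence(["a", "c"], docs)
```

On this toy corpus the pair that always co-occurs scores close to 1 and the pair that never co-occurs scores -1, which are the two extremes of the NPMI range. Plain PMI is the same quantity without the division by `-log(p12)`.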
- Changes to previous version:
Added example on using burstiness.
- BibTeX Entry: Download
- Corresponding Paper BibTeX Entry: Download
- Supported Operating Systems: Linux, Mac OS X, Windows under Cygwin
- Data Formats: ASCII
- Tags: Topic Modeling, Nonparametric Bayes
- Archive: download here
- Wray Buntine (on June 24, 2014, 06:21:54)
- Noticed that in this update, hyper-parameter fitting of "beta" when using -B doesn't update the parameter. I'll have a new version out shortly that fixes this, along with a few other improvements.
- Wray Buntine (on June 24, 2014, 06:29:59)
- Get more details about the theory from the [KDD 2014 paper](https://www.researchgate.net/publication/263162682_Experiments_with_Non-parametric_Topic_Models "Experiments with Non-parametric Topic Models"). Will be presenting in New York!
- Wray Buntine (on August 22, 2014, 23:19:31)
- Tip for the speed freaks: diminishing returns after 10-16 cores due to memory thrashing. We keep it to 8 cores. Also, I am carefully studying Aaron Li's brilliant KDD 2014 paper to see about transferring his speedups into hca.