Non-parametric topic models implemented using efficient Gibbs sampling. The early theory comes from the cited ECML-PKDD 2011 paper.
Coded in C with no other dependencies. Input can be in LdaC format, docword format, or various Matlab-style formats. Implements HDP-LDA, HPYP-LDA, and symmetric-symmetric, symmetric-asymmetric, asymmetric-symmetric, and asymmetric-asymmetric priors with Pitman-Yor or Dirichlet processes. Hyper-parameters can be fully fitted or fixed at their initial settings. Includes a special "turbo boost" function for even better performance. Does not use Chinese restaurant processes, so it is quite fast (non-parametric methods run 1.5-3.0 times slower than regular LDA with Gibbs sampling). Estimates various vectors (document and topic vectors). Supports diagnostics, control, restarts, and test likelihood via document completion. Computes coherence on results using PMI and normalised PMI.
- Changes to previous version:
Added example on using burstiness.