mloss.org hca (http://mloss.org): Updates and additions to hca. Language: en. Tue, 26 Apr 2016 15:35:03 -0000
hca 0.63 (http://mloss.org/software/view/527/)
<html><p>Non-parametric topic models implemented using efficient Gibbs sampling on multi-core. Experiments are reported in the KDD 2014 paper (see <a href="http://dl.acm.org/citation.cfm?id=2623330.2623691">ACM DL entry</a>), and the early theory is in the ECML-PKDD 2011 paper cited. See also <a href="http://TopicModels.org">TopicModels.ORG</a>. Project maintained on <a href="https://github.com/wbuntine">GitHub</a>. </p>
<p>Coded in C with no other dependencies. Supports multi-core using modern C++11 atomic operations. Uses no Chinese restaurant processes or stick breaking, so it is fast: the non-parametric methods are 1-3 times slower than regular LDA with Gibbs sampling, with a marginal increase in memory. Input can be in LdaC format, docword format, or various Matlab-style formats. </p>
<p>Implements HDP-LDA à la Teh, Jordan, Beal and Blei (2006), HPYP-LDA, and symmetric-symmetric, symmetric-asymmetric, asymmetric-symmetric, and asymmetric-asymmetric priors à la Wallach, Mimno and McCallum (2009), with Pitman-Yor or Dirichlet processes. Burstiness modelling à la Doyle and Elkan (2009) can be combined with any of the models above for even better performance. A new normalised Gamma model with Indian buffet processes allows per-topic variance tuning and sparsity modelling. Hyper-parameters can be fully fitted or set initially. </p>
<p>Estimation of various vectors (document and topic vectors). Diagnostics, control, restarts, and test likelihood via document completion. Coherence calculations on results using PMI and normalised PMI (NPMI). PMI and NPMI data available on request.
</p></html>
wray buntine, Tue, 26 Apr 2016 15:35:03 -0000
Comments feed: http://mloss.org/software/rss/comments/527
http://mloss.org/software/view/527/
Tags: topic modeling, nonparametric bayes, multi core
<b>Comment by Wray Buntine on 2014-06-24 06:21</b> http://mloss.org/comments/cr/14/527/#c712
<p>Noticed that in this update, hyper-parameter fitting of "beta" when using -B doesn't update the parameter. I'll have a new version out shortly to fix this, along with a few other improvements.</p>
<b>Comment by Wray Buntine on 2014-06-24 06:29</b> http://mloss.org/comments/cr/14/527/#c713
<p>Get more details about the theory from the <a href="https://www.researchgate.net/publication/263162682_Experiments_with_Non-parametric_Topic_Models" title="Experiments with Non-parametric Topic Models">KDD 2014 paper</a>. Will be presenting in New York!</p>
<b>Comment by Wray Buntine on 2014-08-22 23:19</b> http://mloss.org/comments/cr/14/527/#c727
<p>Tip for the speed freaks - diminishing returns after 10-16 cores due to memory thrashing. We keep it to 8 cores.</p>
<p>Also, am carefully studying Aaron Li's brilliant KDD 2014 paper to see about transferring his speedups into hca.</p>
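<p>The software description mentions coherence calculations using PMI and normalised PMI. As a rough illustration of what such a score measures, here is a minimal Python sketch based on the standard NPMI definition, with probabilities estimated from document co-occurrence. It is not hca's own code (hca is written in C). The function names <code>npmi</code> and <code>topic_npmi</code>, the toy corpus, and the handling of the edge cases are assumptions made for this example.</p>

```python
import math
from itertools import combinations


def npmi(p_i, p_j, p_ij):
    """Normalised PMI of two words from their marginal and joint probabilities.

    PMI  = log(p_ij / (p_i * p_j))
    NPMI = PMI / -log(p_ij), which lies in [-1, 1].
    """
    if p_ij == 0.0:
        # Assumed convention: words that never co-occur score -1.
        return -1.0
    if p_ij >= 1.0:
        # Assumed convention: words present in every document carry no signal.
        return 0.0
    pmi = math.log(p_ij / (p_i * p_j))
    return pmi / -math.log(p_ij)


def topic_npmi(top_words, docs):
    """Average NPMI over all pairs of a topic's top words.

    Probabilities are document frequencies: the fraction of documents
    containing a word (or both words of a pair).
    """
    doc_sets = [set(d) for d in docs]
    n = len(doc_sets)

    def prob(*words):
        return sum(all(w in d for w in words) for d in doc_sets) / n

    scores = [
        npmi(prob(a), prob(b), prob(a, b))
        for a, b in combinations(top_words, 2)
    ]
    return sum(scores) / len(scores)


# Toy corpus (hypothetical): "cat" and "dog" always appear together.
docs = [["cat", "dog"], ["cat", "dog"], ["fish"], ["bird"]]
score = topic_npmi(["cat", "dog", "fish"], docs)
```

<p>In this toy corpus the pair (cat, dog) gets the maximum NPMI and both pairs involving "fish" get the minimum, so the topic's average comes out negative. In practice the co-occurrence statistics would normally come from a large reference corpus rather than from the training documents.</p>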