Project details for jLDADMM

jLDADMM 1.0

by dqnguyen - August 19, 2015, 12:52:36 CET

Description:

jLDADMM: A Java package for the LDA and DMM topic models

The Java package jLDADMM provides alternatives for topic modeling on normal-length or short texts. Probabilistic topic models, such as Latent Dirichlet Allocation (LDA) and related models, are widely used to discover latent topics in document collections. However, applying topic models to short texts (e.g. tweets) is more challenging because of data sparsity and the limited context in such texts. One approach is to combine short texts into long pseudo-documents before training LDA. Another is to assume that each document has only one topic.

jLDADMM provides implementations of the LDA topic model and the one-topic-per-document Dirichlet Multinomial Mixture (DMM) model, also known as the mixture of unigrams. Both implementations use Gibbs sampling for inference. In addition, jLDADMM supplies a document clustering evaluation for comparing topic models, using two common metrics: purity and normalized mutual information (NMI).
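
To make the two clustering metrics concrete, the following is a minimal, self-contained sketch of how purity and NMI can be computed from a cluster assignment and gold labels. It is not taken from jLDADMM: the class name `ClusteringEval` and its method names are hypothetical, and the NMI normalization used here (mutual information divided by the arithmetic mean of the two entropies) is one common convention that may differ from the one jLDADMM applies.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the jLDADMM API.
final class ClusteringEval {

    // Builds a contingency table: cluster id -> (gold label -> document count).
    private static Map<Integer, Map<Integer, Integer>> contingency(int[] clusters, int[] labels) {
        if (clusters.length == 0 || clusters.length != labels.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        Map<Integer, Map<Integer, Integer>> table = new HashMap<>();
        for (int i = 0; i < clusters.length; i++) {
            table.computeIfAbsent(clusters[i], k -> new HashMap<>())
                 .merge(labels[i], 1, Integer::sum);
        }
        return table;
    }

    // Purity: fraction of documents that carry the majority label of their cluster.
    static double purity(int[] clusters, int[] labels) {
        int correct = 0;
        for (Map<Integer, Integer> row : contingency(clusters, labels).values()) {
            int best = 0;
            for (int count : row.values()) {
                best = Math.max(best, count);
            }
            correct += best;
        }
        return (double) correct / clusters.length;
    }

    // Shannon entropy (natural log) of a partition given its group sizes.
    private static double entropy(Collection<Integer> sizes, int n) {
        double h = 0.0;
        for (int size : sizes) {
            double p = (double) size / n;
            h -= p * Math.log(p);
        }
        return h;
    }

    // NMI = I(C; L) / ((H(C) + H(L)) / 2). Assumed normalization, see lead-in.
    static double nmi(int[] clusters, int[] labels) {
        int n = clusters.length;
        Map<Integer, Map<Integer, Integer>> table = contingency(clusters, labels);
        Map<Integer, Integer> clusterSizes = new HashMap<>();
        Map<Integer, Integer> labelSizes = new HashMap<>();
        for (Map.Entry<Integer, Map<Integer, Integer>> row : table.entrySet()) {
            for (Map.Entry<Integer, Integer> cell : row.getValue().entrySet()) {
                clusterSizes.merge(row.getKey(), cell.getValue(), Integer::sum);
                labelSizes.merge(cell.getKey(), cell.getValue(), Integer::sum);
            }
        }
        double mi = 0.0;
        for (Map.Entry<Integer, Map<Integer, Integer>> row : table.entrySet()) {
            double clusterSize = clusterSizes.get(row.getKey());
            for (Map.Entry<Integer, Integer> cell : row.getValue().entrySet()) {
                double joint = cell.getValue();
                double labelSize = labelSizes.get(cell.getKey());
                mi += (joint / n) * Math.log(n * joint / (clusterSize * labelSize));
            }
        }
        double denom = (entropy(clusterSizes.values(), n) + entropy(labelSizes.values(), n)) / 2.0;
        // Both partitions trivial (one cluster, one label): treat as perfect agreement.
        return denom == 0.0 ? 1.0 : mi / denom;
    }
}
```

In this sketch, each document would be assigned to its most probable topic to obtain the `clusters` array, and `labels` would hold the gold class of each document. Both scores lie between 0 and 1, with higher values indicating closer agreement between topics and gold classes.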

See http://jldadmm.sourceforge.net/ for more details.

Changes to previous version:

Initial Announcement on mloss.org.

Supported Operating Systems: Cygwin, Linux, Windows
Data Formats: Txt
Tags: LDA, Latent Dirichlet Allocation, Dirichlet Multinomial Mixture Model, DMM, Mixture Of Unigrams, Short Texts, Topic Model, Tweets