mloss.org MALLET
http://mloss.org
Updates and additions to MALLET
Mon, 24 Aug 2009 23:10:14 -0000

MALLET 2.0-rc4
http://mloss.org/software/view/147/
<html><p>MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text. It includes tools for document classification: efficient routines for converting text to "features", algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using commonly used metrics. </p>
<p>In addition to classification, MALLET includes tools for sequence tagging for applications such as named-entity extraction from text. Algorithms include Hidden Markov Models, Maximum Entropy Markov Models, and Conditional Random Fields. These methods are implemented in an extensible system for finite state transducers. </p>
<p>Topic models are useful for analyzing large collections of unlabeled text. The MALLET topic modeling toolkit contains efficient, sampling-based implementations of Latent Dirichlet Allocation, Pachinko Allocation, and Hierarchical LDA. </p>
<p>Many of the algorithms in MALLET depend on numerical optimization. MALLET includes an efficient implementation of Limited Memory BFGS, among many other optimization methods. </p>
<p>In addition to sophisticated Machine Learning applications, MALLET includes routines for transforming text documents into numerical representations that can then be processed efficiently. This process is implemented through a flexible system of "pipes", which handle distinct tasks such as tokenizing strings, removing stopwords, and converting sequences into count vectors. </p></html>
Andrew McCallum
sequence analysis, classification, clustering, topic modeling, information extraction
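
The "pipes" idea described above (tokenizing strings, removing stopwords, converting sequences into count vectors) can be illustrated with a minimal sketch in plain Java. This is not MALLET's API: the class `PipeSketch` and its methods `tokenize`, `removeStopwords`, `countTokens`, and `pipeline` are hypothetical names invented for this example, and the stoplist is an arbitrary assumption. It only shows how small, single-purpose stages can be chained into one text-to-counts transformation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;

// Illustrative only: all names here are hypothetical and are NOT part of MALLET.
final class PipeSketch {

    // Stage 1: split a string into lowercase tokens made of letters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stage 2: build a stage that drops every token found in the given stoplist.
    static Function<List<String>, List<String>> removeStopwords(Set<String> stopwords) {
        return tokens -> {
            List<String> kept = new ArrayList<>();
            for (String t : tokens) {
                if (!stopwords.contains(t)) {
                    kept.add(t);
                }
            }
            return kept;
        };
    }

    // Stage 3: turn a token sequence into a sparse count vector (token -> count).
    static Map<String, Integer> countTokens(List<String> tokens) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    // Chain the three stages so that each stage's output feeds the next.
    static Function<String, Map<String, Integer>> pipeline(Set<String> stopwords) {
        Function<String, List<String>> first = PipeSketch::tokenize;
        return first.andThen(removeStopwords(stopwords)).andThen(PipeSketch::countTokens);
    }

    public static void main(String[] args) {
        Set<String> stopwords = Set.of("the", "and", "of"); // assumed stoplist
        Map<String, Integer> counts =
                pipeline(stopwords).apply("The cat and the hat of the cat");
        System.out.println(counts); // prints {cat=2, hat=1}
    }
}
```

Keeping each stage as a separate function means a stage can be swapped or reordered (for example, a different tokenizer) without touching the others, which is the flexibility the description attributes to pipes.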