mloss.org gensim (http://mloss.org): Updates and additions to gensim
Sun, 09 Dec 2012 13:15:16 -0000
gensim 0.8.6: http://mloss.org/software/view/273/
<html><h1>Gensim - Python Framework for Vector Space Modelling</h1>
<p>Gensim is a Python library for <em>Vector Space Modelling</em> with very large corpora. The target audience is the <em>Natural Language Processing</em> (NLP) community. </p>
<h2>Features</h2>
<ul>
<li><p>All algorithms are <strong>memory-independent</strong> with respect to the corpus size (they can process input larger than RAM). </p> </li>
<li><p><strong>Intuitive interfaces</strong>: </p>
<ul>
<li><p>easy to plug in your own input corpus/datastream (trivial streaming API) </p> </li>
<li><p>easy to extend with other Vector Space algorithms (trivial transformation API) </p> </li>
</ul> </li>
<li><p>Efficient implementations of popular algorithms, such as online <strong>Latent Semantic Analysis</strong>, <strong>Latent Dirichlet Allocation</strong> or <strong>Random Projections</strong> </p> </li>
<li><p><strong>Distributed computing</strong>: can run <em>Latent Semantic Analysis</em> and <em>Latent Dirichlet Allocation</em> on a cluster of computers. 
</p> </li>
<li><p>extensive <a href="http://nlp.fi.muni.cz/projekty/gensim/">documentation and tutorials</a> </p> </li>
</ul>
<h2>Reference example</h2>
<pre><code>&gt;&gt;&gt; from gensim import corpora, models, similarities
&gt;&gt;&gt;
&gt;&gt;&gt; # load a corpus iterator from a Matrix Market file on disk
&gt;&gt;&gt; corpus = corpora.MmCorpus('/path/to/corpus.mm')
&gt;&gt;&gt;
&gt;&gt;&gt; # initialize a transformation (Latent Semantic Indexing with 200 latent dimensions)
&gt;&gt;&gt; lsi = models.LsiModel(corpus, num_topics=200)
&gt;&gt;&gt;
&gt;&gt;&gt; # convert the same corpus to latent space and index it
&gt;&gt;&gt; index = similarities.MatrixSimilarity(lsi[corpus])
&gt;&gt;&gt;
&gt;&gt;&gt; # similarity query: `query` is another document vector, already in LSI space;
&gt;&gt;&gt; # the result holds its similarity to every document in the corpus
&gt;&gt;&gt; sims = index[query]
</code></pre></html>
Author: Radim Rehurek
Comments: http://mloss.org/software/rss/comments/273
Tags: latent semantic analysis, latent dirichlet allocation, svd, random projections, tfidf
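The streaming and transformation APIs listed under Features rest on a simple convention: a corpus is any iterable that yields one document at a time as a sparse list of `(feature_id, weight)` pairs, and a transformation is an object indexed as `model[document]` or `model[corpus]`, just as `lsi[corpus]` is in the reference example. The pure-Python sketch below illustrates that convention without gensim itself. `LineCorpus`, `TfidfSketch` and `TransformedCorpus` are hypothetical names, and the plain tf * log2(N / df) weighting with unit-length normalisation is an assumption chosen for brevity, not gensim's own code.

```python
import math
from typing import Dict, Iterable, Iterator, List, Tuple, Union

# gensim-style sparse document: a list of (feature_id, weight) pairs
SparseVec = List[Tuple[int, float]]


class LineCorpus:
    """Hypothetical streaming corpus: one plain-text document per line.

    The file is reopened on every iteration, so the corpus can be scanned
    repeatedly while only one document is held in memory at a time.
    """

    def __init__(self, path: str, vocab: Dict[str, int]):
        self.path = path
        self.vocab = vocab  # token -> feature id; unknown tokens are skipped

    def __iter__(self) -> Iterator[SparseVec]:
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                counts: Dict[int, int] = {}
                for token in line.lower().split():
                    if token in self.vocab:
                        fid = self.vocab[token]
                        counts[fid] = counts.get(fid, 0) + 1
                yield sorted((fid, float(n)) for fid, n in counts.items())


class TfidfSketch:
    """Hypothetical transformation: tf * log2(N / df), scaled to unit length."""

    def __init__(self, corpus: Iterable[SparseVec]):
        self.num_docs = 0
        self.dfs: Dict[int, int] = {}
        for doc in corpus:  # a single streaming pass collects document frequencies
            self.num_docs += 1
            for fid, _ in doc:
                self.dfs[fid] = self.dfs.get(fid, 0) + 1

    def _transform(self, doc: SparseVec) -> SparseVec:
        weighted = [
            (fid, tf * math.log2(self.num_docs / self.dfs[fid]))
            for fid, tf in doc
            if fid in self.dfs
        ]
        # terms that occur in every document get weight 0 and are dropped
        weighted = [(fid, w) for fid, w in weighted if w > 0.0]
        norm = math.sqrt(sum(w * w for _, w in weighted))
        return [(fid, w / norm) for fid, w in weighted] if norm > 0.0 else []

    def __getitem__(self, item: Union[SparseVec, Iterable[SparseVec]]):
        # A single document (a list of pairs) is converted immediately;
        # anything else is treated as a corpus and wrapped lazily.
        if isinstance(item, list):
            return self._transform(item)
        return TransformedCorpus(self, item)


class TransformedCorpus:
    """Lazy view: each document is transformed only when iterated over."""

    def __init__(self, model: TfidfSketch, corpus: Iterable[SparseVec]):
        self.model = model
        self.corpus = corpus

    def __iter__(self) -> Iterator[SparseVec]:
        for doc in self.corpus:
            yield self.model[doc]
```

Because indexing the model with a corpus returns a lazy wrapper rather than a list, chaining transformations never materialises the whole corpus, which is what keeps this style of pipeline independent of corpus size.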
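In the reference example, `index[query]` returns the similarity of `query` to every indexed document, and `MatrixSimilarity` measures it as cosine similarity. The following is a minimal pure-Python sketch of that computation on sparse vectors. `CosineIndexSketch` and `unit_vector` are hypothetical names; unlike gensim's `MatrixSimilarity`, which keeps the index in memory as a dense matrix, the sketch stores unit-normalised dictionaries.

```python
import math
from typing import Dict, Iterable, List, Tuple

# gensim-style sparse document: a list of (feature_id, weight) pairs
SparseVec = List[Tuple[int, float]]


def unit_vector(vec: SparseVec) -> Dict[int, float]:
    """Return `vec` scaled to unit L2 length, as a {feature_id: weight} dict."""
    norm = math.sqrt(sum(w * w for _, w in vec))
    return {fid: w / norm for fid, w in vec} if norm > 0.0 else {}


class CosineIndexSketch:
    """Hypothetical in-memory index of unit-normalised sparse documents."""

    def __init__(self, corpus: Iterable[SparseVec]):
        self.docs = [unit_vector(doc) for doc in corpus]

    def __getitem__(self, query: SparseVec) -> List[float]:
        # The dot product of two unit vectors equals their cosine similarity.
        q = unit_vector(query)
        return [sum((w * doc.get(fid, 0.0) for fid, w in q.items()), 0.0) for doc in self.docs]


index = CosineIndexSketch([[(0, 1.0), (1, 1.0)], [(2, 3.0)], [(0, 2.0)]])
sims = index[[(0, 1.0)]]  # approximately [0.7071, 0.0, 1.0], one score per document
```

Normalising each document once at indexing time means a query costs only one normalisation plus a sparse dot product per document.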