Project details for gensim

gensim 0.7.0

by Radim - August 28, 2010, 05:58:31 CET



Library for Vector Space Modelling with very large corpora. Target audience is the Natural Language Processing (NLP) community.


  • all algorithms are memory-independent w.r.t. the corpus size
  • low API learning curve, simple interfaces
  • efficient implementations of popular algorithms, such as incremental online Latent Semantic Analysis, Latent Dirichlet Allocation or Random Projections
  • can run Latent Semantic Analysis on a cluster of computers (distributed computing)
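
The memory-independence claim above can be illustrated with a minimal sketch in plain Python. It is not gensim's own API: the class `StreamedCorpus` and the functions `document_frequencies` and `tfidf` are hypothetical names chosen for this example, and the three-document corpus is made up. The idea shown is that a corpus is treated as an iterable that yields one sparse vector at a time, so working memory depends on vocabulary size rather than on the number of documents.

```python
import math
from collections import Counter


class StreamedCorpus:
    """Iterable corpus yielding one sparse bag-of-words vector at a time.

    Hypothetical helper for illustration; not part of gensim's API.
    """

    def __init__(self, lines):
        # `lines` is any re-iterable source of documents, one per item
        # (a list here; a file-backed iterable in practice).
        self.lines = lines
        self.token2id = {}

    def __iter__(self):
        for line in self.lines:
            counts = Counter(line.lower().split())
            # Assign an integer id to each token the first time it is seen.
            for token in counts:
                self.token2id.setdefault(token, len(self.token2id))
            # Sparse vector: sorted list of (token_id, count) pairs.
            yield sorted((self.token2id[t], c) for t, c in counts.items())


def document_frequencies(corpus):
    """One streaming pass: count the documents each token id occurs in."""
    dfs = Counter()
    num_docs = 0
    for vector in corpus:
        num_docs += 1
        dfs.update(token_id for token_id, _ in vector)
    return dfs, num_docs


def tfidf(vector, dfs, num_docs):
    """Weight one sparse vector by tf * log2(N / df)."""
    return [(i, tf * math.log2(num_docs / dfs[i])) for i, tf in vector]


docs = ["human computer interaction", "computer graph survey", "graph minors survey"]
corpus = StreamedCorpus(docs)

# First pass collects statistics; second pass transforms documents one by one.
dfs, num_docs = document_frequencies(corpus)
# Collected into a list only so the result can be inspected; a real
# pipeline would consume the weighted vectors lazily.
weighted = [tfidf(vector, dfs, num_docs) for vector in corpus]
```

Only the token dictionary and the document-frequency counts are held in memory between passes; the documents themselves are read, transformed, and discarded one at a time.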
Changes to previous version:
  • improved Latent Semantic Analysis (incremental SVD) performance: factorizing the English Wikipedia (3.1 million documents) now takes 14 hours even in serial mode (i.e., on a single computer)
  • several minor optimizations and bug fixes
Supported Operating Systems: Platform Independent
Data Formats: Agnostic
Tags: Latent Semantic Analysis, Latent Dirichlet Allocation, SVD, Random Projections, TF-IDF

