- Description:
Gensim - Python Framework for Vector Space Modelling
Gensim is a Python library for Vector Space Modelling with very large corpora. Its target audience is the Natural Language Processing (NLP) community.
Features
All algorithms are memory-independent with respect to corpus size (they can process input larger than RAM).
Intuitive interfaces:
  easy to plug in your own input corpus or data stream (trivial streaming API)
  easy to extend with other Vector Space algorithms (trivial transformation API)
Efficient implementations of popular algorithms, such as online Latent Semantic Analysis, Latent Dirichlet Allocation and Random Projections.
Distributed computing: Latent Semantic Analysis and Latent Dirichlet Allocation can run on a cluster of computers.
Extensive documentation and tutorials.
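To illustrate the streaming idea behind the first two features, here is a minimal sketch in plain Python. It assumes the usual Gensim convention that a corpus is any iterable yielding one sparse vector per document, as a list of (token_id, weight) pairs. The class name `StreamingCorpus` and the `token2id` vocabulary are hypothetical and are not part of Gensim.

```python
from collections import Counter


class StreamingCorpus:
    """Hypothetical corpus: yields one sparse bag-of-words vector per document."""

    def __init__(self, lines, token2id):
        self.lines = lines          # any re-iterable source of text lines
        self.token2id = token2id    # assumed fixed vocabulary: token -> integer id

    def __iter__(self):
        for line in self.lines:
            counts = Counter(
                self.token2id[tok]
                for tok in line.lower().split()
                if tok in self.token2id
            )
            # Only the current document is materialised, never the whole corpus.
            yield sorted(counts.items())


token2id = {"human": 0, "computer": 1, "interface": 2}
docs = ["Human computer interface", "computer computer survey"]
corpus = StreamingCorpus(docs, token2id)
vectors = list(corpus)
```

Because documents are produced one at a time, memory use does not grow with the number of documents. An object of this shape could in principle be passed wherever a corpus is expected, although that has not been verified against any particular Gensim version here.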
Reference example
>>> from gensim import corpora, models, similarities
>>>
>>> # load corpus iterator from a Matrix Market file on disk
>>> corpus = corpora.MmCorpus('/path/to/corpus.mm')
>>>
>>> # initialize a transformation (Latent Semantic Indexing with 200 latent dimensions)
>>> lsi = models.LsiModel(corpus, num_topics=200)
>>>
>>> # convert the same corpus to latent space and index it
>>> index = similarities.MatrixSimilarity(lsi[corpus])
>>>
>>> # perform similarity query of another vector in LSI space against the whole corpus
>>> sims = index[query]
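The final query step can be pictured as a cosine similarity between the query vector and every indexed document. The sketch below is a plain-Python illustration of that idea on sparse (index, value) vectors. It is not Gensim's implementation, and the helper names `cosine` and `rank` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as lists of (index, value)."""
    du, dv = dict(u), dict(v)
    dot = sum(val * dv.get(idx, 0.0) for idx, val in du.items())
    norm_u = math.sqrt(sum(val * val for val in du.values()))
    norm_v = math.sqrt(sum(val * val for val in dv.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def rank(query, documents):
    """Return (document_position, similarity) pairs, most similar first."""
    sims = [(pos, cosine(query, doc)) for pos, doc in enumerate(documents)]
    return sorted(sims, key=lambda item: -item[1])


documents = [
    [(0, 1.0)],              # points along dimension 0
    [(1, 1.0)],              # points along dimension 1
    [(0, 1.0), (1, 1.0)],    # halfway between the two
]
query = [(0, 1.0)]
ranked = rank(query, documents)
```

Here the first document ranks highest because it points in the same direction as the query, and the second ranks lowest because it shares no dimension with it.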
- Changes to previous version:
- numerous fixes to performance and stability
- faster document similarity queries
- document similarity server
- full change set here
- BibTeX Entry: Download
- Corresponding Paper BibTeX Entry: Download
- Supported Operating Systems: Platform Independent
- Data Formats: Agnostic
- Tags: Latent Semantic Analysis, Latent Dirichlet Allocation, SVD, Random Projections, TF-IDF
- Archive: download here