- Description:
Gensim - Python Framework for Vector Space Modelling
Gensim is a Python library for Vector Space Modelling with very large corpora. The target audience is the Natural Language Processing (NLP) community.
Features
- All algorithms are memory-independent with respect to the corpus size (they can process input larger than RAM).
- Intuitive interfaces:
  - Easy to plug in your own input corpus or data stream (trivial streaming API).
  - Easy to extend with other Vector Space algorithms (trivial transformation API).
- Efficient implementations of popular algorithms, such as online Latent Semantic Analysis, Latent Dirichlet Allocation or Random Projections.
- Distributed computing: Latent Semantic Analysis and Latent Dirichlet Allocation can run on a cluster of computers.
- Extensive documentation and tutorials.
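As a rough illustration of the streaming interface mentioned in the feature list: a corpus can be any iterable that yields one sparse vector, a list of (id, weight) pairs, per document. The sketch below uses only the standard library; the class name `StreamedCorpus`, the `vocabulary` mapping and the demo file contents are hypothetical and not part of gensim.

```python
import os
import tempfile


class StreamedCorpus:
    """Iterate over a text file, yielding one sparse bag-of-words vector per line."""

    def __init__(self, path, token2id):
        self.path = path
        self.token2id = token2id  # hypothetical vocabulary: token -> integer id

    def __iter__(self):
        # Only one line is held in memory at a time, so the corpus may exceed RAM.
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                counts = {}
                for token in line.lower().split():
                    token_id = self.token2id.get(token)
                    if token_id is not None:  # tokens outside the vocabulary are skipped
                        counts[token_id] = counts.get(token_id, 0) + 1
                yield sorted(counts.items())


# Tiny demonstration file; a real corpus would already live on disk.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("human computer interaction\ncomputer system computer\n")
    demo_path = tmp.name

vocabulary = {"human": 0, "computer": 1, "interaction": 2, "system": 3}
corpus = StreamedCorpus(demo_path, vocabulary)
vectors = list(corpus)  # materialized here only because the demo is tiny
os.remove(demo_path)
```

An object of this shape should be usable wherever a corpus iterator is expected, because it is read document by document instead of being loaded as a whole.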
Reference example
>>> from gensim import corpora, models, similarities
>>>
>>> # load corpus iterator from a Matrix Market file on disk
>>> corpus = corpora.MmCorpus('/path/to/corpus.mm')
>>>
>>> # initialize a transformation (Latent Semantic Indexing with 200 latent dimensions)
>>> lsi = models.LsiModel(corpus, num_topics=200)
>>>
>>> # convert the same corpus to latent space and index it
>>> index = similarities.MatrixSimilarity(lsi[corpus])
>>>
>>> # perform similarity query of another vector in LSI space against the whole corpus
>>> sims = index[query]
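The final query in the reference example returns one similarity score per indexed document, with vectors compared by cosine similarity. The pure-Python sketch below illustrates that computation on made-up dense vectors. It is not the library's implementation, and the names `cosine_similarities`, `latent_documents` and `latent_query` are assumptions introduced for this example.

```python
import math


def cosine_similarities(query, documents):
    """Return the cosine similarity of `query` against each dense vector in `documents`."""

    def norm(vector):
        return math.sqrt(sum(value * value for value in vector))

    query_norm = norm(query)
    results = []
    for document in documents:
        denominator = query_norm * norm(document)
        dot = sum(q * d for q, d in zip(query, document))
        # Treat similarity with an all-zero vector as 0.0 to avoid dividing by zero.
        results.append(dot / denominator if denominator else 0.0)
    return results


# Hypothetical 2-dimensional latent vectors standing in for lsi[corpus].
latent_documents = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
latent_query = [2.0, 0.0]
sims = cosine_similarities(latent_query, latent_documents)
```

Scores near 1 indicate documents pointing in nearly the same direction as the query in the latent space; scores near 0 indicate unrelated documents.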
- Changes to previous version:
- added the "hashing trick" (by Homer Strong)
- support for adding target classes in SVMlight format (by Corrado Monti)
- fixed problems with the global lemmatizer object when running in parallel on Windows
- parallelized Wikipedia processing and added a script version that lemmatizes the input documents
- added a class method to initialize a Dictionary from an existing corpus (by Marko Burjek)
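For readers unfamiliar with the "hashing trick" named in the change list: the general idea is to derive a feature id directly from a hash of the token, so no vocabulary has to be built or stored. The following is a minimal sketch of that idea only. The function name `hashed_bow`, the `id_range` default and the choice of `zlib.adler32` are assumptions made for illustration and do not describe gensim's own code.

```python
import zlib


def hashed_bow(tokens, id_range=32000):
    """Convert tokens to a sparse bag-of-words vector without storing a vocabulary."""
    counts = {}
    for token in tokens:
        # adler32 is deterministic across runs, unlike Python's built-in hash() for str.
        token_id = zlib.adler32(token.encode("utf-8")) % id_range
        counts[token_id] = counts.get(token_id, 0) + 1
    return sorted(counts.items())


document = "human computer interaction with a computer".split()
vector = hashed_bow(document)
```

The trade-off is that distinct tokens can collide on the same id, with the collision rate growing as `id_range` shrinks; in exchange, memory use stays constant regardless of vocabulary size.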
- Supported Operating Systems: Platform Independent
- Data Formats: Agnostic
- Tags: Latent Semantic Analysis, Latent Dirichlet Allocation, SVD, Random Projections, TF-IDF