- Description:
A library for vector space modelling of very large corpora. The target audience is the Natural Language Processing (NLP) community.
Features:
- memory use of all algorithms is independent of the corpus size
- low API learning curve, simple interfaces
- efficient implementations of popular algorithms, such as incremental online Latent Semantic Analysis, Latent Dirichlet Allocation or Random Projections
- can run Latent Semantic Analysis on a cluster of computers (distributed computing)
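As a rough illustration of what memory independence with respect to corpus size can look like, the sketch below streams documents one at a time and computes TF-IDF weights in two passes. It is not this library's API: `StreamedCorpus`, `document_frequencies`, and `tfidf_stream` are hypothetical names, and the whitespace tokenizer and log2 weighting are assumptions made for the example.

```python
import math
from collections import Counter


class StreamedCorpus:
    """Hypothetical corpus wrapper: yields one tokenized document at a time."""

    def __init__(self, lines):
        # Any re-iterable source of text lines (a list here; could be a file wrapper).
        self.lines = lines

    def __iter__(self):
        for line in self.lines:
            yield line.lower().split()


def document_frequencies(corpus):
    """First pass: count in how many documents each term occurs."""
    df = Counter()
    n_docs = 0
    for tokens in corpus:
        df.update(set(tokens))  # each term counted once per document
        n_docs += 1
    return df, n_docs


def tfidf_stream(corpus, df, n_docs):
    """Second pass: lazily yield one sparse TF-IDF vector per document."""
    for tokens in corpus:
        tf = Counter(tokens)
        yield {term: count * math.log2(n_docs / df[term]) for term, count in tf.items()}


docs = ["human machine interface", "graph of trees", "graph minors survey"]
corpus = StreamedCorpus(docs)
df, n_docs = document_frequencies(corpus)
# list() is used only to inspect this tiny example; a large corpus would be
# consumed one vector at a time.
vectors = list(tfidf_stream(corpus, df, n_docs))
```

Only the document-frequency table is held in memory, so the footprint grows with the vocabulary rather than with the number of documents.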
- Changes to previous version:
- improved Latent Semantic Analysis (incremental SVD) performance: factorizing the English Wikipedia (3.1 million documents) now takes 14 hours, even in serial mode (i.e., on a single computer)
- several minor optimizations and bug fixes
- Supported Operating Systems: Platform Independent
- Data Formats: Agnostic
- Tags: Latent Semantic Analysis, Latent Dirichlet Allocation, SVD, Random Projections, TF-IDF