About:
Python framework for vector space modelling that can process arbitrarily large datasets: input is streamed, and the online algorithms work incrementally in constant memory.
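The streaming idea can be illustrated with a minimal sketch in plain Python. This is not the framework's own code. The function names `stream_bow` and `online_doc_freq` are hypothetical. Documents are converted to sparse bag-of-words vectors lazily, one at a time. A single incremental pass then accumulates statistics, so memory depends on the vocabulary size rather than on the number of documents.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

BowDoc = List[Tuple[int, int]]  # sparse vector: (term_id, count) pairs


def stream_bow(lines: Iterable[str], token2id: Dict[str, int]) -> Iterator[BowDoc]:
    """Hypothetical helper: lazily turn each text line into a bag-of-words vector.

    Only the current document is materialised, so `lines` may be an open file
    or any other stream that does not fit in RAM.
    """
    for line in lines:
        counts = Counter(token2id[tok] for tok in line.lower().split() if tok in token2id)
        yield sorted(counts.items())


def online_doc_freq(corpus: Iterable[BowDoc]) -> Tuple[int, Counter]:
    """Hypothetical helper: one incremental pass counting documents and document frequencies.

    Memory grows with the number of distinct term ids, not with the number of documents.
    """
    num_docs = 0
    dfs: Counter = Counter()
    for doc in corpus:
        num_docs += 1
        dfs.update(term_id for term_id, _ in doc)
    return num_docs, dfs


if __name__ == "__main__":
    vocab = {"human": 0, "computer": 1, "graph": 2}
    lines = iter(["Human computer interaction", "graph of trees", "human graph human"])
    n, dfs = online_doc_freq(stream_bow(lines, vocab))
    print(n, dict(dfs))  # 3 {0: 2, 1: 1, 2: 2}
```

Because `stream_bow` is a generator, the same pipeline works unchanged when `lines` is a file handle instead of a small in-memory list.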
Changes:
- added the "hashing trick" (by Homer Strong)
- support for adding target classes in SVMlight format (by Corrado Monti)
- fixed problems with the global lemmatizer object when running in parallel on Windows
- parallelized Wikipedia processing; added a script version that lemmatizes the input documents
- added a class method to initialize a Dictionary from an existing corpus (by Marko Burjek)
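The hashing trick and the SVMlight target classes above can be illustrated together in a short plain-Python sketch. This is not the library's implementation. The helper names `hashed_bow` and `to_svmlight_line`, the bucket count, and the choice of `zlib.crc32` are assumptions made for illustration.

The hashing trick maps each token straight to a bucket id, so no vocabulary needs to be stored. Colliding tokens simply share a bucket. Each resulting sparse vector is then written as an SVMlight line that starts with its target class.

```python
import zlib
from collections import Counter
from typing import Iterable, List, Tuple


def hashed_bow(tokens: Iterable[str], num_buckets: int = 2 ** 18) -> List[Tuple[int, int]]:
    """Hypothetical helper: hashing trick, token -> bucket id without a stored vocabulary.

    zlib.crc32 is used (an assumption) because it is stable across runs, unlike the
    per-process salted built-in hash() of str. Tokens that collide in one bucket
    have their counts summed.
    """
    counts = Counter(zlib.crc32(tok.encode("utf-8")) % num_buckets for tok in tokens)
    return sorted(counts.items())


def to_svmlight_line(doc: List[Tuple[int, int]], target: int) -> str:
    """Hypothetical helper: format one sparse document as '<target> <feature>:<value> ...'.

    SVMlight feature numbers start at 1 and must be increasing, so the sorted
    0-based ids are shifted by one.
    """
    parts = [str(target)] + [f"{term_id + 1}:{value}" for term_id, value in doc]
    return " ".join(parts)


if __name__ == "__main__":
    doc = hashed_bow("the cat sat on the mat".split(), num_buckets=1024)
    print(to_svmlight_line(doc, target=1))
```

The trade-off is the usual one for feature hashing. Memory stays fixed by `num_buckets` regardless of vocabulary growth. In exchange, bucket ids cannot be mapped back to words unambiguously, and occasional collisions merge unrelated tokens.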