MLPACK is the first comprehensive scalable machine learning library. Developed by the Fundamental Algorithmic and Statistical Tools Laboratory (FASTlab), MLPACK and its core function library, FASTlib, fill a long-standing void. Previously, researchers had to (a) settle for poorly scaling collections of methods implemented for academic purposes, (b) hunt down fast code written by the algorithms' developers, which is often difficult to find and difficult to apply, or (c) reimplement solutions to their specific analysis problems from scratch. MLPACK offers a fourth option: a single library in which researchers can find the methods they need, designed for both speed and usability.
MLPACK currently includes the following efficient algorithms:
- $k$-nearest neighbor classifier.
- Hidden Markov Models.
- Information Maximization algorithm for ICA.
- Kernel density estimation algorithm using series expansion.
- Mixture of Gaussians using maximum likelihood and L2 error.
- Naive Bayes classifier.
- Series expansion library for the Gaussian kernel in $O(p^D)$ and $O(D^p)$ expansions.
- Support Vector Machine classifier and regression.
- Sequential Minimal Optimization algorithm for SVM.
- Changes to previous version:
Initial Announcement on mloss.org.
- BibTeX Entry: Download
- Corresponding Paper BibTeX Entry: Download
- URL: Project Homepage
- Supported Operating Systems: Cygwin, Linux, Macosx
- Data Formats: None
- Tags: Clustering, Kernel Methods, Convex Optimization, Classification, Density Estimation, Large Scale Learning, Kalman Filter, K Nearest Neighbor Classification, Algorithms, Classifiers, Nips2008, Kdtree
- Archive: download here
Other available revisions
Version 1.0.11
- Proper handling of dimension calculation in PCA.
- Load parameter vectors properly for LinearRegression models.
- Linker fixes for AugLagrangian specializations under Visual Studio.
- Add support for observation weights to LinearRegression.
- MahalanobisDistance<> now takes the square root of the distance by default and therefore satisfies the triangle inequality (TakeRoot now defaults to true).
- Better handling of optional Armadillo HDF5 dependency.
- Fixes for numerous intermittent test failures.
- math::RandomSeed() now sets the seed for recent (>= 3.930) Armadillo versions.
- Handle Newton method convergence better for SparseCoding::OptimizeDictionary() and make maximum iterations a parameter.
- Known bug: CosineTree construction may fail in some cases on i386 systems (#376).
Date: December 11, 2014, 18:20:35
Version 1.0.10
- Bugfix for NeighborSearch regression which caused very slow allknn/allkfn. Speeds are now restored to approximately 1.0.8 speeds, with significant improvement for the cover tree (#365).
- Detect dependencies correctly when ARMA_USE_WRAPPER is not defined (i.e. libarmadillo.so does not exist).
- Bugfix for compilation under Visual Studio (#366).
Date: August 29, 2014, 21:26:18
Version 1.0.9
- GMM initialization is now safer and provides a working GMM when constructed with only the dimensionality and number of Gaussians (#314).
- Check for division by 0 in Forward-Backward Algorithm in HMMs (#314).
- Fix MaxVarianceNewCluster (used when re-initializing clusters for k-means) (#314).
- Fixed implementation of Viterbi algorithm in HMM::Predict() (#316).
- Significant speedups for dual-tree algorithms using the cover tree (#243, #329) including a faster implementation of FastMKS.
- Fix for LRSDP optimizer so that it compiles and can be used (#325).
- CF (collaborative filtering) now expects users and items to be zero-indexed, not one-indexed (#324).
- CF::GetRecommendations() API change: now requires the number of recommendations as the first parameter. The number of users in the local neighborhood should be specified with CF::NumUsersForSimilarity().
- Removed incorrect PeriodicHRectBound (#30).
- Refactor LRSDP into LRSDP class and standalone function to be optimized (#318).
- Fix for centering in kernel PCA (#355).
- Added simulated annealing (SA) optimizer, contributed by Zhihao Lou.
- HMMs now support initial state probabilities; these can be set in the constructor, trained, or set manually with HMM::Initial() (#315).
- Added Nyström method for kernel matrix approximation by Marcus Edel.
- Kernel PCA now supports using Nyström method for approximation.
- Ball trees now work with dual-tree algorithms, via the BallBound<> bound structure (#320); fixed by Yash Vadalia.
- The NMF class is now AMF<>, and supports far more types of factorizations, by Sumedh Ghaisas.
- A QUIC-SVD implementation has returned, written by Siddharth Agrawal and based on older code from Mudit Gupta.
- Added perceptron and decision stump by Udit Saxena (these are weak learners for an eventual AdaBoost class).
- Sparse autoencoder added by Siddharth Agrawal.
Date: July 28, 2014, 20:52:10
Version 1.0.8
- Memory leak in NeighborSearch index-mapping code fixed.
- GMMs can be trained using the existing model as a starting point by specifying an additional boolean parameter to GMM::Estimate().
- Logistic regression implementation added in methods/logistic_regression.
- Version information is now obtainable via mlpack::util::GetVersion() or the __MLPACK_VERSION_MAJOR, __MLPACK_VERSION_MINOR, and __MLPACK_VERSION_PATCH macros.
- Fix typos in allkfn and allkrann output.
Date: January 7, 2014, 05:47:22
Version 1.0.7
- Cover tree support for range_search, rank-approximate nearest neighbors, minimum spanning tree calculation, and FastMKS.
- Dual-tree FastMKS implementation and tests added.
- Added collaborative filtering package that can provide recommendations when given users and items.
- Fix for correctness of Kernel PCA.
- Speedups for PCA and Kernel PCA.
- Fix for correctness of Neighborhood Components Analysis (NCA).
- Minor speedups for dual-tree algorithms.
- Fix for Naive Bayes Classifier (nbc).
- Added a ridge regression option to LinearRegression (linear_regression).
- Gaussian Mixture Models (gmm::GMM<>) now support arbitrary covariance matrix constraints.
- MVU removed because it is known not to work.
- Minor updates and fixes for kernels (in mlpack::kernel).
Date: October 4, 2013, 22:24:48
Version 1.0.6
Minor bugfix so that FastMKS gets built.
Date: June 13, 2013, 21:26:10
Version 1.0.5
- Speedups of cover tree traversers.
- Addition of rank-approximate nearest neighbor search (RANN).
- Addition of fast exact max-kernel search (FastMKS).
- Fix for EM covariance estimation.
- More parameters for GMM estimation.
- Force GMM and GaussianDistribution covariance matrices to be positive definite during training.
- Add a tolerance parameter to the Baum-Welch algorithm for HMM training.
- Fix for compilation with clang.
- Fix for k-furthest neighbor search.
Date: May 2, 2013, 07:24:32
Version 1.0.4
- Force minimum Armadillo version of 2.4.2.
- Add locality-sensitive hashing (LSH).
- Handle size_t support correctly with Armadillo 3.6.2.
- Better tests for SGD and NCA.
- Better output of types to streams.
- Some style fixes.
Date: February 8, 2013, 22:32:43
Version 1.0.3
Armadillo 3.4.0 includes sparse matrix support internally; MLPACK's internal sparse matrix support has thus been removed.
Date: September 17, 2012, 01:27:19
Version 1.0.2
Added density estimation trees, nonnegative matrix factorization, an experimental cover tree implementation, and several bugfixes. See http://trac.research.cc.gatech.edu/fastlab/milestone/mlpack%201.0.2 for a full listing of tickets closed.
Date: August 15, 2012, 20:47:13
Version 1.0.1
Added local coordinate coding, sparse coding, kernel PCA, and several bugfixes.
Date: March 20, 2012, 20:59:53
Version 1.0.0
Yet another announcement on mloss.org.
Date: December 17, 2011, 10:37:05
Version 0.2
Initial Announcement on mloss.org.
Date: November 20, 2009, 04:01:36
Version 0.1
Initial Announcement on mloss.org.
Date: October 7, 2008, 07:12:37