
 Description:
MLPACK is the first comprehensive, scalable machine learning library. Developed by the Fundamental Algorithmic and Statistical Tools Laboratory (FASTlab), MLPACK and its core library, FASTlib, fill a long-standing gap. Previously, researchers had to either (a) settle for poorly scaling collections of methods implemented for academic purposes, (b) hunt down fast code written by the algorithms' developers, which is often hard to find and hard to apply, or (c) reimplement solutions to their specific analysis problems from scratch. With MLPACK, we offer a fourth option: one library in which researchers can find the methods they need, designed for both speed and usability.
MLPACK currently includes the following efficient algorithms:
$k$-nearest neighbor classifier.
FastICA.
Hidden Markov Models.
Information Maximization algorithm for ICA.
Kalman filter.
Kernel density estimation algorithm using series expansion.
Mixture of Gaussians using maximum likelihood and L2 error.
Naive Bayes classifier.
Nelder-Mead/Quasi-Newton optimizer.
Series expansion library for the Gaussian kernel, with $O(p^D)$ and $O(D^p)$ expansions.
Support Vector Machine classifier and regression.
Sequential Minimal Optimization algorithm for SVM.
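The series expansion entry in the list above quotes two costs, $O(p^D)$ and $O(D^p)$. Here is a minimal sketch of where such counts can come from. It assumes (this page does not say so) that they refer to two ways of truncating a multivariate series of order $p$ in $D$ dimensions: keeping every multi-index whose components are each below $p$, or keeping only those whose components sum to less than $p$. The function names are hypothetical and are not part of MLPACK.

```cpp
#include <cassert>
#include <cstdint>

// Terms kept when each of the D index components ranges over 0..p-1: p^D.
std::uint64_t gridTerms(unsigned p, unsigned D) {
  std::uint64_t n = 1;
  for (unsigned d = 0; d < D; ++d) n *= p;
  return n;
}

// Terms kept when the D components must sum to less than p:
// C(p - 1 + D, D), which is polynomial in D for fixed p (within O(D^p)).
std::uint64_t totalDegreeTerms(unsigned p, unsigned D) {
  assert(p >= 1);  // assumption: the truncation order is positive
  std::uint64_t n = 1;
  // After step i, n equals C(p - 1 + i, i), so each division is exact.
  for (unsigned i = 1; i <= D; ++i) n = n * (p - 1 + i) / i;
  return n;
}
```

Under these assumptions, gridTerms(4, 10) is 1,048,576 while totalDegreeTerms(4, 10) is 286, which suggests why the second truncation is attractive when D is large and p is small.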
 Changes to previous version:
Initial Announcement on mloss.org.
 Supported Operating Systems: Cygwin, Linux, Mac OS X
 Data Formats: None
 Tags: Clustering, Kernel Methods, Convex Optimization, Classification, Density Estimation, Large Scale Learning, Kalman Filter, K Nearest Neighbor Classification, Algorithms, Classifiers, Nips2008, Kdtree
Other available revisions

Version 2.1.1 (December 22, 2016, 20:01:29)
 Released Dec. 22nd, 2016.
 HMMs now use random initialization; this should fix some convergence issues (#828).
 HMMs now initialize emissions according to the distribution of observations (#833).
 Minor fix for formatted output (#814).
 Fix DecisionStump to properly work with any input type.

Version 2.1.0 (November 1, 2016, 16:01:16)
 Fixed CoverTree to properly handle single-point datasets.
 Fixed a bug in CosineTree (and thus QUIC-SVD) that caused split failures for some datasets (#717).
 Added mlpack_preprocess_describe program, which can be used to print statistics on a given dataset (#742).
 Fix prioritized recursion for k-furthest-neighbor search (mlpack_kfn and the KFN class), leading to orders-of-magnitude speedups in some cases.
 Bump minimum required version of Armadillo to 4.200.0.
 Added simple Gradient Descent optimizer, found in src/mlpack/core/optimizers/gradient_descent/ (#792).
 Added approximate furthest neighbor search algorithms QDAFN and DrusillaSelect in src/mlpack/methods/approx_kfn/, with command-line program mlpack_approx_kfn.

Version 2.0.3 (July 22, 2016, 00:39:12)
 Standardize some parameter names for programs (old names are kept for reverse compatibility, but warnings will now be issued).
 RectangleTree optimizations (#721).
 Fix memory leak in NeighborSearch (#731).
 Documentation fix for k-means tutorial (#730).
 Fix TreeTraits for BallTree (#727).
 Fix incorrect parameter checks for some command-line programs.
 Fix error in HMM training with probabilities for each point (#636).

Version 2.0.2 (June 20, 2016, 22:23:45)
 Added the function LSHSearch::Projections(), which returns an arma::cube with each projection table in a slice (#663). Instead of Projection(i), you should now use Projections().slice(i).
 A new constructor has been added to LSHSearch that creates objects using projection tables provided in an arma::cube (#663).
 LSHSearch projection tables refactored for speed (#675).
 Handle zero-variance dimensions in DET (#515).
 Add MiniBatchSGD optimizer (src/mlpack/core/optimizers/minibatch_sgd/) and allow its use in mlpack_logistic_regression and mlpack_nca programs.
 Add better backtrace support from Grzegorz Krajewski for Log::Fatal messages when compiled with debugging and profiling symbols. This requires libbfd and libdl to be present during compilation.
 CosineTree test fix from Mikhail Lozhnikov (#358).
 Fixed HMM initial state estimation (#600).
 Changed versioning macros _MLPACKVERSION_MAJOR, _MLPACKVERSION_MINOR, and _MLPACKVERSION_PATCH to MLPACK_VERSION_MAJOR, MLPACK_VERSION_MINOR, and MLPACK_VERSION_PATCH. The old names will remain in place until mlpack 3.0.0.
 Renamed mlpack_allknn, mlpack_allkfn, and mlpack_allkrann to mlpack_knn, mlpack_kfn, and mlpack_krann. The mlpack_allknn, mlpack_allkfn, and mlpack_allkrann programs will remain as copies until mlpack 3.0.0.
 Add random_initialization option to mlpack_hmm_train, for use when no labels are provided.
 Add kill_empty_clusters option to mlpack_kmeans and KillEmptyClusters policy for the KMeans class (#595, #596).

Version 2.0.1 (March 3, 2016, 18:52:03)
 Fix CMake to properly detect when MKL is being used with Armadillo.
 Minor parameter handling fixes to mlpack_logistic_regression.
 Properly install arma_config.hpp.
 Memory handling fixes for Hoeffding tree code.
 Add functions that allow changing training-time parameters to HoeffdingTree class.
 Fix infinite loop in sparse coding test.
 Documentation spelling fixes.
 Properly handle covariances for Gaussians with large condition number, preventing GMMs from filling with NaNs during training (and also HMMs that use GMMs).
 CMake fixes for finding LAPACK and BLAS as Armadillo dependencies when ATLAS is used.
 CMake fix for projects using mlpack's CMake configuration from elsewhere.

Version 2.0.0 (January 11, 2016, 17:24:35)
 Removed overclustering support from k-means because it is not well-tested, may be buggy, and is (I think) unused. If this was support you were using, open a bug or get in touch with us; it would not be hard for us to reimplement it.
 Refactored KMeans to allow different types of Lloyd iterations.
 Added implementations of k-means: Elkan's algorithm, Hamerly's algorithm, Pelleg-Moore's algorithm, and the DTNN (dual-tree nearest neighbor) algorithm.
 Significant acceleration of LRSDP via the use of accu(a % b) instead of trace(a * b).
 Added MatrixCompletion class (matrix_completion), which performs nuclear norm minimization to fill unknown values of an input matrix.
 No more dependence on Boost.Random; now we use C++11 STL random support.
 Add softmax regression, contributed by Siddharth Agrawal and QiaoAn Chen.
 Changed NeighborSearch, RangeSearch, FastMKS, LSH, and RASearch API; these classes now take the query sets in the Search() method, instead of in the constructor.
 Use OpenMP, if available. For now OpenMP support is only available in the DET training code.
 Add support for predicting new test point values to LARS and the command-line 'lars' program.
 Add serialization support for Perceptron and LogisticRegression.
 Refactor SoftmaxRegression to predict into an arma::Row object, and add a softmax_regression program.
 Refactor LSH to allow loading and saving of models.
 ToString() is removed entirely (#487).
 Add input_model_file and output_model_file options to appropriate machine learning algorithms.
 Rename all executables to start with an "mlpack_" prefix (#229).
 See also https://mailman.cc.gatech.edu/pipermail/mlpack/2015December/000706.html for more information.

Version 1.0.12 (January 7, 2015, 19:23:51)
 Switch to 3-clause BSD license.

Version 1.0.11 (December 11, 2014, 18:20:35)
 Proper handling of dimension calculation in PCA.
 Load parameter vectors properly for LinearRegression models.
 Linker fixes for AugLagrangian specializations under Visual Studio.
 Add support for observation weights to LinearRegression.
 MahalanobisDistance<> now takes the root of the distance by default and therefore satisfies the triangle inequality (TakeRoot now defaults to true).
 Better handling of optional Armadillo HDF5 dependency.
 Fixes for numerous intermittent test failures.
 math::RandomSeed() now sets the seed for recent (>= 3.930) Armadillo versions.
 Handle Newton method convergence better for SparseCoding::OptimizeDictionary() and make maximum iterations a parameter.
 Known bug: CosineTree construction may fail in some cases on i386 systems (#376).

Version 1.0.10 (August 29, 2014, 21:26:18)
 Bugfix for NeighborSearch regression which caused very slow allknn/allkfn. Speeds are now restored to approximately 1.0.8 speeds, with significant improvement for the cover tree (#365).
 Detect dependencies correctly when ARMA_USE_WRAPPER is not defined (i.e. libarmadillo.so does not exist).
 Bugfix for compilation under Visual Studio (#366).

Version 1.0.9 (July 28, 2014, 20:52:10)
 GMM initialization is now safer and provides a working GMM when constructed with only the dimensionality and number of Gaussians (#314).
 Check for division by 0 in Forward-Backward algorithm in HMMs (#314).
 Fix MaxVarianceNewCluster (used when reinitializing clusters for k-means) (#314).
 Fixed implementation of Viterbi algorithm in HMM::Predict() (#316).
 Significant speedups for dual-tree algorithms using the cover tree (#243, #329) including a faster implementation of FastMKS.
 Fix for LRSDP optimizer so that it compiles and can be used (#325).
 CF (collaborative filtering) now expects users and items to be zero-indexed, not one-indexed (#324).
 CF::GetRecommendations() API change: now requires the number of recommendations as the first parameter. The number of users in the local neighborhood should be specified with CF::NumUsersForSimilarity().
 Removed incorrect PeriodicHRectBound (#30).
 Refactor LRSDP into LRSDP class and standalone function to be optimized (#318).
 Fix for centering in kernel PCA (#355).
 Added simulated annealing (SA) optimizer, contributed by Zhihao Lou.
 HMMs now support initial state probabilities; these can be set in the constructor, trained, or set manually with HMM::Initial() (#315).
 Added Nyström method for kernel matrix approximation by Marcus Edel.
 Kernel PCA now supports using Nyström method for approximation.
 Ball trees now work with dual-tree algorithms, via the BallBound<> bound structure (#320); fixed by Yash Vadalia.
 The NMF class is now AMF<>, and supports far more types of factorizations, by Sumedh Ghaisas.
 A QUIC-SVD implementation has returned, written by Siddharth Agrawal and based on older code from Mudit Gupta.
 Added perceptron and decision stump by Udit Saxena (these are weak learners for an eventual AdaBoost class).
 Sparse autoencoder added by Siddharth Agrawal.

Version 1.0.8 (January 7, 2014, 05:47:22)
 Memory leak in NeighborSearch index-mapping code fixed.
 GMMs can be trained using the existing model as a starting point by specifying an additional boolean parameter to GMM::Estimate().
 Logistic regression implementation added in methods/logistic_regression.
 Version information is now obtainable via mlpack::util::GetVersion() or the _MLPACKVERSION_MAJOR, _MLPACKVERSION_MINOR, and _MLPACKVERSION_PATCH macros.
 Fix typos in allkfn and allkrann output.

Version 1.0.7 (October 4, 2013, 22:24:48)
 Cover tree support for range_search, rank-approximate nearest neighbors, minimum spanning tree calculation, and FastMKS.
 Dual-tree FastMKS implementation and tests added.
 Added collaborative filtering package that can provide recommendations when given users and items.
 Fix for correctness of Kernel PCA.
 Speedups for PCA and Kernel PCA.
 Fix for correctness of Neighborhood Components Analysis (NCA).
 Minor speedups for dual-tree algorithms.
 Fix for Naive Bayes Classifier (nbc).
 Added a ridge regression option to LinearRegression (linear_regression).
 Gaussian Mixture Models (gmm::GMM<>) now support arbitrary covariance matrix constraints.
 MVU removed because it is known to not work.
 Minor updates and fixes for kernels (in mlpack::kernel).

Version 1.0.6 (June 13, 2013, 21:26:10)
 Minor bugfix so that FastMKS gets built.

Version 1.0.5 (May 2, 2013, 07:24:32)
 Speedups of cover tree traversers; addition of rank-approximate nearest neighbor (RANN); addition of fast exact max-kernel search (FastMKS); fix for EM covariance estimation; more parameters for GMM estimation; force GMM and GaussianDistribution covariance matrices to be positive definite during training; add a tolerance parameter to the Baum-Welch algorithm for HMM training; fix for compilation with clang; fix for k-furthest neighbor search.

Version 1.0.4 (February 8, 2013, 22:32:43)
 Force minimum Armadillo version of 2.4.2; add locality-sensitive hashing (LSH); handle size_t support correctly with Armadillo 3.6.2; better tests for SGD and NCA; better output of types to streams; some style fixes.

Version 1.0.3 (September 17, 2012, 01:27:19)
 Armadillo 3.4.0 includes sparse matrix support internally; MLPACK's internal sparse matrix support has thus been removed.

Version 1.0.2 (August 15, 2012, 20:47:13)
 Added density estimation trees, nonnegative matrix factorization, an experimental cover tree implementation, and several bugfixes. See http://trac.research.cc.gatech.edu/fastlab/milestone/mlpack%201.0.2 for a full listing of tickets closed.

Version 1.0.1 (March 20, 2012, 20:59:53)
 Added local coordinate coding, sparse coding, kernel PCA, and several bugfixes.

Version 1.0.0 (December 17, 2011, 10:37:05)
 Yet another announcement on mloss.org.

Version 0.2 (November 20, 2009, 04:01:36)
 Initial Announcement on mloss.org.

Version 0.1 (October 7, 2008, 07:12:37)
 Initial Announcement on mloss.org.

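The 2.0.0 changelog entry credits the LRSDP speedup to using accu(a % b) instead of trace(a * b). The sketch below illustrates why the two can agree and why the second is cheaper, using plain C++ loops in place of Armadillo. It assumes b is symmetric: in general trace(a * b) is the sum of a(i,j) * b(j,i), which equals the element-wise sum only when one factor is symmetric. The names Matrix, traceViaProduct, and sumOfElementwiseProduct are hypothetical and are not part of mlpack or Armadillo.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // square, stored by rows

// Naive trace(a * b): forms the whole n-by-n product first, O(n^3) work.
double traceViaProduct(const Matrix& a, const Matrix& b) {
  const std::size_t n = a.size();
  assert(b.size() == n);
  Matrix c(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t k = 0; k < n; ++k)
      for (std::size_t j = 0; j < n; ++j)
        c[i][j] += a[i][k] * b[k][j];
  double tr = 0.0;
  for (std::size_t i = 0; i < n; ++i) tr += c[i][i];
  return tr;
}

// Sum of element-wise products, the analogue of accu(a % b): O(n^2) work.
// Matches trace(a * b) under the assumption that b is symmetric.
double sumOfElementwiseProduct(const Matrix& a, const Matrix& b) {
  const std::size_t n = a.size();
  assert(b.size() == n);
  double s = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      s += a[i][j] * b[i][j];
  return s;
}
```

With a = {{1, 2}, {3, 4}} and the symmetric b = {{5, 6}, {6, 7}}, both functions return 63, but the second never builds the intermediate product.
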
Comments

 fastlab (on February 14, 2009, 03:55:05)
You need to install gcc 4. Which platform are you running on?

 Paul Rodriguez (on December 21, 2010, 21:38:24)
Hi,
I've set up the ccmake configuration options as appropriate but now I'm having trouble with the make command described below,
thanks, Paul Rodriguez
Using a santos linux, on an intel 64 bit processor, when I execute "make install" I get the following error regarding pthread_atfork:
 A library with BLAS API found.
 A library with BLAS API found.
 A library with LAPACK API found.
 Configuring done
 Generating done
 Build files have been written to: /users/sdsc/prodriguez/mlpack0.2/fastlib/build
 [ 2%] Built target template_types
 [ 5%] Built target template_types_detect
 [ 17%] Built target base
 [ 20%] Built target col
 [ 23%] Built target file
 [ 30%] Built target fx
 [ 33%] Built target la
 [ 35%] Built target data
 [ 35%] Built target tree
 [ 43%] Built target math
 [ 46%] Built target par
 [ 87%] Built target fastlib
 [ 89%] Built target otrav_test
 [ 92%] Built target col_test
 [ 94%] Building CXX object fastlib/data/CMakeFiles/dataset_test.dir/dataset_test.cc.o
 Linking CXX executable dataset_test
 /rmount/usr_apps/compilers/intel/Compiler/11.1/038/lib/intel64/libguide.so: undefined reference to `pthread_atfork'
 collect2: ld returned 1 exit status
 make[2]: [fastlib/data/dataset_test] Error 1
 make[1]: [fastlib/data/CMakeFiles/dataset_test.dir/all] Error 2
 make: * [all] Error 2

 Andreas Mueller (on March 20, 2012, 13:29:07)
Two comments: 1) I have not found a way to contact the project on the project website. Having to come to mloss and log in to contact the developers seems a bit weird.
2) mlpack does not seem to build with Armadillo in a non-standard location. After trying to feed cmake the correct paths for a while I gave up and installed globally. In particular, setting the paths in the CMake configuration doesn't help much. Would be cool if you could fix that.
Cheers, Andy

 Ryan Curtin (on March 20, 2012, 20:22:49)
Hello Andy,
I've clarified www.mlpack.org a bit to note that the Trac site is where bugs can be filed.
As for finding Armadillo, I have not had a problem doing the following (in this instance, I've got Armadillo 2.99.1 built in /home/ryan/src/armadillo2.99.1/)
build$ cmake -D ARMADILLO_INCLUDE_DIR=/home/ryan/src/armadillo2.99.1/build/ -D ARMADILLO_LIBRARY=/home/ryan/src/armadillo2.99.1/libarmadillo.so ../
Did those two variables (ARMADILLO_INCLUDE_DIR and ARMADILLO_LIBRARY) not work for you? If you're still having problems (or have other problems) feel free to file a ticket at
http://trac.research.cc.gatech.edu/fastlab/

I'm having this problem when running flbuildall:
/bin/sh: g++4: not found
make: * [$FASTLIBPATH/bin/i686_Linux_fast_gcc4_DDISABLE_DISK_MATRIX/obj/mlpack_allnn_main.o] Error 127
and a whole lot of similar errors.
Am I missing something?