- Description:
mlpack is a fast, flexible C++ machine learning library. Its aim is to make large-scale machine learning possible for novice users by means of a simple, consistent API, while simultaneously exploiting C++ language features to provide maximum performance and maximum flexibility for expert users. mlpack also provides bindings to other languages.
The following methods are provided:
- Approximate furthest neighbor search techniques
- Collaborative Filtering (with NMF)
- Decision Stumps
- DBSCAN
- Density Estimation Trees
- Euclidean Minimum Spanning Trees
- Fast Exact Max-Kernel Search (FastMKS)
- Gaussian Mixture Models (GMMs)
- Hidden Markov Models (HMMs)
- Hoeffding trees (streaming decision trees)
- Kernel Principal Components Analysis (KPCA)
- K-Means Clustering
- Least-Angle Regression (LARS/LASSO)
- Local Coordinate Coding
- Locality-Sensitive Hashing (LSH)
- Logistic regression
- Naive Bayes Classifier
- Neighborhood Components Analysis (NCA)
- Neural Networks (FFNs, CNNs, RNNs)
- Nonnegative Matrix Factorization (NMF)
- Perceptron
- Principal Components Analysis (PCA)
- QUIC-SVD
- RADICAL (ICA)
- Regularized SVD
- Rank-Approximate Nearest Neighbor (RANN)
- Simple Least-Squares Linear Regression (and Ridge Regression)
- Sparse Autoencoder
- Sparse Coding
- Tree-based Neighbor Search (all-k-nearest-neighbors, all-k-furthest-neighbors), using either kd-trees or cover trees
- Tree-based Range Search
- and more not listed here
Command-line executables are provided for each of these, and the C++ classes which define the methods are highly flexible, extensible, and modular. More information (including documentation, tutorials, and bug reports) is available at http://www.mlpack.org/.
- Changes to previous version:
Released June 8th, 2018.
- Documentation generation fixes for Python bindings (#1421).
- Fix build error for man pages if command-line bindings are not being built (#1424).
- Add shuffle parameter and Shuffle() method to KFoldCV (#1412). This will shuffle the data when the object is constructed, or when Shuffle() is called.
- Added neural network layers: AtrousConvolution (#1390), Embedding (#1401), and LayerNorm (layer normalization) (#1389).
- Add Pendulum environment for reinforcement learning (#1388) and update Mountain Car environment (#1394).
- Supported Operating Systems: Platform Independent
- Data Formats: Plain Ascii, Ascii, Txt, Hdf, Bin, Csv, Xml
- Tags: Gmm, Hmm, Machine Learning, Sparse, Dual Tree, Fast, Scalable, Tree
Other available revisions
3.0.2 Released June 8th, 2018.
- Documentation generation fixes for Python bindings (#1421).
- Fix build error for man pages if command-line bindings are not being built (#1424).
- Add shuffle parameter and Shuffle() method to KFoldCV (#1412). This will shuffle the data when the object is constructed, or when Shuffle() is called.
- Added neural network layers: AtrousConvolution (#1390), Embedding (#1401), and LayerNorm (layer normalization) (#1389).
- Add Pendulum environment for reinforcement learning (#1388) and update Mountain Car environment (#1394).
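The shuffle option added to KFoldCV in 3.0.2 amounts to permuting the point order before the folds are cut. A dependency-free sketch of that idea follows; `MakeFolds` is a hypothetical helper, not mlpack's KFoldCV interface, and it assumes contiguous folds with any remainder spread over the first folds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not mlpack's KFoldCV API): split point indices
// 0..n-1 into k folds, optionally shuffling them first.
std::vector<std::vector<std::size_t>> MakeFolds(std::size_t n, std::size_t k,
                                                bool shuffle, unsigned seed = 0)
{
  assert(k > 0 && k <= n);
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), std::size_t{0});  // 0, 1, ..., n - 1
  if (shuffle)
  {
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
  }

  // Cut the (possibly shuffled) order into contiguous folds; the first
  // n % k folds receive one extra point.
  std::vector<std::vector<std::size_t>> folds(k);
  std::size_t start = 0;
  for (std::size_t f = 0; f < k; ++f)
  {
    const std::size_t size = n / k + (f < n % k ? 1 : 0);
    const auto first = order.begin() + static_cast<std::ptrdiff_t>(start);
    folds[f].assign(first, first + static_cast<std::ptrdiff_t>(size));
    start += size;
  }
  return folds;
}
```

Without shuffling, each fold always holds the same contiguous block of points, which can bias the estimate if the data are sorted (for example by label); shuffling once up front avoids that while still placing every point in exactly one fold.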
(3.0.2 dated June 9, 2018, 18:03:57)
3.0.1 Released May 10th, 2018.
- Fix intermittently failing tests (#1387).
- Add Big-Batch SGD (BBSGD) optimizer in src/mlpack/core/optimizers/bigbatch_sgd (#1131).
- Fix simple compiler warnings (#1380, #1373).
- Simplify NeighborSearch constructor and Train() overloads (#1378).
- Add warning for OpenMP setting differences (#1358/#1382). When mlpack is compiled with OpenMP but another application linking against mlpack is not (or vice versa), a compilation warning will now be issued.
- Restructured loss functions in src/mlpack/methods/ann/ (#1365).
- Add environments for reinforcement learning tests (#1368, #1370, #1329).
- Allow single outputs for multiple timestep inputs for recurrent neural networks (#1348).
- Neural networks: add He and LeCun normal initializations (#1342), add FReLU and SELU activation functions (#1346, #1341), add alpha-dropout (#1349).
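Two of the 3.0.1 additions are simple enough to state directly: He normal initialization draws weights from a zero-mean Gaussian with variance 2 / fan_in, and SELU is a scaled exponential linear unit. The sketch below uses the commonly published SELU constants; the free functions `Selu` and `HeNormal` are illustrative only and are not mlpack's layer or initialization classes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// SELU: lambda * x for x > 0, lambda * alpha * (exp(x) - 1) otherwise.
double Selu(double x)
{
  const double lambda = 1.0507009873554804934;
  const double alpha = 1.6732632423543772848;
  return x > 0.0 ? lambda * x : lambda * alpha * (std::exp(x) - 1.0);
}

// He normal initialization: each weight is drawn from N(0, 2 / fanIn).
std::vector<double> HeNormal(std::size_t count, std::size_t fanIn, unsigned seed)
{
  assert(fanIn > 0);
  std::mt19937 rng(seed);
  // std::normal_distribution takes (mean, standard deviation).
  std::normal_distribution<double> dist(0.0, std::sqrt(2.0 / static_cast<double>(fanIn)));
  std::vector<double> weights(count);
  for (double& w : weights)
    w = dist(rng);
  return weights;
}
```

SELU(1) equals lambda (about 1.0507), and SELU(x) approaches -lambda * alpha (about -1.7581) for very negative x; the empirical variance of many `HeNormal` draws approaches 2 / fanIn.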
(3.0.1 dated May 11, 2018, 05:28:53)
3.0.0 Released March 30th, 2018.
- Speed and memory improvements for DBSCAN. --single_mode can now be used for situations where previously RAM usage was too high.
- Bump minimum required version of Armadillo to 6.500.0.
- Add automatically generated Python bindings. These have the same interface as the command-line programs.
- Add deep learning infrastructure in src/mlpack/methods/ann/.
- Add reinforcement learning infrastructure in src/mlpack/methods/reinforcement_learning/.
- Add optimizers: AdaGrad, CMAES, CNE, FrankWolfe, GradientDescent, GridSearch, IQN, Katyusha, LineSearch, ParallelSGD, SARAH, SCD, SGDR, SMORMS3, SPALeRA, SVRG.
- Add hyperparameter tuning infrastructure and cross-validation infrastructure in src/mlpack/core/cv/ and src/mlpack/core/hpt/.
- Fix bug in mean shift.
- Add random forests (see src/mlpack/methods/random_forest).
- Numerous other bugfixes and testing improvements.
- Add randomized Krylov SVD and Block Krylov SVD.
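Of the optimizers added in 3.0.0, AdaGrad has one of the shortest update rules: each coordinate accumulates its squared gradients, and its step is divided by the square root of that running sum. The sketch below takes a plain gradient callback for illustration; it is not the interface of mlpack's optimizer classes, and `stepSize` and `epsilon` are assumed names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Minimal AdaGrad sketch (not mlpack's optimizer API): each coordinate's step
// is scaled by the inverse square root of its accumulated squared gradients.
std::vector<double> AdaGrad(
    const std::function<std::vector<double>(const std::vector<double>&)>& gradient,
    std::vector<double> x, double stepSize, std::size_t maxIterations,
    double epsilon = 1e-8)
{
  assert(stepSize > 0.0);
  std::vector<double> sqSum(x.size(), 0.0);  // running sum of squared gradients
  for (std::size_t it = 0; it < maxIterations; ++it)
  {
    const std::vector<double> g = gradient(x);
    for (std::size_t i = 0; i < x.size(); ++i)
    {
      sqSum[i] += g[i] * g[i];
      x[i] -= stepSize * g[i] / (std::sqrt(sqSum[i]) + epsilon);
    }
  }
  return x;
}
```

For example, minimizing (x0 - 3)^2 + (x1 + 1)^2 from the origin with stepSize = 1 ends within 1e-6 of (3, -1) after 1000 iterations. Because the accumulated sums only grow, per-coordinate steps shrink over time, and coordinates with consistently large gradients are damped fastest.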
(3.0.0 dated March 31, 2018, 05:31:08)
2.2.5 Released August 25, 2017.
- Compilation fix for some systems (#1082).
- Fix PARAM_INT_OUT() (#1100).
(2.2.5 dated August 26, 2017, 06:07:47)
2.2.4 Released July 18, 2017.
- Speed and memory improvements for DBSCAN. --single_mode can now be used for situations where previously RAM usage was too high.
- Fix bug in CF causing incorrect recommendations.
(2.2.4 dated July 19, 2017, 04:24:19)
2.2.3 Released May 24, 2017.
- Bug fix for --predictions_file in mlpack_decision_tree program.
(2.2.3 dated May 24, 2017, 17:31:17)
2.2.2 Released May 4, 2017.
- Install backwards-compatibility mlpack_allknn and mlpack_allkfn programs; note they are deprecated and will be removed in mlpack 3.0.0 (#992).
- Fix RStarTree bug that surfaced on OS X only (#964).
- Small fixes for MiniBatchSGD and SGD and tests.
(2.2.2 dated May 5, 2017, 03:01:20)
2.2.1 Released Apr. 13th, 2017.
- Compilation fix for mlpack_nca and mlpack_test on older Armadillo versions (#984).
(2.2.1 dated April 13, 2017, 22:25:04)
2.2.0 Released Mar. 21st, 2017.
- Bugfix for mlpack_knn program (#816).
- Add decision tree implementation in methods/decision_tree/. This is very similar to a C4.5 tree learner.
- Add DBSCAN implementation in methods/dbscan/.
- Add support for multidimensional discrete distributions (#810, #830).
- Better output for Log::Debug/Log::Info/Log::Warn/Log::Fatal for Armadillo objects (#895, #928).
- Refactor categorical CSV loading with boost::spirit for faster loading (#681).
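DBSCAN groups points that have at least `minPts` neighbors within radius `eps` (core points), attaches border points reachable from a core point, and labels everything else as noise. The sketch below is a naive O(n^2) version with a brute-force neighborhood query on 2-D points; it is not mlpack's implementation, and `Point2`, `RegionQuery`, and `Dbscan` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point2 { double x, y; };

// Indices of all points within distance eps of points[i], including i itself.
static std::vector<std::size_t> RegionQuery(const std::vector<Point2>& points,
                                            std::size_t i, double eps)
{
  std::vector<std::size_t> result;
  for (std::size_t j = 0; j < points.size(); ++j)
  {
    const double dx = points[i].x - points[j].x;
    const double dy = points[i].y - points[j].y;
    if (dx * dx + dy * dy <= eps * eps)
      result.push_back(j);
  }
  return result;
}

// Naive DBSCAN: returns a cluster id per point, or -1 for noise.
std::vector<int> Dbscan(const std::vector<Point2>& points, double eps, std::size_t minPts)
{
  assert(eps > 0.0 && minPts > 0);
  const int kUnvisited = -2, kNoise = -1;
  std::vector<int> labels(points.size(), kUnvisited);
  int cluster = 0;
  for (std::size_t i = 0; i < points.size(); ++i)
  {
    if (labels[i] != kUnvisited)
      continue;
    std::vector<std::size_t> seeds = RegionQuery(points, i, eps);
    if (seeds.size() < minPts)
    {
      labels[i] = kNoise;  // may later be relabeled as a border point
      continue;
    }
    labels[i] = cluster;
    // Expand the cluster; seeds grows whenever another core point is found.
    for (std::size_t s = 0; s < seeds.size(); ++s)
    {
      const std::size_t q = seeds[s];
      if (labels[q] == kNoise)
        labels[q] = cluster;  // border point: joins the cluster, not expanded
      if (labels[q] != kUnvisited)
        continue;
      labels[q] = cluster;
      const std::vector<std::size_t> qNeighbors = RegionQuery(points, q, eps);
      if (qNeighbors.size() >= minPts)  // q is a core point
        seeds.insert(seeds.end(), qNeighbors.begin(), qNeighbors.end());
    }
    ++cluster;
  }
  return labels;
}
```

With eps = 1 and minPts = 2, two tight groups of three points near (0, 0) and (10, 10) become clusters 0 and 1, and an isolated point at (5, 5) is labeled -1. Swapping `RegionQuery` for a tree-based range search is what makes the algorithm practical on large data; the expansion logic stays the same.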
(2.2.0 dated April 13, 2017, 21:32:52)
2.1.1 Released Dec. 22nd, 2016.
- HMMs now use random initialization; this should fix some convergence issues (#828).
- HMMs now initialize emissions according to the distribution of observations (#833).
- Minor fix for formatted output (#814).
- Fix DecisionStump to properly work with any input type.
(2.1.1 dated December 22, 2016, 20:01:29)
2.1.0
- Fixed CoverTree to properly handle single-point datasets.
- Fixed a bug in CosineTree (and thus QUIC-SVD) that caused split failures for some datasets (#717).
- Added mlpack_preprocess_describe program, which can be used to print statistics on a given dataset (#742).
- Fix prioritized recursion for k-furthest-neighbor search (mlpack_kfn and the KFN class), leading to orders-of-magnitude speedups in some cases.
- Bump minimum required version of Armadillo to 4.200.0.
- Added simple Gradient Descent optimizer, found in src/mlpack/core/optimizers/gradient_descent/ (#792).
- Added approximate furthest neighbor search algorithms QDAFN and DrusillaSelect in src/mlpack/methods/approx_kfn/, with command-line program mlpack_approx_kfn.
(2.1.0 dated November 1, 2016, 16:01:16)
2.0.3
- Standardize some parameter names for programs (old names are kept for backward compatibility, but warnings will now be issued).
- RectangleTree optimizations (#721).
- Fix memory leak in NeighborSearch (#731).
- Documentation fix for k-means tutorial (#730).
- Fix TreeTraits for BallTree (#727).
- Fix incorrect parameter checks for some command-line programs.
- Fix error in HMM training with probabilities for each point (#636).
(2.0.3 dated July 22, 2016, 00:39:12)
2.0.2
- Added the function LSHSearch::Projections(), which returns an arma::cube with each projection table in a slice (#663). Instead of Projection(i), you should now use Projections().slice(i).
- A new constructor has been added to LSHSearch that creates objects using projection tables provided in an arma::cube (#663).
- LSHSearch projection tables refactored for speed (#675).
- Handle zero-variance dimensions in DET (#515).
- Add MiniBatchSGD optimizer (src/mlpack/core/optimizers/minibatch_sgd/) and allow its use in mlpack_logistic_regression and mlpack_nca programs.
- Add better backtrace support from Grzegorz Krajewski for Log::Fatal messages when compiled with debugging and profiling symbols. This requires libbfd and libdl to be present during compilation.
- CosineTree test fix from Mikhail Lozhnikov (#358).
- Fixed HMM initial state estimation (#600).
- Changed versioning macros __MLPACK_VERSION_MAJOR, __MLPACK_VERSION_MINOR, and __MLPACK_VERSION_PATCH to MLPACK_VERSION_MAJOR, MLPACK_VERSION_MINOR, and MLPACK_VERSION_PATCH. The old names will remain in place until mlpack 3.0.0.
- Renamed mlpack_allknn, mlpack_allkfn, and mlpack_allkrann to mlpack_knn, mlpack_kfn, and mlpack_krann. The mlpack_allknn, mlpack_allkfn, and mlpack_allkrann programs will remain as copies until mlpack 3.0.0.
- Add --random_initialization option to mlpack_hmm_train, for use when no labels are provided.
- Add --kill_empty_clusters option to mlpack_kmeans and KillEmptyClusters policy for the KMeans class (#595, #596).
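The `--kill_empty_clusters` option suggests a simple policy: after the assignment step of a Lloyd iteration, a centroid that received no points is dropped rather than re-seeded. The 1-D sketch below illustrates that reading; `LloydStepKillEmpty` is a hypothetical function, and mlpack's KillEmptyClusters policy may differ in its details.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sketch: one Lloyd iteration on 1-D data in which centroids that
// receive no points are removed instead of being re-initialized.
std::vector<double> LloydStepKillEmpty(const std::vector<double>& data,
                                       const std::vector<double>& centroids)
{
  assert(!centroids.empty());
  std::vector<double> sums(centroids.size(), 0.0);
  std::vector<std::size_t> counts(centroids.size(), 0);

  // Assignment step: each point goes to its nearest centroid.
  for (double x : data)
  {
    std::size_t best = 0;
    double bestDist = std::numeric_limits<double>::infinity();
    for (std::size_t c = 0; c < centroids.size(); ++c)
    {
      const double dist = std::abs(x - centroids[c]);
      if (dist < bestDist)
      {
        bestDist = dist;
        best = c;
      }
    }
    sums[best] += x;
    ++counts[best];
  }

  // Update step: recompute means, dropping ("killing") empty clusters.
  std::vector<double> updated;
  for (std::size_t c = 0; c < centroids.size(); ++c)
    if (counts[c] > 0)
      updated.push_back(sums[c] / static_cast<double>(counts[c]));
  return updated;
}
```

Starting from centroids {2, 11, 100} on the points {1, 2, 3, 10, 11, 12}, the third centroid attracts no points, so the step returns only {2, 11}.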
(2.0.2 dated June 20, 2016, 22:23:45)
2.0.1
- Fix CMake to properly detect when MKL is being used with Armadillo.
- Minor parameter handling fixes to mlpack_logistic_regression.
- Properly install arma_config.hpp.
- Memory handling fixes for Hoeffding tree code.
- Add functions that allow changing training-time parameters to HoeffdingTree class.
- Fix infinite loop in sparse coding test.
- Documentation spelling fixes.
- Properly handle covariances for Gaussians with large condition number, preventing GMMs from filling with NaNs during training (and also HMMs that use GMMs).
- CMake fixes for finding LAPACK and BLAS as Armadillo dependencies when ATLAS is used.
- CMake fix for projects using mlpack's CMake configuration from elsewhere.
(2.0.1 dated March 3, 2016, 18:52:03)
2.0.0
- Removed overclustering support from k-means because it is not well-tested, may be buggy, and is (I think) unused. If this was support you were using, open a bug or get in touch with us; it would not be hard for us to reimplement it.
- Refactored KMeans to allow different types of Lloyd iterations.
- Added implementations of k-means: Elkan's algorithm, Hamerly's algorithm, Pelleg-Moore's algorithm, and the DTNN (dual-tree nearest neighbor) algorithm.
- Significant acceleration of LRSDP via the use of accu(a % b) instead of trace(a * b).
- Added MatrixCompletion class (matrix_completion), which performs nuclear norm minimization to fill unknown values of an input matrix.
- No more dependence on Boost.Random; now we use C++11 STL random support.
- Add softmax regression, contributed by Siddharth Agrawal and QiaoAn Chen.
- Changed NeighborSearch, RangeSearch, FastMKS, LSH, and RASearch API; these classes now take the query sets in the Search() method, instead of in the constructor.
- Use OpenMP, if available. For now OpenMP support is only available in the DET training code.
- Add support for predicting new test point values to LARS and the command-line 'lars' program.
- Add serialization support for Perceptron and LogisticRegression.
- Refactor SoftmaxRegression to predict into an arma::Row object, and add a softmax_regression program.
- Refactor LSH to allow loading and saving of models.
- ToString() is removed entirely (#487).
- Add --input_model_file and --output_model_file options to appropriate machine learning algorithms.
- Rename all executables to start with an "mlpack" prefix (#229).
See also https://mailman.cc.gatech.edu/pipermail/mlpack/2015-December/000706.html for more information.
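The LRSDP speedup in 2.0.0 rests on the identity trace(AB) = sum over i, j of A_ij B_ji. When either matrix is symmetric this equals the sum of the elementwise product, which Armadillo's accu(a % b) computes in O(n^2), whereas a naive trace(a * b) forms the full O(n^3) product first. The dependency-free check below uses hypothetical helper names and plain nested vectors in place of Armadillo matrices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // square, row-major

// Naive trace(A * B): build the full product (O(n^3)) and sum its diagonal.
double TraceOfProduct(const Matrix& a, const Matrix& b)
{
  const std::size_t n = a.size();
  assert(b.size() == n);
  Matrix prod(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t k = 0; k < n; ++k)
        prod[i][j] += a[i][k] * b[k][j];
  double trace = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    trace += prod[i][i];
  return trace;
}

// Equivalent of Armadillo's accu(a % b): sum of the elementwise product, O(n^2).
// Equals trace(A * B) whenever A or B is symmetric.
double SumOfElementwiseProduct(const Matrix& a, const Matrix& b)
{
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    for (std::size_t j = 0; j < a[i].size(); ++j)
      sum += a[i][j] * b[i][j];
  return sum;
}
```

With A = [[1, 2], [3, 4]] and the symmetric B = [[5, 6], [6, 7]], both functions return 63; with the nonsymmetric B = [[5, 6], [8, 7]] they return 67 and 69, so the substitution is only safe when symmetry is guaranteed.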
(2.0.0 dated January 11, 2016, 17:24:35)
1.0.12
- Switch to 3-clause BSD license.
(1.0.12 dated January 7, 2015, 19:23:51)
1.0.11
- Proper handling of dimension calculation in PCA.
- Load parameter vectors properly for LinearRegression models.
- Linker fixes for AugLagrangian specializations under Visual Studio.
- Add support for observation weights to LinearRegression.
- MahalanobisDistance<> now takes the square root of the distance by default and therefore satisfies the triangle inequality (TakeRoot now defaults to true).
- Better handling of optional Armadillo HDF5 dependency.
- Fixes for numerous intermittent test failures.
- math::RandomSeed() now sets the seed for recent (>= 3.930) Armadillo versions.
- Handle Newton method convergence better for SparseCoding::OptimizeDictionary() and make maximum iterations a parameter.
- Known bug: CosineTree construction may fail in some cases on i386 systems (#376).
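The 1.0.11 change to MahalanobisDistance<> matters because the squared form (x - y)^T Q (x - y) is not a metric: with Q = I and three collinear points it violates the triangle inequality, while its square root satisfies it. The 2-D sketch below demonstrates this; `Mahalanobis` is a hypothetical free function with Q assumed symmetric positive definite, not mlpack's class.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Mat2 = std::array<std::array<double, 2>, 2>;

// Hypothetical 2-D Mahalanobis distance with a fixed symmetric positive-definite
// matrix q. With takeRoot = false it returns the squared form (x - y)^T q (x - y).
double Mahalanobis(const Vec2& x, const Vec2& y, const Mat2& q, bool takeRoot = true)
{
  assert(q[0][1] == q[1][0]);  // q is assumed symmetric
  const Vec2 d = {x[0] - y[0], x[1] - y[1]};
  double squared = 0.0;
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
      squared += d[i] * q[i][j] * d[j];
  return takeRoot ? std::sqrt(squared) : squared;
}
```

With Q = I and the collinear points (0, 0), (2, 0), (4, 0), the squared form gives 16 > 4 + 4, while the rooted distance gives 4 <= 2 + 2.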
(1.0.11 dated December 11, 2014, 18:20:35)
1.0.10
- Bugfix for NeighborSearch regression which caused very slow allknn/allkfn. Speeds are now restored to approximately 1.0.8 speeds, with significant improvement for the cover tree (#365).
- Detect dependencies correctly when ARMA_USE_WRAPPER is not defined (i.e. libarmadillo.so does not exist).
- Bugfix for compilation under Visual Studio (#366).
(1.0.10 dated August 29, 2014, 21:26:18)
1.0.9
- GMM initialization is now safer and provides a working GMM when constructed with only the dimensionality and number of Gaussians (#314).
- Check for division by 0 in Forward-Backward Algorithm in HMMs (#314).
- Fix MaxVarianceNewCluster (used when re-initializing clusters for k-means) (#314).
- Fixed implementation of Viterbi algorithm in HMM::Predict() (#316).
- Significant speedups for dual-tree algorithms using the cover tree (#243, #329) including a faster implementation of FastMKS.
- Fix for LRSDP optimizer so that it compiles and can be used (#325).
- CF (collaborative filtering) now expects users and items to be zero-indexed, not one-indexed (#324).
- CF::GetRecommendations() API change: now requires the number of recommendations as the first parameter. The number of users in the local neighborhood should be specified with CF::NumUsersForSimilarity().
- Removed incorrect PeriodicHRectBound (#30).
- Refactor LRSDP into LRSDP class and standalone function to be optimized (#318).
- Fix for centering in kernel PCA (#355).
- Added simulated annealing (SA) optimizer, contributed by Zhihao Lou.
- HMMs now support initial state probabilities; these can be set in the constructor, trained, or set manually with HMM::Initial() (#315).
- Added Nyström method for kernel matrix approximation by Marcus Edel.
- Kernel PCA now supports using Nyström method for approximation.
- Ball trees now work with dual-tree algorithms, via the BallBound<> bound structure (#320); fixed by Yash Vadalia.
- The NMF class is now AMF<>, and supports far more types of factorizations, by Sumedh Ghaisas.
- A QUIC-SVD implementation has returned, written by Siddharth Agrawal and based on older code from Mudit Gupta.
- Added perceptron and decision stump by Udit Saxena (these are weak learners for an eventual AdaBoost class).
- Sparse autoencoder added by Siddharth Agrawal.
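The Viterbi fix in 1.0.9 concerns the dynamic program that finds the single most likely hidden-state sequence for an observation sequence. The sketch below is a log-space version for discrete emissions; it is not mlpack's HMM::Predict(), and it uses its own convention that `transition[r][s]` is the probability of moving from state r to state s (mlpack's matrices may be laid out differently).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Log-space Viterbi decoding for a discrete-emission HMM.
// initial[s] = P(s at t = 0), transition[r][s] = P(s | r), emission[s][o] = P(o | s).
std::vector<std::size_t> Viterbi(const std::vector<double>& initial,
                                 const Matrix& transition, const Matrix& emission,
                                 const std::vector<std::size_t>& observations)
{
  const std::size_t numStates = initial.size();
  const std::size_t T = observations.size();
  assert(numStates > 0 && T > 0);
  Matrix logProb(T, std::vector<double>(numStates));
  std::vector<std::vector<std::size_t>> backPtr(T, std::vector<std::size_t>(numStates, 0));

  for (std::size_t s = 0; s < numStates; ++s)
    logProb[0][s] = std::log(initial[s]) + std::log(emission[s][observations[0]]);

  // Best score of any path ending in state s at time t, plus its predecessor.
  for (std::size_t t = 1; t < T; ++t)
    for (std::size_t s = 0; s < numStates; ++s)
    {
      double best = -std::numeric_limits<double>::infinity();
      for (std::size_t r = 0; r < numStates; ++r)
      {
        const double candidate = logProb[t - 1][r] + std::log(transition[r][s]);
        if (candidate > best)
        {
          best = candidate;
          backPtr[t][s] = r;
        }
      }
      logProb[t][s] = best + std::log(emission[s][observations[t]]);
    }

  // Backtrack from the most probable final state.
  std::vector<std::size_t> path(T, 0);
  for (std::size_t s = 1; s < numStates; ++s)
    if (logProb[T - 1][s] > logProb[T - 1][path[T - 1]])
      path[T - 1] = s;
  for (std::size_t t = T - 1; t > 0; --t)
    path[t - 1] = backPtr[t][path[t]];
  return path;
}
```

With sticky transitions (0.9 to stay, 0.1 to switch), emissions that match the state with probability 0.9, a uniform start, and observations 0, 0, 1, 1, 1, the decoded path is 0, 0, 1, 1, 1. Working in log space avoids underflow on long sequences.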
(1.0.9 dated July 28, 2014, 20:52:10)
1.0.8
- Memory leak in NeighborSearch index-mapping code fixed.
- GMMs can be trained using the existing model as a starting point by specifying an additional boolean parameter to GMM::Estimate().
- Logistic regression implementation added in methods/logistic_regression.
- Version information is now obtainable via mlpack::util::GetVersion() or the __MLPACK_VERSION_MAJOR, __MLPACK_VERSION_MINOR, and __MLPACK_VERSION_PATCH macros.
- Fix typos in allkfn and allkrann output.
(1.0.8 dated January 7, 2014, 05:47:22)
1.0.7
- Cover tree support for range_search, rank-approximate nearest neighbors, minimum spanning tree calculation, and FastMKS.
- Dual-tree FastMKS implementation and tests added.
- Added collaborative filtering package that can provide recommendations when given users and items.
- Fix for correctness of Kernel PCA.
- Speedups for PCA and Kernel PCA.
- Fix for correctness of Neighborhood Components Analysis (NCA).
- Minor speedups for dual-tree algorithms.
- Fix for Naive Bayes Classifier (nbc).
- Added a ridge regression option to LinearRegression (linear_regression).
- Gaussian Mixture Models (gmm::GMM<>) now support arbitrary covariance matrix constraints.
- MVU removed because it is known to not work.
- Minor updates and fixes for kernels (in mlpack::kernel).
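The ridge option added to LinearRegression in 1.0.7 adds an L2 penalty lambda * ||w||^2 to the least-squares objective. For a single feature without an intercept, the minimizer has the closed form w = sum(x * y) / (sum(x^2) + lambda), which the sketch below computes; `RidgeSlope` is a hypothetical helper, not mlpack's LinearRegression API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Closed-form ridge regression for one feature and no intercept:
// minimizes sum_i (y_i - w * x_i)^2 + lambda * w^2, giving
// w = sum_i x_i * y_i / (sum_i x_i^2 + lambda). lambda = 0 is ordinary least squares.
double RidgeSlope(const std::vector<double>& x, const std::vector<double>& y, double lambda)
{
  assert(x.size() == y.size() && lambda >= 0.0);
  double sxy = 0.0, sxx = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    sxy += x[i] * y[i];
    sxx += x[i] * x[i];
  }
  return sxy / (sxx + lambda);
}
```

For x = (1, 2, 3) and y = (2, 4, 6), lambda = 0 gives the least-squares slope 2, and lambda = 14 shrinks it to 1.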
(1.0.7 dated October 4, 2013, 22:24:48)
1.0.6 Minor bugfix so that FastMKS gets built.
(1.0.6 dated June 13, 2013, 21:26:10)
1.0.5 Speedups of cover tree traversers; addition of rank-approximate nearest neighbor (RANN); addition of fast exact max-kernel search (FastMKS); fix for EM covariance estimation; more parameters for GMM estimation; force GMM and GaussianDistribution covariance matrices to be positive definite during training; add a tolerance parameter to the Baum-Welch algorithm for HMM training; fix for compilation with clang; fix for k-furthest neighbor search.
(1.0.5 dated May 2, 2013, 07:24:32)
1.0.4 Force minimum Armadillo version of 2.4.2; add locality-sensitive hashing (LSH); handle size_t support correctly with Armadillo 3.6.2; better tests for SGD and NCA; better output of types to streams; some style fixes.
(1.0.4 dated February 8, 2013, 22:32:43)
1.0.3 Armadillo 3.4.0 includes sparse matrix support internally; MLPACK's internal sparse matrix support has thus been removed.
(1.0.3 dated September 17, 2012, 01:27:19)
1.0.2 Added density estimation trees, nonnegative matrix factorization, an experimental cover tree implementation, and several bugfixes. See http://trac.research.cc.gatech.edu/fastlab/milestone/mlpack%201.0.2 for a full listing of tickets closed.
(1.0.2 dated August 15, 2012, 20:47:13)
1.0.1 Added local coordinate coding, sparse coding, kernel PCA, and several bugfixes.
(1.0.1 dated March 20, 2012, 20:59:53)
1.0.0 Yet another announcement on mloss.org.
(1.0.0 dated December 17, 2011, 10:37:05)
0.2 Initial announcement on mloss.org.
(0.2 dated November 20, 2009, 04:01:36)
0.1 Initial announcement on mloss.org.
(0.1 dated October 7, 2008, 07:12:37)
Comments
- Eileen (on February 13, 2009, 12:13:23)
- Having this problem when running fl-build-all: `/bin/sh: g++4: not found` followed by `make: *** [$FASTLIBPATH/bin/i686_Linux_fast_gcc4_-DDISABLE_DISK_MATRIX/obj/mlpack_allnn_main.o] Error 127`, and a whole lot of similar errors. Am I missing something?
- fastlab (on February 14, 2009, 03:55:05)
- You need to install gcc 4. Which platform are you running on?
- Paul Rodriguez (on December 21, 2010, 21:38:24)
- Hi, I've set up the ccmake configuration options as appropriate, but now I'm having trouble with the make command described below. Thanks, Paul Rodriguez
Using a santos linux on an Intel 64-bit processor, when I execute "make install" I get the following error regarding pthread_atfork:
    -- A library with BLAS API found.
    -- A library with BLAS API found.
    -- A library with LAPACK API found.
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /users/sdsc/prodriguez/mlpack-0.2/fastlib/build
    [ 2%] Built target template_types
    [ 5%] Built target template_types_detect
    [ 17%] Built target base
    [ 20%] Built target col
    [ 23%] Built target file
    [ 30%] Built target fx
    [ 33%] Built target la
    [ 35%] Built target data
    [ 35%] Built target tree
    [ 43%] Built target math
    [ 46%] Built target par
    [ 87%] Built target fastlib
    [ 89%] Built target otrav_test
    [ 92%] Built target col_test
    [ 94%] Building CXX object fastlib/data/CMakeFiles/dataset_test.dir/dataset_test.cc.o
    Linking CXX executable dataset_test
    /rmount/usr_apps/compilers/intel/Compiler/11.1/038/lib/intel64/libguide.so: undefined reference to `pthread_atfork'
    collect2: ld returned 1 exit status
    make[2]: *** [fastlib/data/dataset_test] Error 1
    make[1]: *** [fastlib/data/CMakeFiles/dataset_test.dir/all] Error 2
    make: *** [all] Error 2
- Andreas Mueller (on March 20, 2012, 13:29:07)
- Two comments: 1) I have not found a way to contact the project on the project website. Having to come to mloss and log in to contact the developers seems a bit weird. 2) mlpack does not seem to build with Armadillo in a non-standard location. After trying to feed CMake the correct paths for a while, I gave up and installed it globally. In particular, setting the paths in the CMake configuration doesn't help much. Would be cool if you could fix that. Cheers, Andy
- Ryan Curtin (on March 20, 2012, 20:22:49)
- Hello Andy, I've clarified www.mlpack.org a bit to note that the Trac site is where bugs can be filed. As for finding Armadillo, I have not had a problem doing the following (in this instance, I've got Armadillo 2.99.1 built in /home/ryan/src/armadillo-2.99.1/) `build$ cmake -D ARMADILLO_INCLUDE_DIR=/home/ryan/src/armadillo-2.99.1/build/ -D ARMADILLO_LIBRARY=/home/ryan/src/armadillo-2.99.1/libarmadillo.so ../` Did those two variables (ARMADILLO_INCLUDE_DIR and ARMADILLO_LIBRARY) not work for you? If you're still having problems (or have other problems) feel free to file a ticket at http://trac.research.cc.gatech.edu/fastlab/