Projects supporting the svmlight data format.

JMLR dlib ml 18.18

by davis685 - October 29, 2015, 01:48:44 CET - 120531 views, 20042 downloads, 4 subscriptions

About: This project is a C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real world problems.


This release has focused on build system improvements, both for the Python API and C++ builds using CMake. This includes adding a script for installing the dlib Python API as well as a make install target for installing a C++ shared library for non-Python use.

Harry 0.4.0

by konrad - March 30, 2015, 14:03:12 CET - 5707 views, 1240 downloads, 2 subscriptions

About: A Tool for Measuring String Similarity


The new release supports measuring string similarity at the granularity of bytes, bits and tokens. A Python interface has been added. Several minor bugs have been fixed.
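
As a rough illustration of what "granularity" means here, the sketch below splits two strings into bytes, bits or tokens and then runs the same edit distance over the resulting symbols. It is not Harry's code or interface: the function names `symbols`, `levenshtein` and `distance`, and the choice of Levenshtein distance, are assumptions made for the example.

```python
def symbols(text, granularity, delimiters=" "):
    """Split a string into the symbols to compare (hypothetical helper)."""
    data = text.encode("utf-8")
    if granularity == "bytes":
        return list(data)
    if granularity == "bits":
        # Most significant bit first, eight symbols per byte.
        return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    if granularity == "tokens":
        tokens, current = [], []
        for ch in text:
            if ch in delimiters:
                if current:
                    tokens.append("".join(current))
                    current = []
            else:
                current.append(ch)
        if current:
            tokens.append("".join(current))
        return tokens
    raise ValueError("unknown granularity: " + granularity)


def levenshtein(a, b):
    """Edit distance between two symbol sequences (row-by-row dynamic program)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(previous[j] + 1,              # deletion
                               current[j - 1] + 1,           # insertion
                               previous[j - 1] + (x != y)))  # substitution
        previous = current
    return previous[-1]


def distance(s, t, granularity):
    """Compare two strings at the chosen granularity."""
    return levenshtein(symbols(s, granularity), symbols(t, granularity))


if __name__ == "__main__":
    for g in ("bytes", "bits", "tokens"):
        print(g, distance("the quick fox", "the slow fox", g))
```

The same pair of strings can be close at one granularity and far apart at another, which is why the choice is exposed as an option.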

JMLR Sally 1.0.0

by konrad - March 26, 2015, 17:01:35 CET - 32099 views, 6233 downloads, 3 subscriptions

About: A Tool for Embedding Strings in Vector Spaces


Support for explicit selection of granularity added. Several minor bug fixes. We have reached 1.0.
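
To make the idea of embedding strings in a vector space concrete, here is a minimal sketch that counts n-grams at byte or token granularity, hashes each n-gram to a feature index, and prints the result as one line in the svmlight format (`label index:value ...`, indices starting at 1 and increasing). It is an assumption-laden illustration, not Sally's implementation: `embed`, `to_svmlight`, the CRC32 hash and the 16-bit feature space are all choices made for this example.

```python
import zlib
from collections import Counter


def embed(text, n=2, granularity="bytes", hash_bits=16):
    """Map a string to a sparse vector of hashed n-gram counts (hypothetical helper)."""
    if granularity == "bytes":
        data = text.encode("utf-8")
        grams = [data[i:i + n] for i in range(len(data) - n + 1)]
    elif granularity == "tokens":
        tokens = text.split()
        grams = [" ".join(tokens[i:i + n]).encode("utf-8")
                 for i in range(len(tokens) - n + 1)]
    else:
        raise ValueError("unknown granularity: " + granularity)

    vector = Counter()
    for gram in grams:
        # svmlight feature indices start at 1; colliding n-grams share a count.
        index = zlib.crc32(gram) % (1 << hash_bits) + 1
        vector[index] += 1
    return dict(vector)


def to_svmlight(label, vector):
    """Format one example as an svmlight line with increasing feature indices."""
    features = " ".join(f"{i}:{v:g}" for i, v in sorted(vector.items()))
    return f"{label} {features}".rstrip()


if __name__ == "__main__":
    print(to_svmlight("+1", embed("abab", n=2)))
    print(to_svmlight("-1", embed("to be or not to be", n=2, granularity="tokens")))
```

Hashing keeps the dimensionality fixed regardless of vocabulary size, at the cost of occasional collisions between unrelated n-grams.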

JMLR SHOGUN 4.0.0

by sonne - February 5, 2015, 09:09:37 CET - 101407 views, 14421 downloads, 6 subscriptions

Rating: 3 out of 5 stars (based on 6 votes)

About: The SHOGUN machine learning toolbox focuses on large-scale learning methods, in particular Support Vector Machines (SVM), and provides interfaces to Python, Octave, MATLAB, R and the command line.


This release features the work of our 8 GSoC 2014 students [student; mentors]:

  • OpenCV Integration and Computer Vision Applications [Abhijeet Kislay; Kevin Hughes]
  • Large-Scale Multi-Label Classification [Abinash Panda; Thoralf Klein]
  • Large-scale structured prediction with approximate inference [Jiaolong Xu; Shell Hu]
  • Essential Deep Learning Modules [Khaled Nasr; Sergey Lisitsyn, Theofanis Karaletsos]
  • Fundamental Machine Learning: decision trees, kernel density estimation [Parijat Mazumdar; Fernando Iglesias]
  • Shogun Missionary & Shogun in Education [Saurabh Mahindre; Heiko Strathmann]
  • Testing and Measuring Variable Interactions With Kernels [Soumyajit De; Dino Sejdinovic, Heiko Strathmann]
  • Variational Learning for Gaussian Processes [Wu Lin; Heiko Strathmann, Emtiyaz Khan]

It also contains several cleanups and bugfixes:


Features

  • New Shogun project description [Heiko Strathmann]
  • ID3 algorithm for decision tree learning [Parijat Mazumdar]
  • New modes for PCA matrix factorizations: SVD & EVD, in-place or reallocating [Parijat Mazumdar]
  • Add Neural Networks with linear, logistic and softmax neurons [Khaled Nasr]
  • Add kernel multiclass strategy examples in multiclass notebook [Saurabh Mahindre]
  • Add decision trees notebook containing examples for ID3 algorithm [Parijat Mazumdar]
  • Add sudoku recognizer ipython notebook [Alejandro Hernandez]
  • Add in-place subsets on features, labels, and custom kernels [Heiko Strathmann]
  • Add Principal Component Analysis notebook [Abhijeet Kislay]
  • Add Multiple Kernel Learning notebook [Saurabh Mahindre]
  • Add Multi-Label classes to enable Multi-Label classification [Thoralf Klein]
  • Add rectified linear neurons, dropout and max-norm regularization to neural networks [Khaled Nasr]
  • Add C4.5 algorithm for multiclass classification using decision trees [Parijat Mazumdar]
  • Add support for arbitrary acyclic graph-structured neural networks [Khaled Nasr]
  • Add CART algorithm for classification and regression using decision trees [Parijat Mazumdar]
  • Add CHAID algorithm for multiclass classification and regression using decision trees [Parijat Mazumdar]
  • Add Convolutional Neural Networks [Khaled Nasr]
  • Add Random Forests algorithm for ensemble learning using CART [Parijat Mazumdar]
  • Add Restricted Boltzmann Machines [Khaled Nasr]
  • Add Stochastic Gradient Boosting algorithm for ensemble learning [Parijat Mazumdar]
  • Add Deep contractive and denoising autoencoders [Khaled Nasr]
  • Add Deep belief networks [Khaled Nasr]


Bugfixes

  • Fix reference counting bugs in CList when reference counting is on [Heiko Strathmann, Thoralf Klein, lambday]
  • Fix memory problem in PCA::apply_to_feature_matrix [Parijat Mazumdar]
  • Fix crash in LeastAngleRegression for the case D greater than N [Parijat Mazumdar]
  • Fix memory violations in bundle method solvers [Thoralf Klein]
  • Fix fail in library_mldatahdf5.cpp example when is not working properly [Parijat Mazumdar]
  • Fix memory leaks in Vowpal Wabbit, LibSVMFile and KernelPCA [Thoralf Klein]
  • Fix memory and control flow issues discovered by Coverity [Thoralf Klein]
  • Fix R modular interface SWIG typemap (Requires SWIG >= 2.0.5) [Matt Huska]

Cleanup and API Changes

  • PCA now depends on Eigen3 instead of LAPACK [Parijat Mazumdar]
  • Removing redundant and fixing implicit imports [Thoralf Klein]
  • Hide many methods from SWIG, reducing compile memory by 500MiB [Heiko Strathmann, Fernando Iglesias, Thoralf Klein]

Boosted Decision Trees and Lists 1.0.4

by melamed - July 25, 2014, 23:08:32 CET - 4962 views, 1508 downloads, 3 subscriptions

About: Boosting algorithms for classification and regression, with many variations. Features include: Scalable and robust; Easily customizable loss functions; One-shot training for an entire regularization path; Continuous checkpointing; much more

  • added ElasticNets as a regularization option
  • fixed some segfaults, memory leaks, and out-of-range errors that crept in for some corner cases
  • added a couple of I/O optimizations

JMLR MultiBoost 1.2.02

by busarobi - March 31, 2014, 16:13:04 CET - 32448 views, 5488 downloads, 1 subscription

About: MultiBoost is a multi-purpose boosting package implemented in C++. It is based on the multi-class/multi-task AdaBoost.MH algorithm [Schapire-Singer, 1999]. Basic base learners (stumps, trees, products, Haar filters for image processing) can be easily complemented by new data representations and the corresponding base learners, without interfering with the main boosting engine.


Major changes:

  • The “early stopping” feature can now be based on any metric output with the --outputinfo command line argument.

  • Early stopping now works with --slowresume command line argument.

Minor fixes:

  • More informative output when testing.

  • Fixed various compilation glitches with recent clang (OS X/Linux).

LIBOL 0.3.0

by stevenhoi - December 12, 2013, 15:26:14 CET - 12861 views, 4594 downloads, 2 subscriptions

About: LIBOL is an open-source library with a family of state-of-the-art online learning algorithms for machine learning and big data analytics research. The current version supports 16 online algorithms for binary classification and 13 online algorithms for multiclass classification.


In contrast to our last version (V0.2.3), the new version (V0.3.0) has made some important changes as follows:

• Add a template and guide for adding new algorithms;

• Improve parameter settings and make documentation clear;

• Improve documentation on data formats and key functions;

• Amend the "OGD" function to use different loss types;

• Fix some naming inconsistencies and other minor bugs.
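
The change to "OGD" (online gradient descent) with different loss types can be pictured with the sketch below. It is a generic illustration under stated assumptions, not LIBOL's code or interface: the function names `loss_gradient` and `ogd`, the three loss names, the sparse dictionary representation and the `eta / sqrt(t)` step size are all choices made for this example.

```python
import math


def loss_gradient(loss, y, score):
    """Derivative of the loss with respect to the score, for a label y in {-1, +1}."""
    if loss == "hinge":        # max(0, 1 - y * score)
        return -y if y * score < 1.0 else 0.0
    if loss == "logistic":     # log(1 + exp(-y * score))
        return -y / (1.0 + math.exp(min(y * score, 50.0)))  # clamp avoids overflow
    if loss == "squared":      # 0.5 * (score - y) ** 2
        return score - y
    raise ValueError("unknown loss: " + loss)


def ogd(stream, loss="hinge", eta=1.0):
    """One pass of online gradient descent over (label, {index: value}) pairs."""
    w = {}
    mistakes = 0
    for t, (y, x) in enumerate(stream, 1):
        score = sum(w.get(i, 0.0) * v for i, v in x.items())
        if (1 if score >= 0.0 else -1) != y:
            mistakes += 1                  # online error is counted before updating
        g = loss_gradient(loss, y, score)
        if g != 0.0:
            step = eta / math.sqrt(t)      # decaying step size (an assumption)
            for i, v in x.items():
                w[i] = w.get(i, 0.0) - step * g * v
    return w, mistakes


if __name__ == "__main__":
    stream = [(+1, {1: 1.0}), (-1, {2: 1.0})] * 20
    for name in ("hinge", "logistic", "squared"):
        print(name, ogd(stream, loss=name))
```

Only `loss_gradient` changes between loss types; the update loop itself is shared, which is what makes the loss a natural parameter of the function.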

KMLib sparse GPU SVM 0.1

by ksopyla - March 20, 2013, 14:30:08 CET - 2951 views, 737 downloads, 1 subscription

About: Support Vector Machine library in .NET with CUDA support. The library includes a GPU SVM solver for the linear, RBF, Chi-Square and Exp Chi-Square kernels, which uses NVIDIA CUDA technology. It allows classification of feature-rich sparse datasets by using the sparse matrix formats CSR, Ellpack-R or Sliced EllR-T.


Initial Announcement on

pGBRT, Parallel Gradient Boosted Regression Trees 0.9

by swtyree - September 16, 2011, 22:15:46 CET - 8372 views, 1375 downloads, 1 subscription

About: Learns gradient boosted regression tree ensembles in parallel on shared memory or cluster systems


Initial Announcement on

mldata-utils 0.5.0

by sonne - April 8, 2011, 10:02:44 CET - 25283 views, 5444 downloads, 1 subscription

About: Tools to convert datasets from various formats to various formats, performance measures and API functions to communicate with

  • Change the task file format so that data splits can have a variable number of items and be put into up to 256 categories of training/validation/test/not used/...
  • Various bugfixes.

redsvd 0.1.0

by hillbig - August 30, 2010, 18:13:55 CET - 4839 views, 1071 downloads, 1 subscription

About: redsvd is a library for solving several matrix decompositions (SVD, PCA, eigenvalue decomposition). redsvd can handle very large matrices efficiently and is optimized for a truncated SVD of sparse matrices. For example, redsvd can compute a truncated SVD with the top 20 singular values for a 100K x 100K matrix with 10M nonzero entries in about two seconds.


Initial Announcement on

sofia ml 0.1

by dsculley - December 29, 2009, 23:30:58 CET - 6048 views, 1074 downloads, 0 comments, 1 subscription

About: A fast implementation of several stochastic gradient descent learners for classification, ranking, and ROC area optimization, suitable for large, sparse data sets. Includes Pegasos SVM, SGD-SVM, Passive-Aggressive Perceptron, Perceptron with Margins, Logistic Regression, and ROMMA. Commandline utility and API libraries are provided.
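
For readers unfamiliar with the learners listed above, the following is a minimal sketch of a Pegasos-style SGD update for a linear SVM on sparse data. It is not sofia-ml's implementation or API: the names `pegasos` and `predict`, the dictionary-based sparse vectors, the default parameters, and the omission of the optional projection step are assumptions made for this illustration.

```python
import random


def pegasos(examples, lam=0.1, iterations=1000, seed=0):
    """Train a linear SVM with Pegasos-style stochastic sub-gradient steps.

    examples: list of (label, features), label in {-1, +1}, features as
    a sparse {index: value} dictionary.
    """
    rng = random.Random(seed)
    w = {}
    for t in range(1, iterations + 1):
        y, x = rng.choice(examples)               # sample one example
        eta = 1.0 / (lam * t)                     # step size 1 / (lambda * t)
        margin = y * sum(w.get(i, 0.0) * v for i, v in x.items())
        scale = 1.0 - eta * lam                   # shrinkage from the regularizer
        for i in list(w):
            w[i] *= scale
        if margin < 1.0:                          # hinge loss is active
            for i, v in x.items():
                w[i] = w.get(i, 0.0) + eta * y * v
    return w


def predict(w, x):
    """Return +1 or -1 for a sparse feature dictionary."""
    score = sum(w.get(i, 0.0) * v for i, v in x.items())
    return 1 if score >= 0.0 else -1


if __name__ == "__main__":
    data = [(+1, {1: 1.0}), (-1, {2: 1.0})]
    weights = pegasos(data)
    print(weights, [predict(weights, x) for _, x in data])
```

Each step touches only the nonzero features of one example plus a rescaling of the weights, which is what makes this family of methods attractive for large, sparse data sets.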


Initial Announcement on

Elefant 0.4

by kishorg - October 17, 2009, 08:48:19 CET - 19650 views, 8089 downloads, 2 subscriptions

Rating: 2.5 out of 5 stars (based on 2 votes)

About: Elefant is an open source software platform for the Machine Learning community licensed under the Mozilla Public License (MPL) and developed using Python, C, and C++. We aim to make it the platform [...]


This release contains the Stream module as a first step in the direction of providing C++ library support. Stream aims to be a software framework for the implementation of large scale online learning algorithms. Large scale, in this context, should be understood as something that does not fit in the memory of a standard desktop computer.

Added Bundle Methods for Regularized Risk Minimization (BMRM), allowing users to choose from a list of loss functions and solvers (linear and quadratic).

Added the following loss classes: BinaryClassificationLoss, HingeLoss, SquaredHingeLoss, ExponentialLoss, LogisticLoss, NoveltyLoss, LeastMeanSquareLoss, LeastAbsoluteDeviationLoss, QuantileRegressionLoss, EpsilonInsensitiveLoss, HuberRobustLoss, PoissonRegressionLoss, MultiClassLoss, WinnerTakesAllMultiClassLoss, ScaledSoftMarginMultiClassLoss, SoftmaxMultiClassLoss, MultivariateRegressionLoss

The graphical user interface now provides extensive documentation for each component, explaining state variables and port descriptions.

Changed saving and loading of experiments to XML (thereby avoiding storage of large input data structures).

Unified automatic input checking via new static typing extending Python properties.

Full support for recursive composition of larger components containing arbitrary statically typed state variables.

Dirichlet Forest LDA 0.1.1

by davidandrzej - July 16, 2009, 21:59:53 CET - 5541 views, 1146 downloads, 1 subscription

About: This software implements the Dirichlet Forest (DF) Prior within the Latent Dirichlet Allocation (LDA) model. When combined with LDA, the Dirichlet Forest Prior allows the user to encode domain knowledge (must-links and cannot-links between words) into the prior on topic-word multinomials.


Initial Announcement on

LibSGDQN 1.1

by antojne - July 2, 2009, 15:02:44 CET - 8069 views, 1664 downloads, 1 subscription

About: LibSGDQN proposes an implementation of SGD-QN, a carefully designed quasi-Newton stochastic gradient descent solver for linear SVMs.


small bug fix (thx nicolas ;)

OLaRankGreedy 1.0

by antojne - June 24, 2009, 17:07:57 CET - 5058 views, 1085 downloads, 1 subscription

About: OLaRankGreedy is an online solver of the dual formulation of support vector machines for sequence labeling using greedy inference.


Initial Announcement on

OLaRankExact 1.0

by antojne - June 24, 2009, 17:03:48 CET - 4725 views, 1125 downloads, 1 subscription

About: OLaRank is an online solver of the dual formulation of support vector machines for sequence labeling using Viterbi decoding.


Initial Announcement on

BMRM 2.1

by chteo - May 8, 2009, 08:08:20 CET - 6896 views, 1398 downloads, 1 subscription

About: BMRM is an open source, modular and scalable convex solver for many machine learning problems cast in the form of a regularized risk minimization problem.


Initial Announcement on

CoFiRank 0.1

by alexis - March 30, 2009, 17:17:34 CET - 6098 views, 1258 downloads, 2 subscriptions

About: CoFiRank is a Collaborative Filtering system based on matrix factorization. CoFiRank is based on the idea that it is better to predict the relative order of preferences (ranking) instead of the absolute rating.


Initial Announcement on