Project details for EnsembleSVM

EnsembleSVM 2.0

by claesenm - March 31, 2014, 08:06:20 CET


Description:

EnsembleSVM is an open-source machine learning project. The EnsembleSVM library offers functionality to perform ensemble learning using Support Vector Machine (SVM) base models. In particular, we offer routines for binary ensemble models using SVM base classifiers.

The library enables users to efficiently train models for large data sets. Through a divide-and-conquer strategy, base models are trained on subsets of the data, which makes training feasible for large data sets even when using nonlinear kernels. Base models are combined into ensembles with high predictive performance through a bagging strategy. Experimental results have shown the predictive performance to be comparable with standard SVM models, but with drastically reduced training time.
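The bagging strategy described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not EnsembleSVM code: the function names (train_linear_svm, train_ensemble, predict_ensemble) and the toy data are invented for this example, and a bias-free linear SVM trained with Pegasos-style stochastic subgradient descent stands in for the kernel SVM base models the library targets.

```python
import random


def train_linear_svm(points, labels, rng, lam=0.01, epochs=50):
    """Stand-in base learner: bias-free linear SVM fitted with
    Pegasos-style stochastic subgradient descent on the hinge loss."""
    w = [0.0] * len(points[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(points)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = labels[i] * sum(wj * xj for wj, xj in zip(w, points[i]))
            # Shrink step from the L2 regularizer.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss step, only for points inside the margin.
            if margin < 1.0:
                w = [wj + eta * labels[i] * xj for wj, xj in zip(w, points[i])]
    return w


def train_ensemble(points, labels, n_models, subset_size, seed=0):
    """Train each base model on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(points)) for _ in range(subset_size)]
        subset_x = [points[i] for i in idx]
        subset_y = [labels[i] for i in idx]
        models.append(train_linear_svm(subset_x, subset_y, rng))
    return models


def predict_base(w, x):
    """Label predicted by one base model (+1 or -1)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0.0 else -1


def predict_ensemble(models, x):
    """Aggregate the base model predictions by majority vote."""
    votes = sum(predict_base(w, x) for w in models)
    return 1 if votes >= 0 else -1


def make_toy_data(n_per_class, seed=1):
    """Two well-separated clusters around (2, 2) and (-2, -2)."""
    rng = random.Random(seed)
    points, labels = [], []
    for _ in range(n_per_class):
        points.append([2.0 + rng.uniform(-0.5, 0.5), 2.0 + rng.uniform(-0.5, 0.5)])
        labels.append(1)
        points.append([-2.0 + rng.uniform(-0.5, 0.5), -2.0 + rng.uniform(-0.5, 0.5)])
        labels.append(-1)
    return points, labels


points, labels = make_toy_data(100)
# Five base models, each trained on 40 of the 200 points.
models = train_ensemble(points, labels, n_models=5, subset_size=40)
print(predict_ensemble(models, [3.0, 3.0]), predict_ensemble(models, [-3.0, -3.0]))
# prints: 1 -1
```

Each base model only ever sees subset_size points, so the cost of training one model depends on the subset size rather than on the size of the full data set. An odd number of base models avoids ties in the vote.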

For more information, please refer to our website, which contains a detailed manual of all available tools and some use cases to help you get familiar with the software.

Useful links:

  1. EnsembleSVM homepage
  2. EnsembleSVM @ GitHub
  3. EnsembleSVM GitHub wiki

Changes to previous version:

The library has been updated and features a variety of new functionality as well as more efficient implementations of original features. The following key improvements have been made:

  1. Support for multithreading in training and prediction with ensemble models. Since both of these are embarrassingly parallel, this yields a significant speedup (3-fold on a quad-core machine).
  2. Extensive programming framework for aggregation of base model predictions, which allows highly efficient prototyping of new aggregation approaches. Additionally, we provide several predefined strategies, including (weighted) majority voting, logistic regression and nonlinear SVMs of your choice -- be sure to check out the esvm-edit tool! The provided framework also allows you to efficiently program your own, novel aggregation schemes.
  3. Full code transition to C++11, the latest C++ standard, which enabled various performance improvements. The new release requires moderately recent compilers, such as gcc 4.7.2+ or clang 3.2+.
  4. Generic implementations of convenient facilities have been added, such as thread pools, deserialization factories and more.

The API and ABI have undergone significant changes, many of which are due to the transition to C++11.
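To make the first two items concrete, here is a small Python sketch of evaluating base models in a thread pool and aggregating their outputs by weighted majority voting. It is an illustration only and does not use the EnsembleSVM API: weighted_majority_vote, predict_parallel and the three threshold-rule base models are hypothetical names made up for this example. Note that threads in standard CPython builds do not speed up pure-Python computation, so the sketch shows the structure of the approach rather than the speedup reported for the C++ library.

```python
from concurrent.futures import ThreadPoolExecutor


def weighted_majority_vote(predictions, weights):
    """Return +1 or -1, whichever label collects more total weight."""
    score = sum(w * p for w, p in zip(weights, predictions))
    return 1 if score >= 0.0 else -1


def predict_parallel(base_models, weights, x, n_threads=4):
    """Evaluate every base model on x in a thread pool, then aggregate.

    The base models are independent of each other, which is what makes
    this step embarrassingly parallel.
    """
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map returns results in the same order as base_models.
        predictions = list(pool.map(lambda model: model(x), base_models))
    return weighted_majority_vote(predictions, weights)


# Three hypothetical base models, each a simple threshold rule.
base_models = [
    lambda x: 1 if x[0] > 0.0 else -1,
    lambda x: 1 if x[1] > 0.0 else -1,
    lambda x: 1 if x[0] + x[1] > 1.0 else -1,
]

x = [0.5, -0.2]  # base model outputs: +1, -1, -1
unweighted = predict_parallel(base_models, [1.0, 1.0, 1.0], x)
weighted = predict_parallel(base_models, [3.0, 1.0, 1.0], x)
print(unweighted, weighted)
# prints: -1 1
```

With equal weights the two negative votes win; giving the first base model a weight of 3 flips the outcome, which is the kind of behaviour a custom aggregation scheme can control.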

JMLR MLOSS PaperURL: JMLR-MLOSS Paper Homepage
Supported Operating Systems: Linux, POSIX, Mac OS X, Windows under Cygwin, Ubuntu
Data Formats: CSV, LIBSVM format
Tags: Support Vector Machine, Kernel, Classification, Large Scale Learning, LIBSVM, Bagging, Ensemble Learning
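For readers unfamiliar with the LIBSVM data format listed above: it is a sparse text format with one instance per line, written as a label followed by index:value pairs with 1-based feature indices. The Python sketch below parses one such line. The helper names parse_libsvm_line and to_dense are made up for this illustration and are not part of EnsembleSVM or LIBSVM; whether the EnsembleSVM tools accept every variant of the format is not covered here.

```python
def parse_libsvm_line(line):
    """Parse '<label> <index>:<value> ...' into (label, {index: value})."""
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def to_dense(features, n_features):
    """Expand a sparse feature dict (1-based indices) into a dense list.

    Features that are absent from the line are implicitly zero.
    """
    return [features.get(i, 0.0) for i in range(1, n_features + 1)]


# One instance with label +1 and three nonzero features out of seven.
label, features = parse_libsvm_line("+1 1:0.5 3:-1.25 7:2")
print(label, features)
print(to_dense(features, 7))
```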

Other available revisions

Version 2.0 (October 3, 2013, 17:22:42)

Changelog as listed under "Changes to previous version" above.

Version 1.2 (March 30, 2013, 14:04:13)

Fixed bug in IndexedFile, which caused esvm-train to fail when used without bootstrap mask. Library API/ABI remain unchanged, library revision increased.

Version 1.1 (March 25, 2013, 23:05:36)

Removed deprecated command line argument related to cross-validation from split-data tool. Library API and ABI remain unchanged.

Version 1.0 (March 22, 2013, 14:16:55)

Initial Announcement on mloss.org.

Comments

Krzysztof Sopyla (on March 27, 2013, 22:42:18)

Hi,

Could you elaborate on the ensemble methods that were implemented in this software? Is it averaging or majority voting? How many base classifiers are created for a particular problem?

Maybe this information is available in the article?

Thanks in advance

Marc Claesen (on March 27, 2013, 22:49:31)

Hi Krzysztof,

Currently aggregation is performed through majority voting, but future releases will feature more flexibility in this regard.

The software itself is versatile and can be used for all sorts of learning tasks. The optimal number (and size) of base classifiers depends on the problem. In the use cases listed on our home page, you may find some complete examples.

EnsembleSVM enables users to perform cross-validation, which is useful to tune SVM parameters but also to find optimal base classifiers, so you may find this interesting.

Generally, using more base classifiers will not degrade predictive performance but it will bloat the models and consequently reduce prediction speed.

Best regards,

Marc Claesen

vu ha (on June 27, 2014, 02:29:36)

Hi Marc,

Do you have instructions or an example on using RESVM for learning with positives & unlabeled data as described in this paper: http://arxiv.org/pdf/1402.3144v1.pdf (A Robust Ensemble Approach to Learn From Positive and Unlabeled Data Using SVM Base Models)?

Thanks, Vu~

Marc Claesen (on July 3, 2014, 08:35:44)

Dear Vu Ha,

You can find a Python example (including a data set based on MNIST with label noise) in the following GitHub repository: https://github.com/claesenm/resvm

This repository should be updated soon with more examples and a Python implementation that does not require EnsembleSVM (though the current one will remain available).

Best regards, Marc
