Project details for EnsembleSVM

EnsembleSVM 2.0

by claesenm - March 31, 2014, 08:06:20 CET


Description:

EnsembleSVM is an open-source machine learning project. The EnsembleSVM library offers functionality to perform ensemble learning using Support Vector Machine (SVM) base models. In particular, we offer routines for binary ensemble models using SVM base classifiers.

The library enables users to efficiently train models for large data sets. Through a divide-and-conquer strategy, base models are trained on subsets of the data, which makes training feasible for large data sets even with nonlinear kernels. Base models are then combined into ensembles with high predictive performance through a bagging strategy. Experimental results have shown predictive performance comparable to standard SVM models, but with drastically reduced training time.
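
As an illustration of this divide-and-conquer bagging idea, here is a small self-contained C++11 sketch that trains base models on bootstrap subsamples and aggregates them by majority vote. It uses a trivial nearest-centroid rule as a stand-in for a real SVM base model; all names and data in it are hypothetical, and nothing here is part of the EnsembleSVM API.

    #include <cstddef>
    #include <iostream>
    #include <random>
    #include <vector>

    // Toy stand-in for an SVM base model: a nearest-centroid classifier.
    // EnsembleSVM trains real SVMs here; this class is purely illustrative.
    struct Point { double x, y; int label; };  // label in {-1, +1}

    class CentroidModel {
        double px = 0, py = 0, nx = 0, ny = 0;
    public:
        void train(const std::vector<Point>& data) {
            int np = 0, nn = 0;
            for (const Point& p : data) {
                if (p.label > 0) { px += p.x; py += p.y; ++np; }
                else             { nx += p.x; ny += p.y; ++nn; }
            }
            if (np) { px /= np; py /= np; }
            if (nn) { nx /= nn; ny /= nn; }
        }
        int predict(const Point& p) const {
            double dp = (p.x - px) * (p.x - px) + (p.y - py) * (p.y - py);
            double dn = (p.x - nx) * (p.x - nx) + (p.y - ny) * (p.y - ny);
            return dp < dn ? +1 : -1;
        }
    };

    int main() {
        // Hypothetical training data: two noisy clusters.
        std::mt19937 rng(42);
        std::normal_distribution<double> noise(0.0, 0.5);
        std::vector<Point> data;
        for (int i = 0; i < 200; ++i) {
            data.push_back({ 1.0 + noise(rng),  1.0 + noise(rng), +1});
            data.push_back({-1.0 + noise(rng), -1.0 + noise(rng), -1});
        }

        // Divide-and-conquer: each base model sees only a bootstrap
        // subsample of the data instead of the full training set.
        const int nmodels = 25;
        const std::size_t subsize = data.size() / 5;
        std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
        std::vector<CentroidModel> ensemble(nmodels);
        for (CentroidModel& m : ensemble) {
            std::vector<Point> subset;
            for (std::size_t j = 0; j < subsize; ++j)
                subset.push_back(data[pick(rng)]);  // sample with replacement
            m.train(subset);
        }

        // Bagging: aggregate base-model predictions by majority vote.
        Point query{0.8, 1.2, 0};
        int votes = 0;
        for (const CentroidModel& m : ensemble)
            votes += m.predict(query);
        std::cout << "ensemble prediction: " << (votes >= 0 ? +1 : -1) << "\n";
        return 0;
    }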

For more information, please refer to our website, which contains a detailed manual of all available tools and some use cases to get familiar with the software.

Useful links:

  1. EnsembleSVM homepage
  2. EnsembleSVM @ GitHub
  3. EnsembleSVM GitHub wiki
Changes to previous version:

The library has been updated with a variety of new functionality as well as more efficient implementations of existing features. The following key improvements have been made:

  1. Support for multithreading in training and prediction with ensemble models. Since both tasks are embarrassingly parallel, this yields a significant speedup (roughly 3-fold on a quad-core machine).
  2. An extensive programming framework for aggregating base-model predictions, which allows highly efficient prototyping of new aggregation approaches. Additionally, we provide several predefined strategies, including (weighted) majority voting, logistic regression and nonlinear SVMs of your choice -- be sure to check out the esvm-edit tool! A minimal sketch of both ideas follows this list.
  3. Full code transition to C++11, the latest C++ standard, which enabled various performance improvements. The new release requires moderately recent compilers, such as gcc 4.7.2+ or clang 3.2+.
  4. Generic implementations of supporting facilities have been added, such as thread pools and deserialization factories.
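
The sketch below is a rough C++11 illustration of items 1 and 2: a pluggable aggregation strategy (here, weighted majority voting) and embarrassingly parallel evaluation of base models via std::async. All interfaces in it are hypothetical and differ from the library's actual API.

    #include <cstddef>
    #include <functional>
    #include <future>
    #include <iostream>
    #include <utility>
    #include <vector>

    // Hypothetical base model: any callable mapping an instance to an
    // SVM-style decision value whose sign is the predicted class.
    using Instance  = std::vector<double>;
    using BaseModel = std::function<double(const Instance&)>;

    // Pluggable aggregation strategy in the spirit of item 2 (the real
    // EnsembleSVM interface differs).
    struct Aggregator {
        virtual double aggregate(const std::vector<double>& decvals) const = 0;
        virtual ~Aggregator() {}
    };

    // Weighted majority voting: each base model votes with its weight.
    struct WeightedMajority : Aggregator {
        std::vector<double> weights;
        explicit WeightedMajority(std::vector<double> w) : weights(std::move(w)) {}
        double aggregate(const std::vector<double>& decvals) const override {
            double sum = 0;
            for (std::size_t i = 0; i < decvals.size(); ++i)
                sum += (decvals[i] > 0 ? +1.0 : -1.0) * weights[i];
            return sum;  // the sign is the ensemble prediction
        }
    };

    // Item 1: base-model evaluation is embarrassingly parallel, so each
    // model runs in its own async task before the votes are aggregated.
    double predict(const std::vector<BaseModel>& models,
                   const Aggregator& agg, const Instance& x) {
        std::vector<std::future<double>> futures;
        for (const BaseModel& m : models)
            futures.push_back(std::async(std::launch::async, m, std::cref(x)));
        std::vector<double> decvals;
        for (std::future<double>& f : futures)
            decvals.push_back(f.get());
        return agg.aggregate(decvals);
    }

    int main() {
        // Three toy "base models": fixed linear decision functions.
        std::vector<BaseModel> models = {
            [](const Instance& x) { return x[0] + x[1] - 1.0; },
            [](const Instance& x) { return 2.0 * x[0] - 0.5; },
            [](const Instance& x) { return x[1] - 0.25; },
        };
        WeightedMajority vote({1.0, 0.5, 0.5});
        Instance x{0.7, 0.6};
        std::cout << "prediction: "
                  << (predict(models, vote, x) > 0 ? +1 : -1) << "\n";
        return 0;
    }

In the library itself, a thread pool (item 4) would amortize thread creation across many predictions; std::async is used here only to keep the sketch short.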

The API and ABI have undergone significant changes, many of which are due to the transition to C++11.

Supported Operating Systems: Linux, POSIX, Mac OS X, Windows under Cygwin, Ubuntu
Data Formats: CSV, LIBSVM format
Tags: Support Vector Machine, Kernel, Classification, Large Scale Learning, LIBSVM, Bagging, Ensemble Learning

Other available revisions

Version Changelog Date
2.0

The 2.0 changelog is listed above under "Changes to previous version".

October 3, 2013, 17:22:42
1.2

Fixed bug in IndexedFile, which caused esvm-train to fail when used without bootstrap mask. Library API/ABI remain unchanged, library revision increased.

March 30, 2013, 14:04:13
1.1

Removed deprecated command line argument related to cross-validation from split-data tool. Library API and ABI remain unchanged.

March 25, 2013, 23:05:36
1.0

Initial Announcement on mloss.org.

March 22, 2013, 14:16:55

Comments

Krzysztof Sopyla (on March 27, 2013, 22:42:18)
Hi, could you elaborate on the ensemble methods implemented in this software? Is it averaging or majority voting? How many base classifiers are created for a particular problem? Maybe this information is available in the article? Thanks in advance
Marc Claesen (on March 27, 2013, 22:49:31)
Hi Krzysztof, Currently aggregation is performed through majority voting, but future releases will feature more flexibility in this regard. The software itself is versatile and can be used for all sorts of learning tasks. The optimal number (and size) of base classifiers depends on the problem. In the use cases listed on our home page, you may find some complete examples. EnsembleSVM enables users to perform cross-validation, which is useful for tuning SVM parameters and for finding optimal base classifiers, so you may find this interesting. Generally, using more base classifiers will not degrade predictive performance, but it will bloat the models and consequently reduce prediction speed. Best regards, Marc Claesen
vu ha (on June 27, 2014, 02:29:36)
Hi Marc, do you have instructions/examples on using RESVM for learning with positive & unlabeled data as described in this paper: http://arxiv.org/pdf/1402.3144v1.pdf (A Robust Ensemble Approach to Learn From Positive and Unlabeled Data Using SVM Base Models)? Thanks, Vu~
Marc Claesen (on July 3, 2014, 08:35:44)
Dear Vu Ha, You can find a Python example (including a data set based on MNIST with label noise) in the following Github repository: https://github.com/claesenm/resvm This repository should be updated soon with more examples and a Python implementation that does not require EnsembleSVM (though the current one will remain available). Best regards, Marc
Girish Ramachandra (on August 2, 2014, 05:06:37)
Hi Marc, Just wanted to get a couple of queries clarified before I try out your RESVM python script: 1) I have my data in the format -- {label,feature_1,...,feature_25}. Can I use commonly available scripts to convert CSV to LIBSVM-format and run your script on it? 2) Any limitations on number of cases? Eg., I have about 31K cases with positive labels, and 200K unlabeled cases. Thanks much! -Girish
Marc Claesen (on August 4, 2014, 11:55:55)
Hi Girish, You can use the *sparse* tool which is included in EnsembleSVM to convert CSV files to LIBSVM format. Here's an example (labels must be in the first column of your CSV file):

    sparse -data data.csv -o data.libsvm -labeled -delim ,

There is no hard limit on the number of cases the script can deal with (within reasonable bounds). A couple of million instances should be no problem for the current implementation. Note that the RESVM script is just a reference implementation, e.g. it is not optimized for large-scale use (though this should not pose problems for you). You will need to have EnsembleSVM installed to run the script. Best regards, Marc
Girish Ramachandra (on August 6, 2014, 06:56:44)
Thanks, Marc! It worked just fine for my data. Quick question -- I understand that in the case of a traditional SVM, the value of the SVM decision function based on the Lagrangian multipliers and support vectors should only be interpreted by its sign, i.e., if the decision value > 0, then assign label +1, else assign label -1. Does that change in your case? Because I did see cases where the decision value was > 0 and the label was -1. Thanks! -Girish
Marc Claesen (on August 6, 2014, 08:51:52)
Hi Girish, The decision values used in the RESVM script are explained in this manuscript: http://arxiv.org/abs/1402.3144 Briefly: the decision values are the fraction of base models that predict positive (the default threshold for positive predictions is therefore 0.5 instead of 0.0). In the case of unanimous votes by all base models, we use the sum of the SVM decision values of all base models. Effectively this means that RESVM decision values range from -infinity to +infinity, though they are usually between 0 and 1. Regards, Marc
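
For concreteness, the rule described in this comment can be transcribed into a short C++11 helper. This is a literal sketch of the comment's wording; the name resvm_decision_value is hypothetical, and the actual RESVM script may differ in details:

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Ensemble decision value from the base models' SVM decision values,
    // following the rule in the comment above. The helper name is
    // hypothetical and not taken from the RESVM code.
    double resvm_decision_value(const std::vector<double>& base_decvals) {
        std::size_t pos = 0;
        double sum = 0.0;
        for (double d : base_decvals) {
            if (d > 0) ++pos;
            sum += d;
        }
        // Unanimous vote: fall back to the summed SVM decision values,
        // so the result can range over (-infinity, +infinity).
        if (pos == 0 || pos == base_decvals.size())
            return sum;
        // Otherwise: the fraction of base models voting positive, so the
        // default threshold for a positive prediction is 0.5 rather than 0.
        return static_cast<double>(pos) / base_decvals.size();
    }

    int main() {
        std::cout << resvm_decision_value({0.9, -0.2, 0.4}) << "\n";  // 2/3
        std::cout << resvm_decision_value({0.9,  0.2, 0.4}) << "\n";  // 1.5 (unanimous)
        return 0;
    }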
Ehsan Sadrfaridpour (on July 13, 2016, 16:58:17)
Hi Marc, Have you used any model selection technique for training the models? Best, Ehsan
