Project details for EnsembleSVM

EnsembleSVM 1.1

by claesenm - March 25, 2013, 23:05:36 CET



EnsembleSVM is an open-source machine learning project. The EnsembleSVM library offers functionality to perform ensemble learning using Support Vector Machine (SVM) base models. In particular, it offers routines for binary classification with ensemble models built from SVM base classifiers.

The library enables users to efficiently train models for large data sets. Through a divide-and-conquer strategy, base models are trained on subsets of the data, which makes training feasible for large data sets even when using nonlinear kernels. Base models are combined into ensembles with high predictive performance through a bagging strategy. Experimental results have shown the predictive performance to be comparable with standard SVM models, but with drastically reduced training time.
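The divide-and-conquer/bagging strategy described above can be sketched in a few lines of Python. This is only an illustration of the idea, not EnsembleSVM's implementation (the library itself is C++ with command-line tools); a toy subgradient-descent linear SVM stands in for a real base learner, and all names here are hypothetical:

```python
import numpy as np

def train_linear_svm(X, y, epochs=300, lam=0.01, lr=0.5):
    """Toy linear SVM via subgradient descent on the regularized
    hinge loss. Labels y must be in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for t in range(1, epochs + 1):
        viol = y * (X @ w + b) < 1            # margin violators
        gw = lam * w - (y[viol] @ X[viol]) / n
        gb = -y[viol].sum() / n
        step = lr / t                          # decaying step size
        w -= step * gw
        b -= step * gb
    return w, b

def train_ensemble(X, y, n_models=11, subset_size=None, seed=0):
    """Divide-and-conquer: each base SVM sees only a bootstrap
    subset of the data, which keeps training cheap."""
    rng = np.random.default_rng(seed)
    subset_size = subset_size or len(X) // 2
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(X), size=subset_size, replace=True)
        models.append(train_linear_svm(X[idx], y[idx]))
    return models

def predict_majority(models, X):
    """Bagging aggregation by majority vote over the base models
    (use an odd n_models to avoid ties)."""
    votes = np.stack([np.sign(X @ w + b) for w, b in models])
    return np.sign(votes.sum(axis=0))
```

Each base model is cheap to train because it only sees a subset; the ensemble recovers predictive performance through voting, which mirrors the trade-off the paragraph above describes.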

For more information, please refer to our website which contains a detailed manual of all available tools and some use cases to get familiar with the software.

Changes since the previous version:

Removed deprecated command line argument related to cross-validation from split-data tool. Library API and ABI remain unchanged.

Supported Operating Systems: Linux, POSIX, Mac OS X, Windows under Cygwin, Ubuntu
Data Formats: CSV, LIBSVM format
Tags: Support Vector Machine, Kernel, Classification, Large Scale Learning, LIBSVM, Bagging, Ensemble Learning


Krzysztof Sopyla (on March 27, 2013, 22:42:18)
Hi, could you elaborate on the ensemble methods implemented in this software? Is it averaging or majority voting? How many base classifiers are created for a particular problem? Maybe this information is available in the article? Thanks in advance.
Marc Claesen (on March 27, 2013, 22:49:31)
Hi Krzysztof, Currently aggregation is performed through majority voting, but future releases will feature more flexibility in this regard. The software itself is versatile and can be used for all sorts of learning tasks. The optimal number (and size) of base classifiers depends on the problem. In the use cases listed on our home page, you may find some complete examples. EnsembleSVM enables users to perform cross-validation, which is useful to tune SVM parameters but also to find optimal base classifiers, so you may find this interesting. Generally, using more base classifiers will not degrade predictive performance, but it will bloat the models and consequently reduce prediction speed. Best regards, Marc Claesen
vu ha (on June 27, 2014, 02:29:36)
Hi Marc, do you have instructions or an example on using RESVM for learning with positive & unlabeled data as described in this paper: (A Robust Ensemble Approach to Learn From Positive and Unlabeled Data Using SVM Base Models)? Thanks, Vu
Marc Claesen (on July 3, 2014, 08:35:44)
Dear Vu Ha, You can find a Python example (including a data set based on MNIST with label noise) in the following Github repository: This repository should be updated soon with more examples and a Python implementation that does not require EnsembleSVM (though the current one will remain available). Best regards, Marc
Girish Ramachandra (on August 2, 2014, 05:06:37)
Hi Marc, Just wanted to get a couple of queries clarified before I try out your RESVM python script: 1) I have my data in the format -- {label,feature_1,...,feature_25}. Can I use commonly available scripts to convert CSV to LIBSVM-format and run your script on it? 2) Any limitations on number of cases? Eg., I have about 31K cases with positive labels, and 200K unlabeled cases. Thanks much! -Girish
Marc Claesen (on August 4, 2014, 11:55:55)
Hi Girish, You can use the *sparse* tool included in EnsembleSVM to convert CSV files to LIBSVM format. Here's an example (labels must be in the first column of your CSV file):

sparse -data data.csv -o data.libsvm -labeled -delim ,

There is no hard limit on the number of cases the script can handle (within reasonable bounds); a couple of million instances should be no problem for the current implementation. Note that the RESVM script is just a reference implementation, i.e. it is not optimized for large-scale use (though this should not pose problems for you). You will need to have EnsembleSVM installed to run the script. Best regards, Marc
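For readers curious what this conversion actually produces, the label-first CSV to LIBSVM mapping can be sketched in Python. This is an independent illustration of the file format, not the code of the *sparse* tool, and `csv_to_libsvm` is a hypothetical name:

```python
def csv_to_libsvm(csv_lines, delim=","):
    """Convert label-first CSV rows to LIBSVM format:
    '<label> <index>:<value> ...', with 1-based feature indices
    and zero-valued features omitted (hence 'sparse')."""
    out = []
    for line in csv_lines:
        fields = line.strip().split(delim)
        label, features = fields[0], fields[1:]
        pairs = [f"{i}:{v}" for i, v in enumerate(features, start=1)
                 if float(v) != 0.0]
        out.append(" ".join([label] + pairs))
    return out

# A row "1,0.5,0,2.0" becomes "1 1:0.5 3:2.0".
```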
Girish Ramachandra (on August 6, 2014, 06:56:44)
Thanks, Marc! It worked just fine for my data. Quick question -- I understand that in the case of a traditional SVM, the value of the decision function (based on the Lagrangian multipliers and support vectors) should only be interpreted by its sign, i.e., if the decision value > 0, assign label +1, else assign label -1. Does that change in your case? Because I did see cases where the decision value was > 0 and the label was -1. Thanks! -Girish
Marc Claesen (on August 6, 2014, 08:51:52)
Hi Girish, The decision values used in the RESVM script are explained in this manuscript: Briefly: the decision values are the fraction of base models that predict positive (the default threshold for positive predictions is therefore 0.5 instead of 0.0). In case of unanimous votes by all base models, we use the sum of the SVM decision values of all base models. Effectively this means that RESVM decision values range from -infinity to +infinity, though they are usually between 0 and 1. Regards, Marc
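The decision-value scheme Marc describes can be sketched as follows. This is a Python illustration derived from the comment above, not the RESVM script itself, and `resvm_decision_value`/`resvm_predict` are hypothetical names:

```python
import numpy as np

def resvm_decision_value(base_decision_values):
    """Fraction of base models voting positive; on a unanimous vote,
    fall back to the sum of the base SVM decision values, so the
    range extends from -inf to +inf (values usually lie in [0, 1])."""
    vals = np.asarray(base_decision_values, dtype=float)
    frac_positive = (vals > 0).mean()
    if frac_positive in (0.0, 1.0):        # unanimous vote
        return vals.sum()
    return frac_positive

def resvm_predict(base_decision_values, threshold=0.5):
    """Positive prediction when the decision value exceeds the
    default 0.5 threshold (not 0.0 as for a plain SVM)."""
    return 1 if resvm_decision_value(base_decision_values) > threshold else -1
```

This also explains Girish's observation above: a base-model decision value > 0 does not by itself imply a positive ensemble label, since the label depends on the fraction of positive votes crossing 0.5.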
Ehsan Sadrfaridpour (on July 13, 2016, 16:58:17)
Hi Marc, Have you used any model selection technique for training the models? Best, Ehsan
