- Description:
We introduce mlpy, a high-performance Python package for predictive modeling. It makes extensive use of NumPy to provide fast N-dimensional array manipulation and easy integration of C code. mlpy provides high-level procedures that support, in a few lines of code, the design of rich Data Analysis Protocols (DAPs) for predictive classification and feature selection. Methods are available for feature weighting and ranking, data resampling, error evaluation and experiment landscaping. The package includes tools to measure stability in sets of ranked feature lists, a task of special interest in bioinformatics for functional genomics, for which large-scale experiments with up to 10^6 classifiers have been run on Linux clusters and on the Grid.
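As a concrete illustration of the "few lines of code" claim, the sketch below trains a linear SVM on a toy dataset and predicts test labels. The class and method names used here (mlpy.Svm with compute() and predict(), and the kernel argument) are assumptions modeled on the mlpy 2.x-style interface and should be checked against the installed version.

    import numpy as np
    import mlpy  # class/method names below are assumptions (mlpy 2.x-style interface)

    np.random.seed(0)
    xtr = np.random.randn(20, 100)        # 20 training samples, 100 features
    ytr = np.array([1] * 10 + [-1] * 10)  # binary labels in {1, -1}
    xts = np.random.randn(5, 100)         # 5 test samples

    svm = mlpy.Svm(kernel='linear')       # assumed constructor and kernel argument
    svm.compute(xtr, ytr)                 # training phase
    pred = svm.predict(xts)               # testing phase; predicted labels for xts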
The modular structure of mlpy makes it easy to add new algorithms to each of the seven categories in which the package is organized. They are:
Classification. For each algorithm, distinct methods are provided for the training and the testing phases (whenever possible, real-valued predictions can be obtained). The implemented algorithms belong to the families of Support Vector Machines (SVMs, four kernels available), Discriminant Analysis (Fisher, Penalized, Diagonal Linear and Spectral Regression) and Nearest Neighbours.
Feature weighting. A total of nine methods are available for obtaining weights from models such as SVMs or Discriminant Analysis; classifier-independent feature weighting methods are also implemented, including I-RELIEF and the Discrete Wavelet Transform.
Feature ranking. Two main schemes are used for selection and ranking, belonging either to the Recursive Feature Elimination (RFE) family or to the Recursive Forward Selection family (six variants in total).
Resampling methods. The classification and feature ranking operations can be organized within a resampling procedure such as textbook (k-fold) or Monte Carlo cross-validation (stratification over labels is available), leave-one-out, or a user-defined train/test split scheme.
Metric functions. Performance can be assessed by a set of different measures, including Error, Accuracy, the Matthews Correlation Coefficient (MCC) and the Area Under the ROC Curve (AUC). Variability can be assessed by the Standard Deviation or by Bootstrap Confidence Intervals.
Feature list analysis. The ordered lists produced by the feature ranking experiments can be analyzed in terms of stability (Canberra indicator, extraction/position indicators), and an optimal list can be retrieved (Borda count).
Landscaping tools. A set of executable scripts, usable off the shelf, tabulates performance (e.g. Error, MCC and stability measures) over a grid of different experimental conditions through a basic DAP implementation (resampling by k-fold or Monte Carlo CV, training, feature ranking, test); a minimal sketch of such a DAP is shown after this list.
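To make the categories above concrete, the following sketch implements a minimal DAP: stratified k-fold resampling, SVM training and testing, Error/MCC evaluation, weight-based feature ranking on each split, and Borda aggregation of the ranked lists. All mlpy identifiers used here (kfoldS, Svm with compute/predict/weights, err, mcc, borda_count) are assumptions modeled on the mlpy 2.x-style documentation, not guaranteed API, and may need adapting to the installed version.

    import numpy as np
    import mlpy  # identifiers below are assumptions (mlpy 2.x-style interface)

    np.random.seed(0)
    x = np.random.randn(40, 50)           # 40 samples, 50 features
    y = np.array([1] * 20 + [-1] * 20)    # binary labels in {1, -1}

    errors, mccs, ranked_lists = [], [], []
    # stratified k-fold resampling; assumed to yield (train indices, test indices) pairs
    for tr_idx, ts_idx in mlpy.kfoldS(y, 5):
        svm = mlpy.Svm(kernel='linear')            # classification (assumed class name)
        svm.compute(x[tr_idx], y[tr_idx])          # training phase
        p = svm.predict(x[ts_idx])                 # testing phase
        errors.append(mlpy.err(y[ts_idx], p))      # Error (assumed metric function)
        mccs.append(mlpy.mcc(y[ts_idx], p))        # Matthews Correlation Coefficient
        w = svm.weights(x[tr_idx], y[tr_idx])      # feature weighting (assumed method)
        ranked_lists.append(np.argsort(np.abs(w))[::-1])  # rank features by |weight|

    print("mean error:", np.mean(errors), "mean MCC:", np.mean(mccs))
    # aggregate the per-split rankings into one list (Borda count; assumed function name)
    borda = mlpy.borda_count(np.array(ranked_lists))

A full DAP would also apply a proper ranking scheme (e.g. RFE) and a stability analysis (Canberra indicator) to the collected lists, but its overall structure stays the same as in this loop.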
mlpy is a project developed by the MPBA research unit at FBK, the Bruno Kessler Foundation in Trento, Italy (http://mpba.fbk.eu).
- Changes to previous version:
Initial Announcement on mloss.org.
- BibTeX Entry: Download
- Supported Operating Systems: Linux, Mac OS X, Windows, Unix
- Data Formats: None
- Tags: SVM, Classification, FDA, Feature Weighting, I-RELIEF, RFE, Feature Ranking, Resampling, SRDA, NN, DWT, PDA, NIPS2008, DLDA
- Archive: download here
Comments
- jacob Yang (on April 30, 2010, 14:24:11)
  While the program is running there is no output, so I don't know when it will finish.
- Michele Filosi (on December 13, 2011, 10:04:04)
  Very useful and well implemented!