- Description:
Mulan is an open-source Java library for learning from multi-label datasets. Multi-label datasets consist of training examples of a target function that has multiple binary target variables. This means that each item of a multi-label dataset can be a member of multiple categories or be annotated with many labels (classes). Many real-world problems have this nature, such as semantic annotation of images and video, web page categorization, direct marketing, functional genomics, and music categorization into genres and emotions. An introduction to mining multi-label data is provided in (Tsoumakas et al., 2010).
Currently, the library includes a variety of state-of-the-art algorithms for performing the following major multi-label learning tasks:
- Classification. This task is concerned with outputting a bipartition of the labels into relevant and irrelevant ones for a given input instance.
- Ranking. This task is concerned with outputting an ordering of the labels, according to their relevance for a given data item.
- Classification and ranking. A combination of the two tasks mentioned above.
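To make the difference between the two outputs concrete, here is a minimal sketch in plain Java. It does not use Mulan's API: the class name `LabelOutputs`, the method names, and the fixed threshold of 0.5 are assumptions chosen for illustration. It assumes a learner has already produced one confidence score per label, then derives a bipartition by thresholding and a ranking by sorting.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper, not part of Mulan: turns per-label scores into the two output types.
class LabelOutputs {

    /** Bipartition: label j is marked relevant when its score reaches the threshold. */
    public static boolean[] bipartition(double[] scores, double threshold) {
        boolean[] relevant = new boolean[scores.length];
        for (int j = 0; j < scores.length; j++) {
            relevant[j] = scores[j] >= threshold;
        }
        return relevant;
    }

    /** Ranking: label indices ordered from the highest score to the lowest. */
    public static int[] ranking(double[] scores) {
        Integer[] order = new Integer[scores.length];
        for (int j = 0; j < order.length; j++) {
            order[j] = j;
        }
        // Sort by descending score; the sort is stable, so ties keep index order.
        Arrays.sort(order, Comparator.comparingDouble((Integer j) -> -scores[j]));
        int[] result = new int[order.length];
        for (int j = 0; j < result.length; j++) {
            result[j] = order[j];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.2, 0.6, 0.4}; // made-up scores for four labels
        System.out.println(Arrays.toString(bipartition(scores, 0.5))); // [true, false, true, false]
        System.out.println(Arrays.toString(ranking(scores)));          // [0, 2, 3, 1]
    }
}
```

A learner for the combined task would return both arrays for each input instance.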
In addition, the library offers the following features:
- Feature selection. Simple baseline methods are currently supported.
- Evaluation. Classes that calculate a large variety of evaluation measures through hold-out evaluation and cross-validation.
As already mentioned, Mulan is a library. As such, it offers only a programmatic API to its users. There is no graphical user interface (GUI), and using the library from the command line is currently not supported either. The Getting Started page in the Documentation section is the ideal place to start exploring Mulan.
References
Tsoumakas, G., Katakis, I., Vlahavas, I. (2010) "Mining Multi-label Data", Data Mining and Knowledge Discovery Handbook, O. Maimon, L. Rokach (Eds.), Springer, 2nd edition.
- Changes to previous version:
Learners
- BinaryRelevance.java: improved data handling that avoids copying the entire input space, leading to substantial speedups for large datasets and very large numbers of labels.
- RAkEL.java: updated technical information, added a check for the case where the number of labels is less than or equal to the size of the subset.
- MultiLabelKNN.java: now checks whether the number of instances is less than the number of requested nearest neighbors.
- Addition of AdaBoostMH.java, an explicit implementation of AdaBoost.MH as a combination of AdaBoostM1 and IncludeLabelsClassifier.
- Addition of MLPTO.java, the Multi Label Probabilistic Threshold Optimizer (MLPTO) thresholding technique.
- Addition of ApproximateExampleBasedFMeasureOptimizer.java, an approximate method for the maximization of example-based F-measure.
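The idea behind the BinaryRelevance.java change, one binary problem per label without copying the input space, can be sketched in plain Java as follows. This is not Mulan's implementation, which works on Weka data structures. The class `BinaryRelevanceSketch`, the nested `BinaryProblem` type, and the array-based data layout are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Mulan code: binary relevance decomposition with shared features.
class BinaryRelevanceSketch {

    /** One binary problem: the shared feature matrix plus the target column of one label. */
    public static final class BinaryProblem {
        public final double[][] features; // same reference for every label, never copied
        public final boolean[] targets;   // per-label column, the only newly allocated data

        BinaryProblem(double[][] features, boolean[] targets) {
            this.features = features;
            this.targets = targets;
        }
    }

    /** Builds one binary problem per label from labels[instance][label]. */
    public static List<BinaryProblem> decompose(double[][] features, boolean[][] labels) {
        if (features.length != labels.length) {
            throw new IllegalArgumentException("features and labels must have the same number of rows");
        }
        int numLabels = labels.length == 0 ? 0 : labels[0].length;
        List<BinaryProblem> problems = new ArrayList<>();
        for (int j = 0; j < numLabels; j++) {
            boolean[] column = new boolean[labels.length];
            for (int i = 0; i < labels.length; i++) {
                column[i] = labels[i][j];
            }
            problems.add(new BinaryProblem(features, column));
        }
        return problems;
    }
}
```

With this layout the extra memory grows with the number of instances times the number of labels, rather than with that product multiplied again by the number of features.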
Measures/Evaluation
- Addition of Specificity measure (example-based, micro/macro label-based).
- Addition of Mean Average Interpolated Precision (MAiP), Geometric Mean Average Precision (GMAP), Geometric Mean Average Interpolated Precision (GMAiP).
- New methods for stratified multi-label evaluation.
- Added support for outputting per label results for all measures that implement the MacroAverageMeasure interface.
- Simplified the "strictness" issue of information retrieval measures by adopting specific assumptions (outlined in the new class InformationRetrievalMeasures.java) to handle special cases, instead of the less clear and less useful solution of outputting NaN or the less realistic solution of ignoring special cases.
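The special cases in question arise when a denominator is zero, for example precision when no label is predicted. The sketch below shows one plausible set of conventions in plain Java. The class `IrMeasuresSketch` and the specific return values chosen for the degenerate cases are assumptions for illustration; the actual assumptions are those documented in InformationRetrievalMeasures.java and may differ.

```java
// Hypothetical sketch, not Mulan code: IR measures that return a number instead of NaN.
class IrMeasuresSketch {

    /** Precision = tp / (tp + fp), with assumed values for the degenerate cases. */
    public static double precision(double tp, double fp, double fn) {
        if (tp + fp + fn == 0) {
            return 1.0; // assumption: nothing relevant and nothing predicted counts as perfect
        }
        if (tp + fp == 0) {
            return 0.0; // assumption: relevant labels exist but none were predicted
        }
        return tp / (tp + fp);
    }

    /** Recall = tp / (tp + fn), with the mirrored conventions. */
    public static double recall(double tp, double fp, double fn) {
        if (tp + fp + fn == 0) {
            return 1.0; // assumption: same "empty but correct" case as above
        }
        if (tp + fn == 0) {
            return 0.0; // assumption: labels were predicted but none are relevant
        }
        return tp / (tp + fn);
    }

    /** Specificity = tn / (tn + fp), the true negative rate. */
    public static double specificity(double tn, double fp, double fn) {
        if (tn + fp + fn == 0) {
            return 1.0; // assumption: no negatives and no misses counts as perfect
        }
        if (tn + fp == 0) {
            return 0.0; // assumption: no actual negatives, but some relevant labels were missed
        }
        return tn / (tn + fp);
    }
}
```

Returning a defined value keeps averages over examples or labels well defined, which is the practical advantage over propagating NaN.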
Bug fixes
- Bug fix in LabelsBuilder.java.
- Bug fix in Ranker.java.
- Bug fix in ThresholdPrediction.java.
- Fix for a bug occurring when loading the XSD for Mulan data outside the command-line environment (e.g. web applications).
- Javadoc comment updates.
API changes
- Upgrade to Java 1.6.
- Upgrade to JUnit 4.10.
- Upgrade to Weka 3.7.6.
Miscellaneous
- Meaningful messages are now shown when a DataLoadException is thrown.
- PT6 (PT6Transformation.java): renamed to IncludeLabelsTransformation.java.
- MultiLabelInstances now supports serialization, as needed by the improved binary relevance transformation.
- BinaryRelevanceAttributeEvaluator.java: updated according to latest BR improvements.
- Supported Operating Systems: Platform Independent
- Data Formats: Arff
- Tags: Classification, Multilabel, Ranking, Icml2010, Multi Label, Jmlr