Project details for Mulan

Mulan mulan-1.1.0

by lefman - April 7, 2010, 11:32:25 CET



Mulan is an open-source Java library for learning from multi-label datasets. Multi-label datasets consist of training examples of a target function that has multiple binary target variables. This means that each item of a multi-label dataset can be a member of multiple categories or be annotated with many labels (classes). Many real-world problems have this nature, such as semantic annotation of images and video, web page categorization, direct marketing, functional genomics, and music categorization into genres and emotions. An introduction to mining multi-label data is provided in (Tsoumakas et al., 2010).

Currently, the library includes a variety of state-of-the-art algorithms for performing the following major multi-label learning tasks:

  • Classification. This task is concerned with outputting a bipartition of the labels into relevant and irrelevant ones for a given input instance.
  • Ranking. This task is concerned with outputting an ordering of the labels according to their relevance for a given data item.
  • Classification and ranking. A combination of the two tasks mentioned above.
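
The difference between the two outputs can be illustrated with a minimal sketch in plain Java. It does not use Mulan's API: the class `LabelOutputs`, its method names, and the 0.5 threshold are assumptions chosen for illustration. Given one confidence score per label, a bipartition is obtained by thresholding and a ranking by sorting.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative only: a hypothetical helper, not part of Mulan.
final class LabelOutputs {

    // Classification: mark each label as relevant (true) or irrelevant (false)
    // by comparing its confidence with a threshold.
    static boolean[] bipartition(double[] confidences, double threshold) {
        boolean[] relevant = new boolean[confidences.length];
        for (int i = 0; i < confidences.length; i++) {
            relevant[i] = confidences[i] >= threshold;
        }
        return relevant;
    }

    // Ranking: return the label indices ordered from most to least relevant.
    static int[] ranking(double[] confidences) {
        Integer[] order = new Integer[confidences.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort indices by descending confidence; the sort is stable,
        // so tied labels keep their original index order.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -confidences[i]));
        int[] result = new int[order.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = order[i];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] confidences = {0.9, 0.2, 0.6}; // made-up scores for three labels
        System.out.println(Arrays.toString(bipartition(confidences, 0.5))); // [true, false, true]
        System.out.println(Arrays.toString(ranking(confidences)));          // [0, 2, 1]
    }
}
```

A learner that supports both tasks would return the two outputs together for each input instance.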

In addition, the library offers the following features:

  • Feature selection. Simple baseline methods are currently supported.
  • Evaluation. Classes that calculate a large variety of evaluation measures through hold-out evaluation and cross-validation.
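
As one example of what an evaluation measure computes on a hold-out test set, the following plain-Java sketch calculates Hamming loss, the fraction of instance-label pairs that are predicted incorrectly. The class `HammingLossSketch` and its method are hypothetical and do not reproduce the library's own measure classes; the data in `main` is made up.

```java
// Illustrative only: a hypothetical helper, not part of Mulan.
final class HammingLossSketch {

    // Fraction of instance-label pairs where prediction and ground truth disagree.
    static double hammingLoss(boolean[][] truth, boolean[][] predicted) {
        if (truth.length != predicted.length) {
            throw new IllegalArgumentException("truth and predictions differ in size");
        }
        long mismatches = 0;
        long pairs = 0;
        for (int i = 0; i < truth.length; i++) {
            if (truth[i].length != predicted[i].length) {
                throw new IllegalArgumentException("label vectors differ in length at instance " + i);
            }
            for (int j = 0; j < truth[i].length; j++) {
                if (truth[i][j] != predicted[i][j]) {
                    mismatches++;
                }
                pairs++;
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("no instance-label pairs to evaluate");
        }
        return (double) mismatches / pairs;
    }

    public static void main(String[] args) {
        // Two test instances, three labels each (made-up data).
        boolean[][] truth = {{true, false, true}, {false, false, true}};
        boolean[][] predicted = {{true, true, true}, {false, false, false}};
        // Two of the six instance-label pairs are wrong.
        System.out.println(hammingLoss(truth, predicted));
    }
}
```

In cross-validation, the same computation would be repeated on each held-out fold and the per-fold values averaged.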

As already mentioned, Mulan is a library. As such, it offers only a programmatic API to its users. There is no graphical user interface (GUI), and using the library from the command line is currently not supported either. The Getting Started page in the Documentation section is the ideal place to start exploring Mulan.


Tsoumakas, G., Katakis, I., Vlahavas, I. (2010) "Mining Multi-label Data", Data Mining and Knowledge Discovery Handbook, O. Maimon, L. Rokach (Eds.), Springer, 2nd edition.

Changes to previous version:

Initial Announcement on

Supported Operating Systems: Platform Independent
Data Formats: ARFF
Tags: Classification, Multilabel, Ranking, Icml2010, Multi Label

Other available revisions

Version Changelog Date


  • Added the MLCSSP algorithm (from ICML 2013)
  • Enhancements of multi-target regression capabilities
  • Improved CLUS support
  • Added pairwise classifier and pairwise transformation


  • Providing training data in the Evaluator is unnecessary in the case of specific measures.
  • Examples with missing ground truth are not skipped for measures that handle missing values.
  • Added logistic and squared error losses and measures

Bug fixes

  • IndexOutOfBounds in calculation of MiAP and GMiAP
  • Bug fix in
  • When in rank/score mode, the meta-data contained additional unnecessary attributes. (Newton Spolaor)

API changes

  • Upgrade to Java 7
  • Upgrade to Weka 3.7.10


  • Small changes and improvements in the wrapper classes for the CLUS library
  • (new experiment)
  • Enumeration is now used for specifying the type of meta-data. (Newton Spolaor)
February 23, 2015, 21:19:05


  • improved data handling that avoids copying the entire input space, leading to substantial speedups for large datasets and very large numbers of labels.
  • updated technical information, added a check for the case where the number of labels is less than or equal to the size of the subset.
  • now checks whether the number of instances is less than the number of requested nearest neighbors.
  • Addition of, an explicit implementation of AdaBoost.MH as combination of AdaBoostM1 and IncludeLabelsClassifier.
  • Addition of, the Multi Label Probabilistic Threshold Optimizer (MLPTO) thresholding technique.
  • Addition of, an approximate method for the maximization of example-based F-measure.


  • Addition of Specificity measure (example-based, micro/macro label-based)
  • Addition of Mean Average Interpolated Precision (MAiP), Geometric Mean Average Precision (GMAP), Geometric Mean Average Interpolated Precision (GMAiP).
  • New methods for stratified multi-label evaluation.
  • Added support for outputting per label results for all measures that implement the MacroAverageMeasure interface.
  • Simplified the "strictness" issue of information retrieval measures by adopting specific assumptions (outlined in a new class) to handle special cases, instead of the less clear and less useful solution of outputting NaN and the less realistic solution of ignoring special cases.
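
The special cases in question arise when a denominator is zero. The plain-Java sketch below contrasts a naive precision, which yields NaN, with a variant that returns a fixed value in each special case. The class `PrecisionSketch` is hypothetical, and the two conventions it adopts (1 when there is nothing relevant and nothing predicted, 0 when relevant labels exist but none is predicted) are assumptions for illustration that may differ from the ones the library documents.

```java
// Illustrative only: a hypothetical helper, not part of Mulan.
final class PrecisionSketch {

    // Naive precision: evaluates to NaN (0.0 / 0.0) when no label was predicted as relevant.
    static double naivePrecision(int tp, int fp) {
        return (double) tp / (tp + fp);
    }

    // Precision with explicit special-case handling (assumed conventions, see lead-in).
    static double precision(int tp, int fp, int fn) {
        if (tp + fp + fn == 0) {
            return 1.0; // nothing relevant and nothing predicted: treated as perfect
        }
        if (tp + fp == 0) {
            return 0.0; // relevant labels exist but none was predicted: treated as worst case
        }
        return (double) tp / (tp + fp);
    }

    public static void main(String[] args) {
        System.out.println(naivePrecision(0, 0)); // NaN
        System.out.println(precision(0, 0, 0));   // 1.0
        System.out.println(precision(0, 0, 3));   // 0.0
        System.out.println(precision(2, 2, 1));   // 0.5
    }
}
```

Returning a defined value in every case lets results be averaged over examples or labels without NaN propagating into the mean.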

Bug fixes

  • Bug fix in
  • Bug fix in
  • Bug fix in
  • Fix for bug occurring when loading the XSD for mulan data outside the command-line environment (e.g. web applications).
  • Javadoc comment updates.

API changes

  • Upgrade to Java 1.6
  • Upgrade to JUnit 4.10
  • Upgrade to Weka 3.7.6.


  • Meaningful messages are now shown when a DataLoadException is thrown.
  • PT6( renamed to
  • MultiLabelInstances now support serialization, as needed by the improved binary relevance transformation.
  • updated according to latest BR improvements.
August 1, 2012, 09:49:21


  • New algorithms added in the meta package.
  • EnsembleOfClassifierChains: The final confidences can now be computed not only by averaging votes, but also by averaging confidences. The option of sampling with replacement was added.
  • MMP: updated with loss functions. Added possibility to specify number of training epochs for MMPLearner.
  • BinaryRelevance: Added method to get the model built for a label.
  • Update to the lazy package: Euclidean is still the default distance function, but a different distance function can now be specified.
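
The two ways of combining ensemble members mentioned for EnsembleOfClassifierChains can be sketched for a single label as follows. This is plain Java under stated assumptions, not the library's implementation: the class `EnsembleAveraging`, its method names, and the per-member threshold are hypothetical.

```java
// Illustrative only: a hypothetical helper, not part of Mulan.
final class EnsembleAveraging {

    // Averaging votes: each member casts a binary vote for the label,
    // and the final confidence is the fraction of positive votes.
    static double averageVotes(double[] memberConfidences, double threshold) {
        requireNonEmpty(memberConfidences);
        int votes = 0;
        for (double c : memberConfidences) {
            if (c >= threshold) {
                votes++;
            }
        }
        return (double) votes / memberConfidences.length;
    }

    // Averaging confidences: the final confidence is the mean of the members' confidences.
    static double averageConfidences(double[] memberConfidences) {
        requireNonEmpty(memberConfidences);
        double sum = 0.0;
        for (double c : memberConfidences) {
            sum += c;
        }
        return sum / memberConfidences.length;
    }

    private static void requireNonEmpty(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("ensemble must have at least one member");
        }
    }

    public static void main(String[] args) {
        double[] confidences = {0.9, 0.6, 0.1}; // made-up outputs of three members for one label
        System.out.println(averageVotes(confidences, 0.5));
        System.out.println(averageConfidences(confidences));
    }
}
```

With these inputs, two of three members vote for the label, while the mean confidence is lower because the dissenting member is very confident; the two schemes can therefore rank labels differently.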


  • Introduced loss functions package.
  • Refurbished the measures package so that the measure hierarchy has cleaner semantics and takes loss functions into consideration.
  • Strict/non-strict evaluation (the two modes handle divisions by zero differently).
  • Uniform calculation of f-measure for all related measures.

Bug fixes

  • Bug fix in the dimensionality reduction package.
  • Bug fix in CalibratedLabelRanking class.
  • Updated design and bug fixes in thresholding strategies.
  • Fixed defect in MMPUniformUpdateRule.
  • Bug fix in the getPriors method.

API changes

  • Upgrade to Weka 3.7.3.


  • Experiment from ICTAI 2010 paper added.


  • Simplified source examples for consistency with the online documentation.
  • Added an example that shows storing/loading a multi-label model.

Unit Tests

  • HOMER and HMC tests added.
  • MetaLabeler and ThresholdPrediction test updated.
May 31, 2011, 16:14:46
  • Classifiers

-New algorithms:

--Classifier Chain

--Ensemble of Classifier Chains

--Pruned Sets

--Ensemble of Pruned Sets

-New common ancestor class for PPT and PrunedSets: LabelsetPruning

-Modified neural model and learners to allow custom seed for randomness due to testing needs.

-Normalization on MLkNN turned on by default

-HierarchyBuilder: removed the repeated check on the number of labels and partitions; an equal number of labels and partitions is now allowed

  • Measures

-New measures:

--AUC evaluation measure (micro/macro) added

--MAP (Mean Average Precision) measure added

-Added a base class for measures calculated based on confidences

-Added a method to get per label Average Precision

-Added support for obtaining copies of Measures

-Results output precision reduced to 4 decimal places

-Added support for incremental addition of evaluation results

-Added a method to retrieve the mean value of a measure (for parameter tuning in experiments)

  • Experiments

-Added a new package for posting code that ensures the reproducibility of empirical work in research papers

-Added a base class for all experiment classes

-3 experiment classes added:




  • Thresholding strategies

-Added new package for thresholding approaches

-Added new thresholding strategies:




--Instance-based thresholding strategies

  • Bugfixes

-Fixed bug in TrainTestExperiment

-Fixed bug when cross-validating with a custom set of measures

-Fixed defects in BPMLL and MMP learners causing them to fail on genbase data set

-Label indices are now updated when data set attribute indices change due to the nominal->binary filter

  • Cleanup – API Changes

-Removed Cobertura library for test coverage generation.

-Introduced EMMA library for test coverage generation.

-Removed inclusion of test data in distribution package.

-AttributeSelection package renamed to DimensionalityReduction

July 21, 2010, 09:10:27

Initial Announcement on

June 24, 2010, 05:58:03

Initial Announcement on

April 7, 2010, 11:32:25

