Project details for Mulan

JMLR Mulan 1.3.0

by lefman - January 19, 2012, 12:22:35 CET

Description:

Mulan is an open-source Java library for learning from multi-label datasets. Multi-label datasets consist of training examples of a target function that has multiple binary target variables. This means that each item of a multi-label dataset can belong to multiple categories, that is, be annotated with several labels (classes). Many real-world problems have this nature, such as semantic annotation of images and video, web page categorization, direct marketing, functional genomics, and music categorization into genres and emotions. An introduction to mining multi-label data is given in (Tsoumakas et al., 2010).
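
To make the definition concrete, a multi-label dataset can be viewed as a binary label matrix in which entry (i, j) is true when example i is annotated with label j. The sketch below computes two commonly reported statistics of such a matrix. Label cardinality is the average number of labels per example. Label density is the cardinality divided by the number of labels. This is a standalone illustration with hypothetical class and method names, not Mulan's API; Mulan itself reads its data from ARFF files.

```java
/**
 * Illustrative sketch (hypothetical names, not Mulan's API): a multi-label
 * dataset as a binary label matrix, labels[i][j] == true when example i
 * is annotated with label j.
 */
public class LabelStats {

    /** Label cardinality: average number of relevant labels per example. */
    static double labelCardinality(boolean[][] labels) {
        if (labels.length == 0) {
            throw new IllegalArgumentException("dataset is empty");
        }
        long total = 0;
        for (boolean[] row : labels) {
            for (boolean relevant : row) {
                if (relevant) total++;
            }
        }
        return (double) total / labels.length;
    }

    /** Label density: cardinality divided by the number of labels. */
    static double labelDensity(boolean[][] labels) {
        int numLabels = labels[0].length;
        return labelCardinality(labels) / numLabels;
    }

    public static void main(String[] args) {
        // Three images annotated with the labels {sky, sea, person}.
        boolean[][] labels = {
            {true,  true,  false},  // sky, sea
            {true,  false, true},   // sky, person
            {false, false, true}    // person
        };
        System.out.println(labelCardinality(labels)); // 5 relevant labels / 3 examples
        System.out.println(labelDensity(labels));     // cardinality / 3 labels
    }
}
```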

Currently, the library includes a variety of state-of-the-art algorithms for performing the following major multi-label learning tasks:

  • Classification. This task is concerned with outputting a bipartition of the labels into relevant and irrelevant ones for a given input instance.
  • Ranking. This task is concerned with outputting an ordering of the labels, according to their relevance for a given data item.
  • Classification and ranking. A combination of the two tasks mentioned above.
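
Many multi-label learners produce a confidence score for each label, and both kinds of output can be derived from it. Sorting the labels by score gives a ranking, and applying a threshold gives a bipartition. The sketch below uses a fixed threshold of 0.5 purely for illustration. The class and method names are hypothetical and not part of Mulan, which provides its own thresholding strategies.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustrative sketch (hypothetical names): ranking and bipartition from per-label confidences. */
public class RankAndBipartition {

    /** Ranking: label indices ordered by decreasing confidence. */
    static Integer[] ranking(double[] confidences) {
        Integer[] order = new Integer[confidences.length];
        for (int j = 0; j < order.length; j++) order[j] = j;
        // Arrays.sort on objects is stable, so labels with equal confidence keep index order.
        Arrays.sort(order, Comparator.comparingDouble((Integer j) -> confidences[j]).reversed());
        return order;
    }

    /** Bipartition: a label is relevant when its confidence reaches the threshold. */
    static boolean[] bipartition(double[] confidences, double threshold) {
        boolean[] relevant = new boolean[confidences.length];
        for (int j = 0; j < confidences.length; j++) {
            relevant[j] = confidences[j] >= threshold;
        }
        return relevant;
    }

    public static void main(String[] args) {
        double[] conf = {0.2, 0.9, 0.6, 0.4};
        System.out.println(Arrays.toString(ranking(conf)));          // [1, 2, 3, 0]
        System.out.println(Arrays.toString(bipartition(conf, 0.5))); // [false, true, true, false]
    }
}
```

A learner that performs both tasks would simply return both arrays for the same input instance.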

In addition, the library offers the following features:

  • Feature selection. Simple baseline methods are currently supported.
  • Evaluation. Classes that calculate a large variety of evaluation measures through hold-out evaluation and cross-validation.
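
Hamming loss is a typical example of a measure computed during evaluation. It is the fraction of example-label pairs whose predicted relevance disagrees with the true relevance. The sketch below computes it over a small hold-out set. It is a minimal standalone version with hypothetical names, not Mulan's implementation.

```java
/** Illustrative sketch (hypothetical names): Hamming loss on a hold-out set. */
public class HammingLoss {

    /**
     * Fraction of (example, label) pairs where prediction and truth disagree,
     * averaged over all examples and labels. Lower is better; 0 means no mistakes.
     */
    static double hammingLoss(boolean[][] truth, boolean[][] predicted) {
        if (truth.length != predicted.length || truth.length == 0) {
            throw new IllegalArgumentException("need equal, non-empty sets of examples");
        }
        int numLabels = truth[0].length;
        long mistakes = 0;
        for (int i = 0; i < truth.length; i++) {
            for (int j = 0; j < numLabels; j++) {
                if (truth[i][j] != predicted[i][j]) mistakes++;
            }
        }
        return (double) mistakes / ((long) truth.length * numLabels);
    }

    public static void main(String[] args) {
        // Hold-out test set of two examples over three labels.
        boolean[][] truth     = {{true, false, true}, {false, true, false}};
        boolean[][] predicted = {{true, true,  true}, {false, true, true}};
        System.out.println(hammingLoss(truth, predicted)); // 2 mistakes / 6 pairs
    }
}
```

In cross-validation the same function would be applied to each test fold and the per-fold values averaged.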

As already mentioned, Mulan is a library. As such, it offers only a programmatic API to its users. No graphical user interface (GUI) is available, and using the library from the command line is not currently supported either. The Getting Started page in the Documentation section is the ideal place to start exploring Mulan.

References

Tsoumakas, G., Katakis, I., Vlahavas, I. (2010). "Mining Multi-label Data". In O. Maimon, L. Rokach (Eds.), Data Mining and Knowledge Discovery Handbook, 2nd edition. Springer.

Changes from the previous version:

Learners

  • New algorithms added in the meta package.
  • EnsembleOfClassifierChains: The final confidences can now be computed not only by averaging votes, but also by averaging confidences. The option of sampling with replacement was added.
  • MMP: updated to use loss functions. Added the option to specify the number of training epochs for MMPLearner.
  • BinaryRelevance: Added a method to get the model built for a given label.
  • Update to the lazy package: Euclidean remains the default distance function, but a different distance function can now be specified.
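
The EnsembleOfClassifierChains note describes two ways to combine the members' outputs: averaging their votes or averaging their confidences. The sketch below contrasts the two on made-up numbers. The 0.5 vote threshold, the array layout ([member][label]) and the method names are assumptions for illustration, not the internals of Mulan's EnsembleOfClassifierChains.

```java
import java.util.Arrays;

/** Illustrative sketch (hypothetical names): two ways to aggregate ensemble outputs. */
public class EnsembleAggregation {

    /** Fraction of members voting "relevant" (confidence >= threshold) for each label. */
    static double[] averageVotes(double[][] memberConfidences, double threshold) {
        int numLabels = memberConfidences[0].length;
        double[] result = new double[numLabels];
        for (double[] member : memberConfidences) {
            for (int j = 0; j < numLabels; j++) {
                if (member[j] >= threshold) result[j] += 1.0;
            }
        }
        for (int j = 0; j < numLabels; j++) result[j] /= memberConfidences.length;
        return result;
    }

    /** Mean of the members' raw confidences for each label. */
    static double[] averageConfidences(double[][] memberConfidences) {
        int numLabels = memberConfidences[0].length;
        double[] result = new double[numLabels];
        for (double[] member : memberConfidences) {
            for (int j = 0; j < numLabels; j++) result[j] += member[j];
        }
        for (int j = 0; j < numLabels; j++) result[j] /= memberConfidences.length;
        return result;
    }

    public static void main(String[] args) {
        // Confidences of three chains (rows) for two labels (columns).
        double[][] conf = {{0.9, 0.55}, {0.6, 0.10}, {0.4, 0.52}};
        System.out.println(Arrays.toString(averageVotes(conf, 0.5)));
        System.out.println(Arrays.toString(averageConfidences(conf)));
    }
}
```

Here both labels receive two of three votes. However, the averaged confidence of the second label (0.39) falls below 0.5, so the two options can lead to different final predictions.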

Measures

  • Introduced loss functions package.
  • Refurbished the measures package so that the measure hierarchy has cleaner semantics and takes loss functions into consideration.
  • Strict/non-strict evaluation (handles division by zero differently).
  • Uniform calculation of the F-measure for all related measures.
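
The last two notes interact: a single F-measure helper shared by all related measures must decide what to return when its denominator is zero. The sketch below computes F1 as 2tp / (2tp + fp + fn), which equals the harmonic mean of precision and recall whenever tp > 0. The denominator is zero only when both the true and the predicted label sets are empty. The convention used here is a hypothetical illustration and may not match Mulan's exact behaviour: strict mode reports that case as undefined (NaN), while non-strict mode counts it as a perfect match (1.0).

```java
/** Illustrative sketch (hypothetical names and convention): one shared F1 helper. */
public class FMeasureSketch {

    /**
     * F1 from true positives, false positives and false negatives.
     * Assumed convention for 0/0: NaN when strict, 1.0 when non-strict.
     */
    static double f1(int tp, int fp, int fn, boolean strict) {
        int denominator = 2 * tp + fp + fn;
        if (denominator == 0) {
            // Both label sets are empty: the score is undefined.
            return strict ? Double.NaN : 1.0;
        }
        return 2.0 * tp / denominator;
    }

    public static void main(String[] args) {
        System.out.println(f1(2, 1, 1, true));  // 4 / 6
        System.out.println(f1(0, 0, 0, true));  // NaN
        System.out.println(f1(0, 0, 0, false)); // 1.0
    }
}
```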

Bug fixes

  • Bug fix in the dimensionality reduction package.
  • Bug fix in CalibratedLabelRanking class.
  • Updated design and bug fixes in thresholding strategies.
  • Fixed defect in MMPUniformUpdateRule.
  • Bug fix in the getPriors method.

API changes

  • Upgrade to Weka 3.7.3.

Experiments

  • Experiment from ICTAI 2010 paper added.

Examples

  • Simplified source examples for consistency with the online documentation.
  • Added an example that shows storing/loading a multi-label model.
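
In Java, a trained model is commonly stored and later reloaded with standard object serialization. The sketch below shows that pattern under the assumption that the model class implements java.io.Serializable. ThresholdModel is a hypothetical stand-in for a trained multi-label learner, and whether Mulan's own example uses exactly this mechanism is an assumption.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Illustrative sketch (hypothetical names): storing and loading a model via Java serialization. */
public class ModelStore {

    /** Hypothetical stand-in for a trained multi-label model. */
    static class ThresholdModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[] thresholds;

        ThresholdModel(double[] thresholds) {
            this.thresholds = thresholds;
        }
    }

    /** Writes the model object to the given file. */
    static void save(Object model, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(model);
        }
    }

    /** Reads a previously stored model object back from the given file. */
    static Object load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("model", ".ser");
        file.deleteOnExit();
        save(new ThresholdModel(new double[] {0.5, 0.3}), file);
        ThresholdModel restored = (ThresholdModel) load(file);
        System.out.println(restored.thresholds[1]); // 0.3
    }
}
```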

Unit Tests

  • HOMER and HMC tests added.
  • MetaLabeler and ThresholdPrediction tests updated.
Supported Operating Systems: Platform Independent
Data Formats: Arff
Tags: Classification, Multilabel, Ranking, Icml2010, Multi Label, Jmlr