Project details for MOA Massive Online Analysis

[Screenshot: JMLR MOA Massive Online Analysis, June 2009]

by abifet - June 4, 2010, 14:05:31 CET

Description:

MOA is an open-source framework for dealing with massive evolving data streams. MOA is related to WEKA, the Waikato Environment for Knowledge Analysis.

A data stream environment has different requirements from the traditional batch learning setting. The most significant are the following (see the sketch after the list):

  1. Process an example at a time, and inspect it only once (at most)
  2. Use a limited amount of memory
  3. Work in a limited amount of time
  4. Be ready to predict at any time
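
These constraints are visible in the basic test-then-train (prequential) loop. The sketch below follows the style of MOA's tutorial code rather than reproducing any MOA class; it uses the class names of the MOA release contemporary with this announcement (moa.classifiers.HoeffdingTree, moa.streams.generators.RandomRBFGenerator), and newer releases move some of these classes into sub-packages such as moa.classifiers.trees.

    import moa.classifiers.Classifier;
    import moa.classifiers.HoeffdingTree;
    import moa.streams.generators.RandomRBFGenerator;
    import weka.core.Instance;

    public class TestThenTrain {
        public static void main(String[] args) {
            Classifier learner = new HoeffdingTree();            // bounded-memory incremental learner
            RandomRBFGenerator stream = new RandomRBFGenerator();
            stream.prepareForUse();
            learner.setModelContext(stream.getHeader());
            learner.prepareForUse();

            int seen = 0;
            int correct = 0;
            while (stream.hasMoreInstances() && seen < 1000000) { // explicit limit on stream length
                Instance inst = stream.nextInstance();
                // Requirement 4: the model can answer before it has seen this example.
                if (learner.correctlyClassifies(inst)) {
                    correct++;
                }
                // Requirements 1-3: a single pass over the example, which is then discarded.
                learner.trainOnInstance(inst);
                seen++;
            }
            System.out.println(seen + " instances, " + (100.0 * correct / seen) + "% accuracy");
        }
    }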

In traditional batch learning the problem of limited data is overcome by analyzing and averaging multiple models produced with different random arrangements of training and test data. In the stream setting the problem of (effectively) unlimited data poses different challenges. MOA permits evaluation of data stream classification algorithms on large streams, on the order of tens of millions of examples where possible, and under explicit memory limits. Anything less does not truly test algorithms in a realistically challenging setting.

MOA is written in Java. The main benefits of Java are portability (applications can be run on any platform with an appropriate Java virtual machine) and its strong, well-developed support libraries. Use of the language is widespread, and features such as automatic garbage collection help to reduce programmer burden and error.

MOA contains stream generators, classifiers and evaluation methods.

Considering data streams as data generated from pure distributions, MOA models a concept drift event as a weighted combination of two pure distributions that characterize the target concepts before and after the drift. Within the framework, it is possible to define the probability that instances of the stream belong to the new concept after the drift. MOA uses the sigmoid function as an elegant and practical way to define this probability.
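
The weighting itself is compact enough to write down. The following is only an illustrative sketch of that sigmoid, not MOA source code; it assumes the common parameterisation in which t0 is the centre of the change and w the width of the transition window, with the slope factor 4/w an assumption about the exact form used in your MOA release.

    public class DriftWeight {

        // Probability that the example arriving at time t is drawn from the
        // NEW concept, for a drift centred at t0 with transition width w.
        // The factor 4/w fixes the slope at the inflection point so that the
        // transition is essentially confined to the window of width w
        // (an assumption about the exact parameterisation).
        static double newConceptProbability(long t, long t0, long w) {
            return 1.0 / (1.0 + Math.exp(-4.0 * (t - t0) / w));
        }

        public static void main(String[] args) {
            long t0 = 10000, w = 1000; // drift centred at instance 10000, ~1000-instance transition
            for (long t : new long[]{9000L, 9500L, 10000L, 10500L, 11000L}) {
                System.out.printf("t=%d  P(new concept)=%.3f%n",
                        t, newConceptProbability(t, t0, w));
            }
        }
    }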

MOA contains the data generators most commonly found in the literature. MOA streams can be built using generators, reading ARFF files, joining several streams, or filtering streams. They allow the simulation of a potentially infinite sequence of data. The following generators are currently available: Random Tree Generator, SEA Concepts Generator, STAGGER Concepts Generator, Rotating Hyperplane, Random RBF Generator, LED Generator, Waveform Generator, and Function Generator.
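
As an example of the first two ways of obtaining a stream, the sketch below builds one stream from a synthetic generator and one from an ARFF file on disk, in the style of the MOA tutorials. The file name elecNormNew.arff is only an illustrative placeholder, and package paths may differ between MOA releases.

    import moa.streams.ArffFileStream;
    import moa.streams.generators.WaveformGenerator;
    import weka.core.Instance;

    public class StreamSources {
        public static void main(String[] args) {
            // A synthetic, potentially infinite stream.
            WaveformGenerator synthetic = new WaveformGenerator();
            synthetic.prepareForUse();

            // A finite stream read from disk (the file must exist);
            // -1 selects the last attribute as the class.
            ArffFileStream fromDisk = new ArffFileStream("elecNormNew.arff", -1);
            fromDisk.prepareForUse();

            Instance first = synthetic.nextInstance();
            System.out.println("Synthetic instance has " + first.numAttributes() + " attributes");
            System.out.println("ARFF stream has more instances: " + fromDisk.hasMoreInstances());
        }
    }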

MOA contains several classification methods, such as: Naive Bayes, Decision Stump, Hoeffding Tree, Hoeffding Option Tree, Bagging, Boosting, Bagging using ADWIN, and Bagging using Adaptive-Size Hoeffding Trees.
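
All of these share MOA's Classifier interface, so they can be swapped behind a single evaluation routine. The sketch below follows the same tutorial style (it is not MOA's own evaluation code) and assumes 2010-era package paths; newer releases place the ensemble methods in moa.classifiers.meta. It compares plain Bagging with Bagging using ADWIN on the LED generator.

    import moa.classifiers.Classifier;
    import moa.classifiers.OzaBag;
    import moa.classifiers.OzaBagAdwin;
    import moa.streams.generators.LEDGenerator;
    import weka.core.Instance;

    public class CompareLearners {

        // Prequential accuracy of any MOA classifier over n generated examples.
        static double prequentialAccuracy(Classifier learner, int n) {
            LEDGenerator stream = new LEDGenerator();
            stream.prepareForUse();
            learner.setModelContext(stream.getHeader());
            learner.prepareForUse();
            int correct = 0;
            for (int i = 0; i < n; i++) {
                Instance inst = stream.nextInstance();
                if (learner.correctlyClassifies(inst)) { // test first...
                    correct++;
                }
                learner.trainOnInstance(inst);           // ...then train on the same example
            }
            return 100.0 * correct / n;
        }

        public static void main(String[] args) {
            System.out.println("OzaBag:      " + prequentialAccuracy(new OzaBag(), 100000));
            System.out.println("OzaBagAdwin: " + prequentialAccuracy(new OzaBagAdwin(), 100000));
        }
    }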

Changes to previous version:

Initial Announcement on mloss.org.

Supported Operating Systems: Cygwin, Linux, Mac OS X, Windows
Data Formats: ARFF
Tags: Classification, Online Learning, Boosting, Weka, Bagging, Data Streams, Ensemble Methods