KeplerWeka 20101008

Peter Reutemann — Sat, 09 Oct 2010 05:27:13 -0000

KeplerWeka integrates the full functionality of the WEKA Machine Learning Workbench [1] into the open-source scientific workflow system Kepler [2]. This includes classification, clustering, attribute selection and association rules. Data can be read from multiple data sources, pre-processed, visualized (ROC, cost curves, ...) and saved in various formats (file or database). Schemes can be evaluated via random splits, cross-validation or dedicated train and test sets. The Weka Experimenter is also available in the workflow as a separate component (or "actor" in Kepler terms). In contrast to the standalone application, the Experiment actor is not limited to files stored on disk, but can also be run with data generated in the workflow. Furthermore, parameter sweeps of classifiers can be fed into the Experiment actor using the ClassifierSetupGenerator actor.
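To illustrate what a parameter sweep of classifier setups amounts to, here is a minimal Java sketch that expands a base scheme and a few option value lists into one setup string per combination. It is not the ClassifierSetupGenerator implementation and uses no KeplerWeka or WEKA classes: the class name SweepSketch and the method expand are hypothetical, and the swept values are arbitrary. The option flags -C and -M are the confidence factor and minimum instances per leaf of WEKA's J48 classifier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of KeplerWeka or WEKA.
class SweepSketch {

    // Builds the Cartesian product of all option values and appends each
    // combination to the scheme name, yielding one setup string per combination.
    static List<String> expand(String scheme, Map<String, List<String>> sweep) {
        List<String> setups = new ArrayList<>();
        setups.add(scheme);
        for (Map.Entry<String, List<String>> option : sweep.entrySet()) {
            List<String> next = new ArrayList<>();
            for (String prefix : setups) {
                for (String value : option.getValue()) {
                    next.add(prefix + " " + option.getKey() + " " + value);
                }
            }
            setups = next;
        }
        return setups;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the options in insertion order.
        Map<String, List<String>> sweep = new LinkedHashMap<>();
        sweep.put("-C", Arrays.asList("0.1", "0.25")); // assumed example values
        sweep.put("-M", Arrays.asList("2", "5"));      // assumed example values
        for (String setup : expand("weka.classifiers.trees.J48", sweep)) {
            System.out.println(setup);
        }
    }
}
```

Each resulting string corresponds to one classifier configuration that an experiment could evaluate; two values for each of two options give four setups.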

The workflow engine Kepler is based on the Ptolemy II system [3] for heterogeneous, concurrent modeling and design. Although Ptolemy II was not originally intended for scientific workflows, it provides a mature platform for building and executing workflows, and supports multiple models of computation.

[1] Ian H. Witten and Eibe Frank (2005) "Data Mining: Practical machine learning tools and techniques", 2nd Edition, Morgan Kaufmann, San Francisco.

[2] Kepler Project, http://kepler-project.org/

[3] Ptolemy II, http://ptolemy.berkeley.edu/ptolemyII/
