Project details for Pyriel

by tfawcett - March 14, 2010, 03:51:18 CET

Description:

Pyriel is an experimental rule learning system written in Python. Given a set of data labeled with class names, it will learn a set of data classification rules of the form:

If condition1 AND condition2 AND ... AND conditionN ==> CLASS

Pyriel has a number of desirable properties for data mining:

  • Because PRIE maximizes ROC performance, it naturally handles skewed datasets.

  • PRIE is able to handle multiple classes. It will attempt to optimize the combined AUC for any number of classes simultaneously.

  • PRIE's output is a single rulelist and thus is relatively intelligible and modular. To use this rulelist on a new unseen instance, the rules are evaluated sequentially and the first one matching determines the class and probability. Some data mining practitioners consider rulelists to be more intelligible than rulesets because only a single rule matches a new instance.

  • Because PRIE produces a rulelist whose rules are ordered by decreasing class likelihood, the rulelist can be used naturally with the ROC convex hull (Provost and Fawcett, 2001). If the operating conditions (class skew and relative error costs) are known, the rulelist can be truncated to eliminate rules that will never affect a classification decision.

  • PRIE handles numerical attributes naturally, using the ROC curve implicitly to identify promising discretizations. Other classification models may discretize variables in a preprocessing pass or may use techniques unrelated to model construction. PRIE considers every discretization of a continuous attribute to comprise a separate point in ROC space, and handles these the same as any other discrete attribute.

  • PRIE can handle set-valued attributes (Cohen, 1996), in which an attribute of an instance may take on a set of discrete values instead of a single one. Such features are useful, for example, in text classification domains in which the set may represent the "bag of words" of a text document.
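The first-match evaluation of a rulelist described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Pyriel's actual code or API: the `Rule` class, the `classify` function, the attribute names (`age`, `words`), and the probabilities are all hypothetical. The second condition of the first rule also shows one way a test on a set-valued attribute might look.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation: an instance is a dict of attribute values,
# and a condition is any predicate over an instance.
Instance = Dict[str, object]
Condition = Callable[[Instance], bool]


@dataclass
class Rule:
    conditions: List[Condition]  # all must hold (logical AND)
    label: str                   # class assigned when the rule fires
    probability: float           # class probability attached to the rule

    def matches(self, instance: Instance) -> bool:
        # An empty condition list matches everything (a default rule).
        return all(cond(instance) for cond in self.conditions)


def classify(rulelist: List[Rule], instance: Instance) -> Optional[Tuple[str, float]]:
    """Evaluate rules in order; the first matching rule decides."""
    for rule in rulelist:
        if rule.matches(instance):
            return rule.label, rule.probability
    return None  # no rule matched and no default rule was present


# Illustrative rulelist (made-up rules, not learned from data).
rulelist = [
    Rule([lambda x: x["age"] > 60,
          lambda x: "urgent" in x["words"]],  # test on a set-valued attribute
         "positive", 0.9),
    Rule([lambda x: x["age"] > 40], "positive", 0.6),
    Rule([], "negative", 0.8),  # default rule
]

result = classify(rulelist, {"age": 65, "words": {"urgent", "call"}})
```

Because only the first matching rule fires, the order of the rules is part of the model: swapping the first two rules above would change the probability assigned to the example instance.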

PRIE is unusual in that it uses basic principles from rule learning and computational geometry to focus the search for promising rule combinations. The result is a system that can learn rulelists with high AUC scores.
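To make the idea of treating each discretization of a continuous attribute as a point in ROC space more concrete, here is a minimal sketch. It is not PRIE's search procedure: the function name `threshold_roc_points`, the choice of midpoints between distinct values as candidate thresholds, and the form of the condition (`value > threshold` predicting the positive class) are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def threshold_roc_points(
    values: Sequence[float], labels: Sequence[bool]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, false positive rate, true positive rate) for each
    candidate condition `value > threshold` on one numeric attribute."""
    total_pos = sum(1 for y in labels if y)
    total_neg = len(labels) - total_pos
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    distinct = sorted(set(values))
    points = []
    # Assumed candidate thresholds: midpoints between adjacent distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        tp = sum(1 for v, y in zip(values, labels) if v > threshold and y)
        fp = sum(1 for v, y in zip(values, labels) if v > threshold and not y)
        points.append((threshold, fp / total_neg, tp / total_pos))
    return points


# Tiny made-up dataset: two negatives followed by two positives.
points = threshold_roc_points([1.0, 2.0, 3.0, 4.0], [False, False, True, True])
```

On this toy data the threshold 2.5 yields the point (FPR 0.0, TPR 1.0), which dominates the other candidates in ROC space. A learner that compares conditions by their position in ROC space would therefore prefer it, with no separate discretization pass.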

Changes to previous version:

Initial Announcement on mloss.org.

Supported Operating Systems: Platform Independent
Data Formats: ASCII, ARFF
Tags: Classification, ROC, Rule Learning, Scoring
