Project details for Pyriel


by tfawcett - October 27, 2010, 09:12:53 CET



Pyriel is an experimental rule learning system written in Python. Given a set of examples labeled with class names, it learns an ordered list of classification rules of the form:

If condition1 AND condition2 AND ... AND conditionN ==> CLASS
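Read as code, a rule of this form is a conjunction of tests on an instance's attributes. The sketch below is a minimal illustration, not Pyriel's internal representation: `Rule`, `classify`, the `default` fallback and the example attributes are all hypothetical. It also applies the first-match evaluation of an ordered rulelist that Pyriel uses when classifying a new instance.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Instance = Dict[str, Any]
Condition = Callable[[Instance], bool]


@dataclass
class Rule:
    """If cond1 AND cond2 AND ... AND condN ==> label."""
    conditions: List[Condition]
    label: str
    probability: float  # estimated P(label | rule fires)

    def matches(self, x: Instance) -> bool:
        # Conjunction: the rule fires only if every condition holds.
        return all(cond(x) for cond in self.conditions)


def classify(rulelist: List[Rule], x: Instance,
             default: Tuple[str, float]) -> Tuple[str, float]:
    """Try rules in order; the first one that fires supplies class and probability."""
    for rule in rulelist:
        if rule.matches(x):
            return rule.label, rule.probability
    return default  # no rule fired


# Hypothetical rules; "habits" is a set-valued attribute tested for membership.
rules = [
    Rule([lambda x: x["age"] > 50, lambda x: "smoker" in x["habits"]], "high_risk", 0.8),
    Rule([lambda x: x["age"] > 30], "medium_risk", 0.5),
]
print(classify(rules, {"age": 60, "habits": {"smoker"}}, ("low_risk", 0.1)))
# ('high_risk', 0.8)
```

Because evaluation stops at the first rule that fires, an instance aged 60 who smokes never reaches the second rule, even though it would also match.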

Pyriel has a number of desirable properties for data mining:

  • Because Pyriel maximizes ROC performance (the area under the ROC curve, AUC), it naturally handles datasets with skewed class distributions.

  • Pyriel is able to handle multiple classes. It will attempt to optimize the combined AUC for any number of classes simultaneously.

  • Pyriel's output is a single rulelist, which makes it relatively intelligible and modular. To classify a new, unseen instance, the rules are evaluated in order and the first matching rule determines the class and probability. Some data mining practitioners consider rulelists more intelligible than rulesets because only one rule ever determines the classification of a given instance.

  • Because Pyriel produces a rulelist whose rules are ordered by decreasing class likelihood, the rulelist can be used naturally with the ROC convex hull (Provost and Fawcett, 2001). In use, if the operating conditions (class skew and relative error costs) are known, the rulelist can be truncated to eliminate rules that will never affect a classification decision.

  • Pyriel handles numerical attributes naturally, using the ROC curve implicitly to identify promising discretizations. Other classification methods may discretize variables in a separate preprocessing pass, or may use techniques unrelated to model construction. Pyriel instead treats every discretization of a continuous attribute as a separate point in ROC space, and handles these points the same way as any other discrete attribute.

  • Pyriel can handle set-valued attributes (Cohen, 1996), in which an attribute of an instance may take on a set of discrete values instead of a single one. Such features are useful, for example, in text classification domains in which the set may represent the "bag of words" of a text document.
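To make the discretization point concrete: every cut point t of a numeric attribute yields a candidate condition such as "value > t", and each such condition lands at its own (false positive rate, true positive rate) point. The sketch below enumerates those points; the function name and the choice of midpoints between distinct values as cut points are assumptions, since this page does not say which thresholds Pyriel considers.

```python
from typing import List, Tuple


def threshold_roc_points(values: List[float],
                         labels: List[bool]) -> List[Tuple[float, float, float]]:
    """Return (t, FPR, TPR) for the condition `value > t` at every candidate cut t.

    labels[i] is True when example i belongs to the positive class.
    Candidate cuts are midpoints between consecutive distinct values
    (an assumption, not necessarily what Pyriel does).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    distinct = sorted(set(values))
    cuts = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    points = []
    for t in cuts:
        tp = sum(1 for v, y in zip(values, labels) if v > t and y)
        fp = sum(1 for v, y in zip(values, labels) if v > t and not y)
        points.append((t, fp / n_neg, tp / n_pos))
    return points


# Each cut is one candidate condition, i.e. one point in ROC space.
for t, fpr, tpr in threshold_roc_points([1, 2, 3, 4], [False, False, True, True]):
    print(f"value > {t}: FPR={fpr:.2f}, TPR={tpr:.2f}")
```

The complementary condition "value <= t" lands at (1 - FPR, 1 - TPR), so both sides of every cut are available as candidate rule conditions, just like the values of a discrete attribute.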
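This page does not say exactly how Pyriel combines the per-class AUCs into one figure. One common definition, sketched here as an assumption, weights each class-versus-rest AUC by that class's prevalence. `binary_auc` computes AUC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting half; both function names are hypothetical.

```python
from typing import Dict, List


def binary_auc(scores: List[float], is_pos: List[bool]) -> float:
    """AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, is_pos) if y]
    neg = [s for s, y in zip(scores, is_pos) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def weighted_multiclass_auc(class_scores: Dict[str, List[float]],
                            labels: List[str]) -> float:
    """Sum over classes c of prevalence(c) * AUC(c versus all other classes).

    class_scores[c][i] is the model's score for class c on example i.
    """
    total = 0.0
    for c, scores in class_scores.items():
        is_pos = [y == c for y in labels]
        prevalence = sum(is_pos) / len(labels)
        if prevalence == 0:
            continue  # a class absent from the data contributes nothing
        total += prevalence * binary_auc(scores, is_pos)
    return total
```

The pairwise loop is quadratic and only meant to show the definition; a sort-based computation gives the same value faster on large datasets.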

Pyriel is unusual in that it uses basic principles from rule learning and computational geometry to focus the search for promising rule combinations. The result is a system that can learn rulelists with high AUC scores.
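The ROC convex hull and rulelist truncation can be illustrated for the two-class case. The sketch below is not Pyriel's code; it assumes each rule is summarized by the positives and negatives it is the first to match, turns the ordered rulelist into cumulative ROC points (labelling positive everything matched by the first k rules), computes their upper convex hull, and picks the truncation point using the iso-performance slope of Provost and Fawcett (2001), m = (cost of a false positive × number of negatives) / (cost of a false negative × number of positives). All function names and parameters are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def cumulative_roc_points(rule_counts: List[Tuple[int, int]],
                          n_pos: int, n_neg: int) -> List[Point]:
    """ROC point reached by labelling positive everything matched by rules 1..k.

    rule_counts[k] = (tp, fp): positives and negatives first matched by rule k.
    The list starts with (0, 0): no rule used, everything labelled negative.
    """
    tp = fp = 0
    points: List[Point] = [(0.0, 0.0)]
    for dtp, dfp in rule_counts:
        tp += dtp
        fp += dfp
        points.append((fp / n_neg, tp / n_pos))
    return points


def _cross(o: Point, a: Point, b: Point) -> float:
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points: List[Point]) -> List[Point]:
    """Upper convex hull of the points plus (0,0) and (1,1), via Andrew's monotone chain."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull: List[Point] = []
    for p in pts:
        # Drop the last vertex unless it lies strictly above the chord to p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def rules_to_keep(rule_counts: List[Tuple[int, int]], n_pos: int, n_neg: int,
                  cost_fp: float, cost_fn: float) -> int:
    """Number of leading rules that matter under the given skew and error costs.

    The best operating point maximizes TPR - m * FPR. A linear objective over a
    finite set of points is maximized on their convex hull, so only hull points
    can ever be chosen; scanning every point gives the same answer.
    """
    m = (cost_fp * n_neg) / (cost_fn * n_pos)
    points = cumulative_roc_points(rule_counts, n_pos, n_neg)
    return max(range(len(points)), key=lambda k: points[k][1] - m * points[k][0])
```

With 10 positives, 10 negatives and per-rule counts [(6, 0), (3, 2), (1, 8)], equal costs keep the first two rules, while making false positives five times as costly keeps only the first; the remaining rules can then be dropped and unmatched instances labelled negative.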

Changes from the previous version:

1.5: Changed CF (confidence factor) to use Laplace smoothing of estimates. New flag "--score-for-class C" causes scores to be computed relative to a given (positive) class, for two-class problems. Fixed a bug in the example sampling code (--sample n). Fixed a bug that kept old-style example formats (terminated by a dot) from working. More code restructuring.
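The changelog does not give the exact formula Pyriel uses. The usual Laplace estimate, sketched here as an assumption about what CF computes, adds one pseudo-count per class: a rule covering n examples, n_c of them of its predicted class, in a k-class problem gets confidence (n_c + 1) / (n + k). The function name and parameters are hypothetical.

```python
def laplace_confidence(n_class: int, n_covered: int, n_classes: int = 2) -> float:
    """Laplace-smoothed estimate of P(class | rule fires).

    n_class:   covered examples that belong to the rule's class
    n_covered: all examples the rule covers
    n_classes: number of classes k (one pseudo-count per class)
    """
    if not 0 <= n_class <= n_covered:
        raise ValueError("need 0 <= n_class <= n_covered")
    return (n_class + 1) / (n_covered + n_classes)


# A rule covering a single example is pulled toward 1/k instead of claiming certainty.
print(laplace_confidence(1, 1))     # (1 + 1) / (1 + 2), not the raw 1/1
print(laplace_confidence(99, 100))  # (99 + 1) / (100 + 2), close to the raw 0.99
```

Small-coverage rules are pulled toward 1/k, so a rule that happens to be right on one or two examples is not ranked above a well-supported rule.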

Supported Operating Systems: Platform Independent
Data Formats: Ascii, Arff
Tags: Classification, Roc, Rule Learning, Scoring

