Pyrielhttp://mloss.orgUpdates and additions to PyrielenWed, 27 Oct 2010 09:12:53 -0000Pyriel 1.5<html><p>Pyriel is an experimental rule learning system written in Python. Given a set of data labeled with class names, it will learn a set of data classification rules of the form: </p> <p> If condition1 AND condition2 AND ... AND conditionN ==&gt; CLASS </p> <p>Pyriel has a number of desirable properties for data mining: </p> <ul> <li><p>Because Pyriel maximizes ROC performance, it naturally handles skewed datasets. </p> </li> <li><p>Pyriel is able to handle multiple classes. It will attempt to optimize the combined AUC for any number of classes simultaneously. </p> </li> <li><p>Pyriel's output is a single rulelist and thus is relatively intelligible and modular. To use this rulelist on a new unseen instance, the rules are evaluated sequentially and the first one matching determines the class and probability. Some data mining practitioners consider rulelists to be more intelligible than rulesets because only a single rule matches a new instance. </p> </li> <li><p>Because Pyriel uses a rulelist whose rules are ordered decreasing by class likelihood, the rulelist may be used naturally with the ROC convex hull (Provost and Fawcett, 2001). In use, if operating conditions (class skew and relative error costs) are known, the rulelist can be truncated to eliminate rules that will never affect a classification decision. </p> </li> <li><p>Pyriel handles numerical attributes naturally, using the ROC curve implicitly to identify promising discretizations. Other classification models may discretize variables in a preprocessing pass or may use techniques unrelated to model construction. Pyriel considers every discretization of a continuous attribute to comprise a separate point in ROC space, and handles these the same as any other discrete attribute. </p> </li> <li><p>Pyriel can handle set-valued attributes (Cohen, 1996), in which an attribute of an instance may take on a set of discrete values instead of a single one. Such features are useful, for example, in text classification domains in which the set may represent the "bag of words" of a text document. </p> </li> </ul> <p>Pyriel is unusual in that it uses basic principles from rule learning and computational geometry to focus the search for promising rule combinations. The result is a system that can learn rulelists with high AUC scores. </p></html>tom fawcettWed, 27 Oct 2010 09:12:53 -0000 learningscoring