- Description:
Pyriel is an experimental rule learning system written in Python. Given a set of data labeled with class names, it will learn a set of data classification rules of the form:
If condition1 AND condition2 AND ... AND conditionN ==> CLASS
Pyriel has a number of desirable properties for data mining:
Because Pyriel maximizes ROC performance, it naturally handles skewed datasets.
Pyriel is able to handle multiple classes. It will attempt to optimize the combined AUC for any number of classes simultaneously.
Pyriel's output is a single rulelist and thus is relatively intelligible and modular. To use this rulelist on a new unseen instance, the rules are evaluated sequentially and the first one matching determines the class and probability. Some data mining practitioners consider rulelists to be more intelligible than rulesets because only a single rule matches a new instance.
Because Pyriel produces a rulelist whose rules are ordered by decreasing class likelihood, the rulelist can be used naturally with the ROC convex hull (Provost and Fawcett, 2001). If the operating conditions (class skew and relative error costs) are known, the rulelist can be truncated to eliminate rules that will never affect a classification decision.
Pyriel handles numerical attributes naturally, using the ROC curve implicitly to identify promising discretizations. Other classification models may discretize variables in a preprocessing pass or may use techniques unrelated to model construction. Pyriel considers every discretization of a continuous attribute to comprise a separate point in ROC space, and handles these the same as any other discrete attribute.
Pyriel can handle set-valued attributes (Cohen, 1996), in which an attribute of an instance may take on a set of discrete values instead of a single one. Such features are useful, for example, in text classification domains in which the set may represent the "bag of words" of a text document.
Pyriel is unusual in that it uses basic principles from rule learning and computational geometry to focus the search for promising rule combinations. The result is a system that can learn rulelists with high AUC scores.
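To make the first-match behaviour of a rulelist concrete, here is a minimal sketch of how rules of the form shown above could be applied to an unseen instance. It is not Pyriel's own code: the `Rule` class, the `apply_rulelist` function, the default rule, and the example attributes are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Instance = Dict[str, object]
Condition = Callable[[Instance], bool]


@dataclass
class Rule:
    """Hypothetical rule: condition1 AND ... AND conditionN ==> label."""
    conditions: List[Condition]
    label: str
    probability: float

    def matches(self, instance: Instance) -> bool:
        # A rule fires only if every condition holds (an empty list always fires).
        return all(cond(instance) for cond in self.conditions)


def apply_rulelist(rules: List[Rule], instance: Instance) -> Optional[Tuple[str, float]]:
    """Evaluate rules in order; the first matching rule decides the outcome."""
    for rule in rules:
        if rule.matches(instance):
            return rule.label, rule.probability
    return None  # no rule matched and no default rule was supplied


# Illustrative rulelist, ordered by decreasing likelihood of "high_risk".
rules = [
    Rule([lambda x: x["age"] > 60, lambda x: x["smoker"]], "high_risk", 0.9),
    Rule([lambda x: "urgent" in x["words"]], "high_risk", 0.7),  # set-valued attribute test
    Rule([], "low_risk", 0.8),  # assumed default rule: matches everything
]

result = apply_rulelist(rules, {"age": 70, "smoker": True, "words": set()})
```

Because only the first matching rule is used, dropping rules from the end of the list changes the outcome only for instances that no earlier rule covers, which is what makes truncation under known operating conditions straightforward.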
- Changes to previous version:
1.5 Changed CF (confidence factor) to use Laplace smoothing of estimates. New flag "--score-for-class C" causes scores to be computed relative to a given (positive) class, for two-class problems. Fixed bug in example sampling code (--sample n). Fixed bug keeping old-style example formats (terminated by dot) from working. More code restructuring.
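For readers unfamiliar with Laplace smoothing, the sketch below shows the standard Laplace-corrected estimate of a rule's class probability. Whether Pyriel's confidence factor uses exactly this formula is an assumption; the function name and parameters are hypothetical.

```python
def laplace_estimate(class_count: int, total_count: int, num_classes: int) -> float:
    """Standard Laplace-smoothed estimate: (k + 1) / (n + C).

    class_count: examples of the target class covered by the rule (k)
    total_count: all examples covered by the rule (n)
    num_classes: number of classes in the problem (C)
    """
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    if not 0 <= class_count <= total_count:
        raise ValueError("class_count must lie between 0 and total_count")
    return (class_count + 1) / (total_count + num_classes)


# A rule covering 3 examples, all of the target class, in a two-class problem:
raw = 3 / 3                              # unsmoothed estimate is 1.0
smoothed = laplace_estimate(3, 3, 2)     # smoothing pulls it towards 1/2
```

The smoothed estimate keeps rules that cover very few examples from receiving extreme scores of 0 or 1, and a rule that covers no examples falls back to the uniform value 1/C.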
- Supported Operating Systems: Platform Independent
- Data Formats: ASCII, ARFF
- Tags: Classification, ROC, Rule Learning, Scoring
Other available revisions
- Version 1.5, October 27, 2010, 09:12:53. Changelog: 1.5 Changed CF (confidence factor) to do Laplace smoothing of estimates. New flag "--score-for-class C" causes scores to be computed relative to a given (positive) class. For two-class problems. Fixed bug in example sampling code (--sample n). Fixed bug keeping old-style example formats (terminated by dot) from working. More code restructuring.
- Version 1.4, October 17, 2010, 05:12:24. Changelog: 1.4 Many bug fixes. Made reader more robust. Complete rewrite of continuous attribute discretization code, fixing some persistent bugs. Restructured code so that when pyriel is installed via setup.py all modules end up under ./dist-packages/pyriel instead of scattered in dist-packages. Removed .py from installed scripts.
- Version 1.3, August 29, 2010, 06:24:31. Changelog: 1.4 Many bug fixes. Made reader more robust. Complete rewrite of continuous attribute discretization code, fixing some persistent bugs. Restructured code so that when pyriel is installed via setup.py all modules end up under ./dist-packages/pyriel instead of scattered in dist-packages. Removed .py from installed scripts.
- Version 1.2, April 18, 2010, 08:40:12. Changelog: Fixed SetAttr methods in Attr.py that were keeping set attributes from working (thanks Adler Perotte). Added warning message to Read.py.
- Version 1.1, April 7, 2010, 23:59:32. Changelog: Fixed SetAttr methods in Attr.py that were keeping set attributes from working (thanks Adler Perotte). Added warning message to Read.py.
- Version 1.0, March 14, 2010, 03:51:18. Changelog: Initial Announcement on mloss.org.