Project details for Pyriel

Pyriel 1.0

by tfawcett - March 14, 2010, 03:51:18 CET

Description:

Pyriel is an experimental rule learning system written in Python. Given a set of data labeled with class names, it will learn a set of data classification rules of the form:

If condition1 AND condition2 AND ... AND conditionN ==> CLASS
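As a rough illustration of this rule form, the sketch below represents a rule as a list of conditions that must all hold before the rule assigns its class. The names used here (`Rule`, `matches`, `OPS`, and the attribute names) are hypothetical and are not taken from Pyriel's source.

```python
import operator
from dataclasses import dataclass

# Hypothetical representation; Pyriel's own classes may look quite different.
OPS = {"==": operator.eq, "<=": operator.le, ">": operator.gt}


@dataclass
class Rule:
    conditions: list  # (attribute, operator symbol, value) triples
    label: str        # class assigned when every condition holds

    def matches(self, instance: dict) -> bool:
        # The rule fires only if ALL conditions are true (logical AND).
        return all(OPS[op](instance[attr], value)
                   for attr, op, value in self.conditions)


rule = Rule([("outlook", "==", "sunny"), ("humidity", ">", 75)], "dont_play")
humid_day = {"outlook": "sunny", "humidity": 90}
dry_day = {"outlook": "sunny", "humidity": 60}
humid_fires = rule.matches(humid_day)  # both conditions hold
dry_fires = rule.matches(dry_day)      # the humidity condition fails
```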

Pyriel implements the PRIE algorithm, which has a number of desirable properties for data mining:

  • Because PRIE maximizes ROC performance, it naturally handles skewed datasets.

  • PRIE is able to handle multiple classes. It will attempt to optimize the combined AUC for any number of classes simultaneously.

  • PRIE's output is a single rulelist and thus is relatively intelligible and modular. To use this rulelist on a new unseen instance, the rules are evaluated sequentially, and the first rule that matches determines the class and probability. Some data mining practitioners consider rulelists to be more intelligible than rulesets because only a single rule determines the classification of a new instance.

  • Because PRIE uses a rulelist whose rules are ordered by decreasing class likelihood, the rulelist may be used naturally with the ROC convex hull (Provost and Fawcett, 2001). In use, if operating conditions (class skew and relative error costs) are known, the rulelist can be truncated to eliminate rules that will never affect a classification decision.

  • PRIE handles numerical attributes naturally, using the ROC curve implicitly to identify promising discretizations. Other classification models may discretize variables in a preprocessing pass or may use techniques unrelated to model construction. PRIE considers every discretization of a continuous attribute to be a separate point in ROC space, and handles these points the same way as those of any discrete attribute.

  • PRIE can handle set-valued attributes (Cohen, 1996), in which an attribute of an instance may take on a set of discrete values instead of a single one. Such features are useful, for example, in text classification domains in which the set may represent the "bag of words" of a text document.
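The first-match behaviour of a rulelist described above can be sketched in a few lines. This is only an illustration under stated assumptions: the rulelist, its probabilities, and the attribute names (`words`, `amount`) are invented, and `classify` is a hypothetical helper, not part of Pyriel. The first rule also tests a set-valued attribute by checking membership in a set of words.

```python
# Hypothetical rulelist for a two-class problem. Each entry is
# (predicate, predicted class, probability of that class); the numbers are
# invented for illustration and are not Pyriel output.
RULELIST = [
    (lambda x: "refund" in x["words"] and x["amount"] > 1000, "fraud", 0.90),
    (lambda x: x["amount"] > 1000, "fraud", 0.60),
    (lambda x: True, "legit", 0.95),  # default rule: matches every instance
]


def classify(instance, rulelist=RULELIST):
    """Return (class, probability) from the first rule whose predicate holds."""
    for predicate, label, probability in rulelist:
        if predicate(instance):
            return label, probability
    raise ValueError("no rule matched; add a default rule at the end")


# "words" is a set-valued attribute; "amount" is numerical.
first = classify({"words": {"refund", "urgent"}, "amount": 2500})
second = classify({"words": {"hello"}, "amount": 2500})
third = classify({"words": {"refund"}, "amount": 40})
```

Because evaluation stops at the first match, later rules only ever see instances that every earlier rule rejected, which is why the order of the list matters.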

PRIE is unusual in that it uses basic principles from rule learning and computational geometry to focus the search for promising rule combinations. The result is a system that can learn rulelists with high AUC scores.
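To make the ROC-space view of numerical attributes more concrete, the sketch below turns each candidate threshold on one attribute into a (false positive rate, true positive rate) point and keeps only the points on the upper convex hull. It is not PRIE's actual search procedure. The test direction `value > t`, the function names, and the toy data are all assumptions made for illustration.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_points(values, labels):
    """One (FPR, TPR) point per candidate test `value > t` (labels are 0/1)."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v > t and y == 1)
        fp = sum(1 for v, y in zip(values, labels) if v > t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def upper_hull(points):
    """Upper convex hull from (0, 0) to (1, 1), scanning left to right."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last hull point while it lies on or below the line
        # from the point before it to the new point p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


# Toy data: eight instances, four positive (1) and four negative (0).
values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 1, 0, 1, 1, 0, 1]
hull = upper_hull(roc_points(values, labels))
```

Thresholds whose points fall below the hull are dominated by some combination of hull points, so a search guided by ROC performance has little reason to consider them.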

Changes to previous version:

Initial Announcement on mloss.org.

Supported Operating Systems: Platform Independent
Data Formats: ASCII, ARFF
Tags: Classification, ROC, Rule Learning, Scoring

Other available revisions

| Version | Changelog | Date |
|---|---|---|
| 1.5 | 1.5: Changed CF (confidence factor) to do Laplace smoothing of estimates. New flag "--score-for-class C" causes scores to be computed relative to a given (positive) class, for two-class problems. Fixed bug in example sampling code (--sample n). Fixed bug keeping old-style example formats (terminated by dot) from working. More code restructuring. | October 27, 2010, 09:12:53 |
| 1.4 | 1.4: Many bug fixes. Made reader more robust. Complete rewrite of continuous attribute discretization code, fixing some persistent bugs. Restructured code so that when pyriel is installed via setup.py all modules end up under ./dist-packages/pyriel instead of scattered in dist-packages. Removed .py from installed scripts. | October 17, 2010, 05:12:24 |
| 1.3 | 1.4: Many bug fixes. Made reader more robust. Complete rewrite of continuous attribute discretization code, fixing some persistent bugs. Restructured code so that when pyriel is installed via setup.py all modules end up under ./dist-packages/pyriel instead of scattered in dist-packages. Removed .py from installed scripts. | August 29, 2010, 06:24:31 |
| 1.2 | Fixed SetAttr methods in Attr.py that were keeping set attributes from working (thanks Adler Perotte). Added warning message to Read.py. | April 18, 2010, 08:40:12 |
| 1.1 | Fixed SetAttr methods in Attr.py that were keeping set attributes from working (thanks Adler Perotte). Added warning message to Read.py. | April 7, 2010, 23:59:32 |
| 1.0 | Initial Announcement on mloss.org. | March 14, 2010, 03:51:18 |
