Project details for libAGF

libAGF 0.9.4

by Petey - November 27, 2011, 06:25:56 CET

Description:

This software is written in C++ and contains routines for statistical classification, probability estimation and interpolation/non-linear regression. Two variable-bandwidth kernel methods are used: k-nearest neighbours (KNN), and a balloon estimator based on Gaussian kernels, hence Adaptive Gaussian Filtering (AGF). A library of easy-to-use, single-call functions is included (you call a single function once for each estimate, with no initialization required), as well as command-line executables.
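
As a rough illustration of what a balloon estimator with Gaussian kernels can look like, the sketch below estimates a density at a test point, choosing the kernel variance at that point so that the summed weights reach a target value. The function names, the Point type and the target-weight criterion are assumptions made for this example; they are not taken from the libAGF sources.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is not the libAGF API.
typedef std::vector<double> Point;

// Squared Euclidean distance between two points of equal dimension.
double sq_dist(const Point &a, const Point &b) {
  double s = 0;
  for (std::size_t i = 0; i < a.size(); i++) {
    double d = a[i] - b[i];
    s += d * d;
  }
  return s;
}

// Sum of the Gaussian weights of all samples for a given kernel variance.
double total_weight(const std::vector<double> &d2, double var) {
  double w = 0;
  for (std::size_t i = 0; i < d2.size(); i++) w += std::exp(-d2[i] / (2 * var));
  return w;
}

// Balloon estimate of the density at x. The variance is found by bisection
// (in log space) so that the total weight matches w_target, which is an
// assumed bandwidth criterion; w_target must lie strictly between the number
// of samples coincident with x and the total number of samples.
double adaptive_gauss_pdf(const std::vector<Point> &train, const Point &x,
                          double w_target) {
  std::vector<double> d2(train.size());
  for (std::size_t i = 0; i < train.size(); i++) d2[i] = sq_dist(train[i], x);

  double lo = 1e-12, hi = 1e12;  // bracket for the variance
  for (int it = 0; it < 200; it++) {
    double mid = std::sqrt(lo * hi);  // geometric midpoint
    if (total_weight(d2, mid) < w_target) lo = mid; else hi = mid;
  }
  double var = std::sqrt(lo * hi);

  // Standard normalization for a Gaussian kernel in x.size() dimensions.
  const double pi = 3.14159265358979323846;
  double dim = static_cast<double>(x.size());
  double norm = train.size() * std::pow(2 * pi * var, dim / 2);
  return total_weight(d2, var) / norm;
}
```

Because the bandwidth depends only on the test point, a single call per estimate is enough and nothing has to be precomputed, which matches the single-call style described above.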

The statistical classification routines are particularly powerful, allowing you to generate a pre-trained model by searching for the class borders. The borders can then be used to make rapid classifications which nonetheless return estimates of the conditional probabilities.
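
The text does not spell out how the class borders are located. One plausible reading, assuming R denotes the difference in conditional probabilities between the two classes, is a root search along the line joining two training samples of opposite class. The sketch below shows that idea with plain bisection; RFunc, find_border and toy_r are hypothetical names, and the toy R function stands in for a real probability estimate.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<double> Point;

// Hypothetical signature: returns R(x), assumed here to be P(2|x) - P(1|x).
typedef double (*RFunc)(const Point &x);

// Point a fraction t of the way from a to b.
Point lerp(const Point &a, const Point &b, double t) {
  Point p(a.size());
  for (std::size_t i = 0; i < a.size(); i++) p[i] = a[i] + t * (b[i] - a[i]);
  return p;
}

// Bisect along the segment from a (where R < r0) to b (where R > r0) for the
// point at which R = r0. Returns false if the end points do not bracket r0.
bool find_border(RFunc R, const Point &a, const Point &b, double r0,
                 double tol, Point &border) {
  if (R(a) >= r0 || R(b) <= r0) return false;
  double t1 = 0, t2 = 1;
  while (t2 - t1 > tol) {
    double t = 0.5 * (t1 + t2);
    if (R(lerp(a, b, t)) < r0) t1 = t; else t2 = t;
  }
  border = lerp(a, b, 0.5 * (t1 + t2));
  return true;
}

// Toy R for two unit-variance Gaussian classes with equal priors, centred at
// -1 and +1 on the first axis; for this case R reduces to tanh of x[0].
double toy_r(const Point &x) { return std::tanh(x[0]); }
```

With toy_r, a search between (-2, 0.5) and (3, 0.5) at r0 = 0 converges to the point (0, 0.5); a non-zero r0 simply shifts the border toward one class.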

Clustering routines are a recent addition.

Changes to previous version:

New in Version 0.9.2:

  • In the direct classification routines (classify_a, classify_knn), there is now an option (-j) to print out joint probabilities instead of conditional probabilities. Of course this can be done by calculating the total probability and multiplying by the conditional probability, but this means redundant calculation.

  • In class_borders, added the option (-r) to solve for a class border other than at R=0. This is useful if your classes are of significantly different sizes, especially when the training data does not reflect this.

  • There is now a simple clustering analysis program (cluster_knn) based on a threshold density. It works by first finding a point in which the density is greater than this threshold. Using the k-nearest neighbours to this point, it recursively finds all other points above this threshold and assigns them the same class number.

  • The option to use a metric other than Cartesian now exists. Since many of the calculations are specifically based on a Cartesian space, especially the PDF estimation, this should be applied with some caution.

  • Option for different names for files containing normalization data. It's a pretty minor point, so it's only been implemented in two or three programs, chiefly the class_borders and classify_b modules. I'm too lazy to do them all...

  • Added an n-fold cross-validation program that works with all the classification algorithms.

  • Added a small utility that just normalizes the data and that's it. Also cleaned up and properly renamed a utility (vecfile2lvq) to convert the binary files to Kohonen's LVQ format.
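
The threshold-density clustering described in the list above can be sketched roughly as follows. This is not the cluster_knn source: all function names are hypothetical, the density is assumed to be a plain KNN estimate (k divided by n times the volume of the ball reaching the kth neighbour), and an explicit stack replaces the recursion.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::vector<double> Point;

// Euclidean distance between two points of equal dimension.
double dist(const Point &a, const Point &b) {
  double s = 0;
  for (std::size_t i = 0; i < a.size(); i++) s += (a[i] - b[i]) * (a[i] - b[i]);
  return std::sqrt(s);
}

// Indices of the k nearest neighbours of point i (excluding i), nearest first.
std::vector<std::size_t> knn(const std::vector<Point> &pts, std::size_t i,
                             std::size_t k) {
  std::vector<std::pair<double, std::size_t>> d;
  for (std::size_t j = 0; j < pts.size(); j++)
    if (j != i) d.push_back(std::make_pair(dist(pts[i], pts[j]), j));
  k = std::min(k, d.size());
  std::partial_sort(d.begin(), d.begin() + k, d.end());
  std::vector<std::size_t> idx(k);
  for (std::size_t j = 0; j < k; j++) idx[j] = d[j].second;
  return idx;
}

// KNN density estimate at point i (assumed estimator, see the text above).
double knn_density(const std::vector<Point> &pts, std::size_t i, std::size_t k) {
  std::vector<std::size_t> nb = knn(pts, i, k);
  if (nb.empty()) return 0;
  double r = dist(pts[i], pts[nb.back()]);
  double D = static_cast<double>(pts[i].size());
  const double pi = 3.14159265358979323846;
  double vol = std::pow(pi, D / 2) / std::tgamma(D / 2 + 1) * std::pow(r, D);
  return nb.size() / (pts.size() * vol);
}

// Labels: 0 = density at or below the threshold, 1, 2, ... = cluster number.
std::vector<int> cluster_by_threshold(const std::vector<Point> &pts,
                                      std::size_t k, double threshold) {
  std::size_t n = pts.size();
  std::vector<double> dens(n);
  for (std::size_t i = 0; i < n; i++) dens[i] = knn_density(pts, i, k);

  std::vector<int> label(n, 0);
  int ncls = 0;
  for (std::size_t seed = 0; seed < n; seed++) {
    if (label[seed] != 0 || dens[seed] <= threshold) continue;
    label[seed] = ++ncls;  // start a new cluster at an unlabelled dense point
    std::vector<std::size_t> stack(1, seed);
    while (!stack.empty()) {
      std::size_t i = stack.back();
      stack.pop_back();
      std::vector<std::size_t> nb = knn(pts, i, k);
      for (std::size_t j = 0; j < nb.size(); j++) {
        if (label[nb[j]] == 0 && dens[nb[j]] > threshold) {
          label[nb[j]] = ncls;  // dense neighbour joins the same cluster
          stack.push_back(nb[j]);
        }
      }
    }
  }
  return label;
}
```

Points whose density never exceeds the threshold keep the label 0 in this sketch; how the actual program reports such points is not stated in the notes above.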

New in Version 0.9.3:

  • The libpetey library is no longer part of the libagf distribution

  • The class borders codes can no longer generate duplicate samples. There are two versions: one for large training datasets, and one for small ones. If all combinations of pairs of training samples have been used up, the codes will generate no more training samples.

New in Version 0.9.4:

  • Most importantly, everything, except the IO routines, has been templated. This means you can do your work in single or double precision and you can represent your classes as bytes, 8-bit integers, 16-bit integers, 32-bit integers, etc. -- whatever size you want.

  • With the exception of those used in external routines, variable types in the main routines are now controlled with global typedefs, with each class of variable having a different type. This means you can tightly control the typing for optimal use of space or CPU cycles. Classes have a default type of 32-bit integers while floating point operations are done in single precision by default.

  • Different metrics are now only supported in the routines where they make sense: KNN classification and KNN interpolation. The functions now require a pointer to the desired metric.

  • The nfold routine now supports interpolation. Note that this is still not well tested (if at all).

  • File conversion utilities as well as the test class routines have now been integrated into the main distribution simply by linking the two makefiles more closely, thus allowing easier testing and more user-friendly files.

  • A routine that performs AGF PDF estimation with an optimal error rate is currently being tested but is not ready yet. We hope to have it ready in a new release very shortly.

  • Also in the next release: multi-class classification using the class-borders method. Stay tuned!
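
To make the templating, global typedefs and metric pointer from the 0.9.4 notes above more concrete, here is a minimal sketch of a KNN classifier written in that style. The typedef names (real_type, class_type, dim_type), the metric functions and knn_classify_sketch are all invented for this example and should not be read as the library's actual declarations; only the defaults (single precision, 32-bit class labels) follow the notes.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical global typedefs, one per kind of variable.
typedef float real_type;          // floating point data: single precision
typedef std::int32_t class_type;  // class labels: 32-bit integers
typedef std::int32_t dim_type;    // dimension counter

// Cartesian (Euclidean) metric between two vectors of length D.
template <class real>
real cartesian(const real *a, const real *b, dim_type D) {
  real s = 0;
  for (dim_type i = 0; i < D; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
  return std::sqrt(s);
}

// Manhattan metric, as an example of an alternative.
template <class real>
real manhattan(const real *a, const real *b, dim_type D) {
  real s = 0;
  for (dim_type i = 0; i < D; i++) s += std::fabs(a[i] - b[i]);
  return s;
}

// KNN classification with a caller-supplied metric. Fills pcond with the
// fraction of the k nearest neighbours in each class (an estimate of the
// conditional probabilities) and returns the most probable class.
template <class real, class cls>
cls knn_classify_sketch(real (*metric)(const real *, const real *, dim_type),
                        const std::vector<std::vector<real>> &train,
                        const std::vector<cls> &label, cls ncls,
                        const std::vector<real> &x, std::size_t k,
                        std::vector<real> &pcond) {
  dim_type D = static_cast<dim_type>(x.size());
  std::vector<std::pair<real, std::size_t>> d(train.size());
  for (std::size_t i = 0; i < train.size(); i++)
    d[i] = std::make_pair(metric(train[i].data(), x.data(), D), i);
  k = std::min(k, d.size());
  std::partial_sort(d.begin(), d.begin() + k, d.end());

  pcond.assign(static_cast<std::size_t>(ncls), real(0));
  for (std::size_t j = 0; j < k; j++)
    pcond[label[d[j].second]] += real(1) / real(k);

  cls best = 0;
  for (cls c = 1; c < ncls; c++)
    if (pcond[c] > pcond[best]) best = c;
  return best;
}
```

Changing the two typedefs (for example to double and std::int8_t) is enough to switch the precision and the label width in calling code, and passing &manhattan<real_type> instead of &cartesian<real_type> swaps the metric without touching the classifier.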

Supported Operating Systems: Agnostic
Data Formats: ASCII, Binary
Tags: Clustering, Nonparametric Density Estimation, Supervised Learning, Interpolation, Inverse Methods, Kernel Estimation, Nonlinear Regression, Probability Estimation, Statistical Classification

Comments

Peter Mills (on March 15, 2012, 05:04:16)
I had hoped to have multi-class border-classification ready by now, but the simple generalization I had envisioned to implement it won't work in all cases. The idea was to use matrix inversion to solve for the conditional probabilities, but quite obviously (in retrospect) you can solve for the class without being able to determine all the conditional probabilities. Likely we need two cases: one where all the conditional probabilities can be found, and one where only that of the retrieved class can be found, and these two cases need to interoperate. A recursive or hierarchical model would seem to be the best solution here. I realize that there is literature relating to the problem of creating multi-class classifications from two-class ones; however, I do not currently have access to commercial journals as I am not affiliated with an academic or research institution. It is also an enjoyable challenge to try and figure these things out for yourself, from scratch, so to speak. Likewise I had hoped to have the optimal-bandwidth Gaussian PDF estimation ready. I had made some progress on it, but the test cases were not giving consistent results and I have failed to work on it in the intervening months.
Peter Mills (on April 15, 2014, 04:55:05)
Multi-borders classification is now ready. I am very pleased (and pleasantly surprised) with how well it works.
Peter Mills (on January 23, 2016, 23:46:19)
The libAGF library has been combined with two other libraries and moved to GitHub under the project libmsci: https://github.com/peteysoft/libmsci
