- Description:
This software is written in C++ and contains routines for statistical classification, probability estimation and interpolation/non-linear regression. Two variable-bandwidth kernel methods are used: k-nearest neighbours (KNN) and a balloon estimator based on Gaussian kernels, hence Adaptive Gaussian Filtering (AGF); a sketch of the balloon estimator follows this description. A library of easy-to-use, single-call functions (you call a single function once for each estimate--no initialization required) is included, as well as command-line executables.
The statistical classification routines are particularly powerful, allowing you to generate a pre-trained model by searching for the class borders. This model can then be used to make rapid classifications that nonetheless return estimates of the conditional probabilities.
Clustering routines are a recent addition.
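A minimal sketch of the Gaussian balloon estimator, not the library's actual API: all names here are illustrative, and the per-point bandwidth is simply passed in rather than solved for as the library does.

    // density estimate at a test point using Gaussian kernels of
    // variance sigma2; x is an n-by-D array of training samples
    #include <cmath>
    #include <cstddef>

    double balloon_pdf(const double *const *x, size_t n, size_t D,
                       const double *test, double sigma2) {
      const double pi = 3.141592653589793;
      double total = 0.;
      for (size_t i = 0; i < n; i++) {
        double d2 = 0.;                    // squared distance to sample i
        for (size_t j = 0; j < D; j++) {
          double diff = test[j] - x[i][j];
          d2 += diff*diff;
        }
        total += exp(-d2/(2.*sigma2));     // Gaussian kernel weight
      }
      // normalize: n samples, D-dimensional Gaussian of variance sigma2
      return total/(n*pow(2.*pi*sigma2, D/2.));
    }

In the library proper the bandwidth varies with the test point, which is what makes the estimator "adaptive".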
- Changes to previous version:
New in Version 0.9.5:
Sadly, neither the multi-class classifier using the "borders" method nor the optimal AGF routine has been perfected yet. However, there are quite a few other good improvements to sweeten the mix...
The routine for finding the k nearest neighbours has been changed from one based on a binary tree to one based on a quicksort algorithm. Speed improvements are expected to be on the order of 25%. To change back to the old version, redefine the KLEAST_FUNC macro in the agf_defs.h include file.
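For illustration only (this is not the library's actual kleast routine), the same selection can be done with the standard library's quickselect-style std::nth_element, which pulls the k smallest distances to the front in O(n) average time:

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // return the k smallest elements of d, in no particular order;
    // requires k <= d.size()
    std::vector<double> k_least(std::vector<double> d, size_t k) {
      std::nth_element(d.begin(), d.begin() + k, d.end());
      d.resize(k);
      return d;
    }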
The routine for calculating the weights for the AGF algorithm now matches the filter variance to the W parameter using the supernewton root-finding algorithm instead of by squaring the initial weights. This means that there are now two bounds for the filter variance, set by the -v and -V options for the lower and upper bounds respectively. Since it is trivial to push the bounds outward if they do not bracket the root, and since these changes are "sticky", it does not matter if the high bound is too low or the low bound too high. Rather, the user should avoid the opposite extreme, as an overly wide bracket means a larger number of iterations to reach the root. Default bounds are [sigma^2/n^(2/D), sigma^2], where sigma^2 is the total variance of the data, n the number of samples and D the dimension of the feature space.
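As a sketch of the idea only: the library uses the supernewton root-finder, but plain bisection is substituted here and all names are illustrative. The filter variance is the root of a function of the summed weights that is monotone increasing in sigma2, which is what makes outward bracket-pushing safe.

    #include <cmath>
    #include <cstddef>

    // total of the Gaussian weights for a given filter variance;
    // d2[i] are squared distances to the test point
    double sum_weights(const double *d2, size_t n, double sigma2) {
      double W = 0.;
      for (size_t i = 0; i < n; i++) W += exp(-d2[i]/(2.*sigma2));
      return W;
    }

    // solve sum_weights(sigma2) = W for sigma2; assumes 0 < W < n
    double match_W(const double *d2, size_t n, double W,
                   double lo, double hi, double tol) {
      // push the bounds outward until they bracket the root:
      while (sum_weights(d2, n, lo) > W) lo /= 2.;
      while (sum_weights(d2, n, hi) < W) hi *= 2.;
      while (hi - lo > tol*lo) {          // bisect to relative tolerance
        double mid = (lo + hi)/2.;
        if (sum_weights(d2, n, mid) < W) lo = mid; else hi = mid;
      }
      return (lo + hi)/2.;
    }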
The new weight-calculating routine is more accurate and should be more robust as well, although at the cost of a slight speed penalty. As with the kleast subroutine, however, the old version can be reinstated by changing the AGF_CALC_W_FUNC macro. The initial filter variance, since it is an upper bound, is now set with the -V option instead of the -v option.
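Such a compile-time switch typically looks like the following; the actual definitions live in agf_defs.h and the routine names here are hypothetical:

    // select the weight-calculating implementation at compile time:
    #define AGF_CALC_W_FUNC calc_w_supernewton   // new root-finding version
    //#define AGF_CALC_W_FUNC calc_w_squared     // old squared-weights version

    // call sites then refer to the macro rather than a hard-coded name,
    // e.g. AGF_CALC_W_FUNC(d2, n, var, W, weights);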
For maximum control of the weight-calculating routine, several new options have been added. To change the maximum number of iterations in the supernewton root-finding algorithm, use the -I option; this changes it both for the weight calculation and for the class-borders search. To change it for one or the other, use -i for the weight-calculation routine and -h for the class-borders routine. The default for both is 100 iterations, which may not be sufficient for some problems.
To change the tolerance on W, the total of the weights, use the -l option. The default is 0.005, which should be more than sufficient. Since the accuracy of W is not that critical, the tolerance can be loosened, probably as high as 1, for a slight saving in speed.
The parameter W is now set with the -W option (uppercase) instead of the -w option (lowercase).
The optimal AGF may not work yet, but it's a lot more user-friendly! Check the documentation...
- Supported Operating Systems: Agnostic
- Data Formats: ASCII, Binary
- Tags: Clustering, Nonparametric Density Estimation, Supervised Learning, Interpolation, Inverse Methods, Kernel Estimation, Nonlinear Regression, Probability Estimation, Statistical Classification
Comments
- Peter Mills (on March 15, 2012, 05:04:16)
- I had hoped to have multi-class border classification ready by now, but the simple generalization I had envisioned to implement it won't work in all cases. The idea was to use matrix inversion to solve for the conditional probabilities, but quite obviously (in retrospect) you can solve for the class without being able to determine all the conditional probabilities. Likely we need two cases: one where all the conditional probabilities can be found, and one where only that of the retrieved class can be found; these two cases need to interoperate. A recursive or hierarchical model would seem to be the best solution here. I realize that there is literature on building multi-class classifications from two-class ones, but I do not currently have access to commercial journals as I am not affiliated with an academic or research institution. It is also an enjoyable challenge to try to figure these things out for yourself, from scratch, so to speak. Likewise, I had hoped to have the optimal-bandwidth Gaussian PDF estimation ready. I had made some progress on it, but the test cases were not giving consistent results and I have not worked on it in the intervening months.
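A sketch of the linear system implied above (the notation is mine, not the library's): a two-class border j, trained on a partition of the classes into sets A_j and B_j, returns a decision value that estimates a difference of summed conditional probabilities.

    \[
      r_j \approx P(B_j|x) - P(A_j|x) = \sum_i a_{ji}\,p_i, \qquad
      a_{ji} = \begin{cases} -1 & i \in A_j \\ +1 & i \in B_j \end{cases}
    \]
    \[
      A\,\mathbf{p} = \mathbf{r}, \qquad \sum_i p_i = 1
    \]

With enough independent partitions, the normalized system can be solved for the conditional probabilities p_i by matrix inversion or least squares; the failure mode noted in the comment is that the winning class can often be determined even when this system is singular.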
- Peter Mills (on April 15, 2014, 04:55:05)
- Multi-borders classification is now ready. I am very pleased (and pleasantly surprised) with how well it works.
- Peter Mills (on January 23, 2016, 23:46:19)
- The libAGF library has been combined with two other libraries and moved to GitHub under the libmsci project: https://github.com/peteysoft/libmsci