Projects that are tagged with evaluation.


Logo ELKI 0.6.0

by erich - January 10, 2014, 18:32:28 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 10748 views, 1933 downloads, 3 subscriptions

About: ELKI is a framework for implementing data-mining algorithms with support for index structures, that includes a wide variety of clustering and outlier detection methods.

Changes:

Additions and Improvements from ELKI 0.5.5:

Algorithms

Clustering:

  • Hierarchical Clustering - the slower naive variants were added, and the code was refactored
  • Partition extraction from hierarchical clusterings - different linkage strategies (e.g. Ward)
  • Canopy pre-Clustering
  • Naive Mean-Shift Clustering
  • Affinity propagation clustering (both with distances and similarities / kernel functions)
  • K-means variations: Best-of-multiple-runs, bisecting k-means
  • New k-means initialization: farthest points, sample initialization
  • Cheng and Church Biclustering
  • P3C Subspace Clustering
  • One-dimensional clustering algorithm based on kernel density estimation

Outlier detection

  • COP - correlation outlier probabilities
  • LDF - a kernel density based LOF variant
  • Simplified LOF - a simpler version of LOF (not using reachability distance)
  • Simple Kernel Density LOF - a simple LOF using kernel density (more consistent than LDF)
  • Simple outlier ensemble algorithm
  • PINN - projection indexed nearest neighbors, via projected indexes.
  • ODIN - kNN graph based outlier detection
  • DWOF - Dynamic-Window Outlier Factor (contributed by Omar Yousry)
  • ABOD refactored, into ABOD, FastABOD and LBABOD

Distances

  • Geodetic distances now support different world models (WGS84 etc.) and are subtantially faster.
  • Levenshtein distances for processing strings, e.g. for analyzing phonemes (contributed code, see "Word segmentation through cross-lingual word-to-phoneme alignment", SLT2013, Stahlberg et al.)
  • Bray-Curtis, Clark, Kulczynski1 and Lorentzian distances with R-tree indexing support
  • Histogram matching distances
  • Probabilistic divergence distances (Jeffrey, Jensen-Shannon, Chi2, Kullback-Leibler)
  • Kulczynski2 similarity
  • Kernel similarity code has been refactored, and additional kernel functions have been added

Database Layer and Data Types

Projection layer * Parser for simple textual data (for use with Levenshtein distance) Various random projection families (including Feature Bagging, Achlioptas, and p-stable) Latitude+Longitude to ECEF Sparse vector improvements and bug fixes New filter: remove NaN values and missing values New filter: add histogram-based jitter New filter: normalize using statistical distributions New filter: robust standardization using Median and MAD New filter: Linear discriminant analysis (LDA)

Index Layer

  • Another speed up in R-trees
  • Refactoring of M- and R-trees: Support for different strategies in M-tree New strategies for M-tree splits Speedups in M-tree
  • New index structure: in-memory k-d-tree
  • New index structure: in-memory Locality Sensitive Hashing (LSH)
  • New index structure: approximate projected indexes, such as PINN
  • Index support for geodetic data - (Details: Geodetic Distance Queries on R-Trees for Indexing Geographic Data, SSTD13)
  • Sampled k nearest neighbors: reference KDD13 "Subsampling for Efficient and Effective Unsupervised Outlier Detection Ensembles"
  • Cached (precomputed) k-nearest neighbors to share across multiple runs
  • Benchmarking "algorithms" for indexes

Mathematics and Statistics

  • Many new distributions have been added, now 28 different distributions are supported
  • Additional estimation methods (using advanced statistics such as L-Moments), now 44 estimators are available
  • Trimming and Winsorizing
  • Automatic best-fit distribution estimation
  • Preprocessor using these distributions for rescaling data sets
  • API changes related to the new distributions support
  • More kernel density functions
  • RANSAC covariance matrix builder (unfortunately rather slow)

Visualization

  • 3D projected coordinates (Details: Interactive Data Mining with 3D-Parallel-Coordinate-Trees, SIGMOD2013)
  • Convex hulls now also include nested hierarchical clusters

Other

  • Parser speedups
  • Sparse vector bug fixes and improvements
  • Various bug fixes
  • PCA, MDS and LDA filters
  • Text output was slightly improved (but still needs to be redesigned from scratch - please contribute!)
  • Refactoring of hierarchy classes
  • New heap classes and infrastructure enhancements
  • Classes can have aliases, e.g. "l2" for euclidean distance.
  • Some error messages were made more informative.
  • Benchmarking classes, also for approximate nearest neighbor search.

Logo MyMediaLite 3.10

by zenog - October 8, 2013, 22:29:29 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 44368 views, 8398 downloads, 1 subscription

About: MyMediaLite is a lightweight, multi-purpose library of recommender system algorithms.

Changes:

Mostly bug fixes.

For details see: https://github.com/zenogantner/MyMediaLite/blob/master/doc/Changes


Logo ClusterEval 1.0

by cdevries - June 16, 2013, 04:15:30 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 1628 views, 514 downloads, 1 subscription

About: Cluster quality Evaluation software. Implements cluster quality metrics based on ground truths such as Purity, Entropy, Negentropy, F1 and NMI. It includes a novel approach to correct for pathological or ineffective clusterings called 'Divergence from a Random Baseline'.

Changes:

Initial Announcement on mloss.org.


Logo ROC algorithms 1.0

by tfawcett - January 9, 2010, 19:52:00 CET [ BibTeX BibTeX for corresponding Paper Download ] 3975 views, 750 downloads, 1 subscription

About: A set of Perl programs for generating and manipulating ROC curves.

Changes:

Initial Announcement on mloss.org.


Logo ROCCH 2.2

by tfawcett - January 9, 2010, 07:47:47 CET [ BibTeX BibTeX for corresponding Paper Download ] 2566 views, 700 downloads, 0 comments, 1 subscription

About: Given many points in ROC (Receiver Operator Characteristics) space, computes the convex hull.

Changes:

Initial Announcement on mloss.org.


Logo JMLR Model Monitor 1.0

by traeder - August 17, 2009, 11:05:06 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 12603 views, 1745 downloads, 0 comments, 1 subscription

About: Model Monitor is a Java toolkit for the systematic evaluation of classifiers under changes in distribution. It provides methods for detecting distribution shifts in data, comparing the performance [...]

Changes:

Improved AUROC calculation. Several minor bug fixes.