Project details for ELKI

ELKI 0.5.5

by erich - December 14, 2012, 18:49:58 CET


Description:

ELKI: "Environment for Developing KDD-Applications Supported by Index-Structures" is a development framework for data mining algorithms written in Java. It includes a large variety of popular data mining algorithms, distance functions and index structures.

Its focus is on clustering and outlier detection methods, in contrast to many other data mining toolkits that focus on classification. Additionally, it supports index structures such as the R*-Tree and M-Tree to improve algorithm performance.

The modular architecture is meant to allow adding custom components such as distance functions or algorithms, while being able to reuse the other parts for evaluation.
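To make the idea of exchangeable components concrete, here is a minimal sketch in Python rather than Java. It is not ELKI's API: the names `euclidean`, `manhattan`, and `knn` are hypothetical and chosen only for illustration. The point is that a query written against a generic distance function can be reused unchanged when a custom distance is plugged in.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    """Standard L2 distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    """A 'custom' component: L1 distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn(data: List[Vector], query: Vector, k: int, dist: Distance) -> List[int]:
    """Indices of the k nearest points to `query` under `dist` (linear scan)."""
    order = sorted(range(len(data)), key=lambda i: dist(data[i], query))
    return order[:k]


points = [(0.0, 0.0), (3.0, 0.0), (2.0, 2.0), (5.0, 5.0)]
origin = (0.0, 0.0)
nearest_l2 = knn(points, origin, 2, euclidean)  # [0, 2]
nearest_l1 = knn(points, origin, 2, manhattan)  # [0, 1]
```

The same `knn` routine yields different neighbors under the two distances, which is the kind of reuse a modular design is meant to allow.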

This package also includes the source code, since this software is meant for the rapid development of such algorithms, not so much for end users.

Changes to previous version:

This is mostly a bug fix release. Many small issues have been fixed; among other things, the fixes improve performance, make error reporting considerably better, and ease the use of sparse vectors and externally precomputed distances.

This will be the last ELKI release to support Java 6. The next ELKI release will require Java 7.

Algorithms

  • Some new LOF variants (LDF, SimpleLOF, SimpleKernelDensityLOF)
  • Correlation Outlier Probabilities (ICDM 2012)
  • A naive mean-shift clustering
  • Single-link clustering (SLINK algorithm) should be significantly faster due to optimized data structures
  • "Benchmarking" algorithms for measuring the performance of index structures
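As an illustration of what a naive mean-shift clustering does, the following Python sketch uses a flat kernel and a linear scan per point. It is not ELKI's implementation; the function names `mean_shift` and `label_modes`, the flat kernel, and the merge radius are assumptions made for this example.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Naive mean-shift with a flat kernel: O(n^2) work per iteration."""
    modes = [tuple(p) for p in points]
    for _ in range(max_iter):
        moved = 0.0
        new_modes = []
        for m in modes:
            # Neighbors are always taken from the original data set.
            nbrs = [p for p in points if math.dist(m, p) <= bandwidth]
            if not nbrs:  # guard against floating-point edge cases
                new_modes.append(m)
                continue
            mean = tuple(sum(c) / len(nbrs) for c in zip(*nbrs))
            moved = max(moved, math.dist(m, mean))
            new_modes.append(mean)
        modes = new_modes
        if moved < tol:
            break
    return modes


def label_modes(modes, merge_radius):
    """Give points the same label when their modes lie within merge_radius."""
    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if math.dist(m, c) <= merge_radius:
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
modes = mean_shift(data, bandwidth=2.0)
labels = label_modes(modes, merge_radius=1.0)  # [0, 0, 0, 1, 1, 1]
```

Each point is shifted to the mean of its neighborhood until it stops moving; points whose modes coincide form one cluster.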

Index layer

  • Bulk loading R-Trees should be faster; Sort Tile Recursive in particular can work very well.
  • M-Trees have been refactored and optimized for double distances
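For readers unfamiliar with Sort Tile Recursive bulk loading, the Python sketch below packs 2D points into leaf pages only. It is an illustration under simplifying assumptions (point data, two dimensions, leaf level only) and not ELKI's code; `str_leaves` and `mbr` are hypothetical names.

```python
import math


def str_leaves(points, capacity):
    """Pack 2D points into leaf pages using Sort Tile Recursive (leaf level)."""
    n = len(points)
    if n == 0:
        return []
    num_leaves = math.ceil(n / capacity)
    num_slices = math.ceil(math.sqrt(num_leaves))
    slice_size = num_slices * capacity
    # Step 1: sort by x and cut into vertical slices.
    by_x = sorted(points, key=lambda p: p[0])
    leaves = []
    for i in range(0, n, slice_size):
        # Step 2: sort each slice by y and cut it into pages.
        slab = sorted(by_x[i:i + slice_size], key=lambda p: p[1])
        for j in range(0, len(slab), capacity):
            leaves.append(slab[j:j + capacity])
    return leaves


def mbr(leaf):
    """Minimum bounding rectangle as (min_x, min_y, max_x, max_y)."""
    xs, ys = zip(*leaf)
    return (min(xs), min(ys), max(xs), max(ys))


grid = [(x, y) for x in range(4) for y in range(4)]
leaves = str_leaves(grid, capacity=4)
boxes = [mbr(leaf) for leaf in leaves]
# boxes == [(0, 0, 1, 1), (0, 2, 1, 3), (2, 0, 3, 1), (2, 2, 3, 3)]
```

On the 4x4 grid the pages tile the space without overlap. Upper tree levels would be built by applying the same procedure to the resulting rectangles, which this sketch omits.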

Database layer

  • Bundle format (work in progress): low-level binary format for fast data exchange
  • DBID and DataStore layer received some additional classes for further performance improvements
  • KNN heap structures were revisited. The code is less clean now, but performs better in benchmarks.
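The role of a KNN heap can be sketched as a bounded max-heap that keeps only the k smallest distances seen so far. The Python example below is an assumption-laden illustration, not the ELKI data structure; the class name `KNNHeap` and its methods are hypothetical.

```python
import heapq


class KNNHeap:
    """Keep the k nearest candidates using a bounded max-heap."""

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        # heapq is a min-heap, so distances are negated to get max-heap behavior.
        self._heap = []

    def kth_distance(self):
        """Current pruning bound: infinity until k candidates are stored."""
        if len(self._heap) < self.k:
            return float("inf")
        return -self._heap[0][0]

    def insert(self, distance, item_id):
        entry = (-distance, item_id)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
        elif distance < self.kth_distance():
            # Replace the current worst candidate in one operation.
            heapq.heapreplace(self._heap, entry)

    def results(self):
        """Stored candidates as (distance, item_id), nearest first."""
        return sorted((-d, i) for d, i in self._heap)


heap = KNNHeap(3)
for dist, name in [(5.0, "a"), (1.0, "b"), (3.0, "c"), (2.0, "d"), (4.0, "e")]:
    heap.insert(dist, name)
top3 = heap.results()  # [(1.0, 'b'), (2.0, 'd'), (3.0, 'c')]
```

The `kth_distance` bound is what lets a query skip candidates, and index subtrees, that cannot improve the result.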

Visualizations

  • General clean up and API simplifications
  • Some additional modules and improvements

Various

  • There is a new parameter class, RandomParameter
  • Some new distributions were added, also to the data set generator.

Tutorials

  • The website has new tutorials, including one on a k-means variation that produces equal-sized clusters.
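One way to picture a k-means variation with equal-sized clusters is a capacity-constrained assignment step. The Python sketch below is not taken from the tutorial; the function name `balanced_assign`, the greedy ordering, and the capacity of ceil(n / k) are assumptions made for illustration. A full algorithm would alternate this step with recomputing the cluster centers.

```python
import math


def balanced_assign(points, centers):
    """Assign each point to its nearest center that still has free capacity."""
    k = len(centers)
    capacity = math.ceil(len(points) / k)

    def priority(i):
        # Points that lose the most from a bad assignment are handled first.
        d = [math.dist(points[i], c) for c in centers]
        return min(d) - max(d)

    counts = [0] * k
    labels = [None] * len(points)
    for i in sorted(range(len(points)), key=priority):
        ranked = sorted(range(k), key=lambda c: math.dist(points[i], centers[c]))
        for c in ranked:
            if counts[c] < capacity:
                labels[i] = c
                counts[c] += 1
                break
    return labels


pts = [(0.0,), (1.0,), (2.0,), (10.0,)]
ctrs = [(0.0,), (10.0,)]
balanced = balanced_assign(pts, ctrs)  # [0, 0, 1, 1]
```

Plain nearest-center assignment would put three points into the first cluster; the capacity limit moves the point at 2.0 to the second cluster so that both have two members.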
Supported Operating Systems: Platform Independent
Data Formats: Arff, Other, Csv, Parser Extension Api
Tags: Clustering, Visualization, Algorithms, Evaluation, Anomaly Detection, Outlier Detection, Index Structures

Other available revisions

Version Changelog Date
0.5.5

December 14, 2012, 18:49:58
0.5.0

Primary release goals:

  • Cluster evaluation: metrics and circle-segment-visualization (ICDE 2012)

  • Outlier detection ensembles (SDM 2011, 2012)

  • Usability improvements, for example by adding an automatic evaluation helper

  • Performance improvements by reducing boxing of primitive types

  • Parallel coordinates visualizations added for high-dimensional data

  • Tons of new algorithms, distance functions, index structures, visualizations, evaluators, ...

http://elki.dbs.ifi.lmu.de/wiki/Releases/ReleaseNotes0.5.0

July 1, 2012, 20:58:25
0.5.0 beta2

The full changelog is not yet up. Here is an excerpt of the new functions in 0.5.0:

  • Further speed improvements
  • R-Tree flexibility: multiple new split strategies, bulk loaders, insertion strategies, so that ELKI can now do many R-Tree variations, including the original Guttman R-Tree, not only the R*-Tree
  • K-Means flexibility: MacQueen and Lloyd style iterations along with various seeding strategies, including K-Means++
  • VA-File (static only, not dynamic databases)
  • Many popular cluster evaluation measures
  • Alpha shapes, Voronoi cells, Delaunay triangulations in the visualization layer (in the projected space, so 2D!)
  • Parallel coordinates
  • Outlier ensemble code, presented at SDM 2012
  • Some new algorithms, such as OUTRES

For the final 0.5.0 release we hope to have some approximate outlier detection methods for you (aLOCI, HilOut) as well as some subspace outlier detection methods including HiCS (ICDE 2012, to be presented tomorrow).

June 1, 2012, 21:32:08
0.5.0 beta1

The full changelog is not yet up. Here is an excerpt of the new functions in 0.5.0:

  • Further speed improvements
  • R-Tree flexibility: multiple new split strategies, bulk loaders, insertion strategies, so that ELKI can now do many R-Tree variations, including the original Guttman R-Tree, not only the R*-Tree
  • K-Means flexibility: MacQueen and Lloyd style iterations along with various seeding strategies, including K-Means++
  • VA-File (static only, not dynamic databases); partial-VA to come for 0.5.0 final?
  • Many popular cluster evaluation measures
  • Alpha shapes, Voronoi cells, Delaunay triangulations in the visualization layer (in the projected space, so 2D!)
  • Parallel coordinates (only halfway reviewed in beta1, more to come!)
  • Outlier ensemble code, to be presented at SDM 2012 at the end of April

For the final 0.5.0 release we hope to have some approximate outlier detection methods for you (aLOCI, HilOut) as well as some subspace outlier detection methods including HiCS (ICDE 2012, to be presented tomorrow).

May 9, 2012, 20:46:08
0.4.1

Bug fix release addressing a number of minor issues, each affecting individual algorithms, that have accumulated over the previous months. Existing applications should not be affected by this upgrade.

A larger 0.5.0 release is scheduled for early April with new algorithms, but also with API changes.

February 13, 2012, 16:51:35
0.4.0

Initial Announcement on mloss.org.

January 16, 2012, 22:12:23
