- Description:
ELKI ("Environment for Developing KDD-Applications Supported by Index-Structures") is a development framework for data mining algorithms, written in Java. It includes a large variety of popular data mining algorithms, distance functions, and index structures.
Its focus is on clustering and outlier detection methods, in contrast to many other data mining toolkits that focus on classification. It also includes index structures, such as the R*-tree and the M-tree, to improve algorithm performance.
The modular architecture is meant to allow adding custom components such as distance functions or algorithms, while being able to reuse the other parts for evaluation.
This package also includes the source code, since this software is meant for the rapid development of such algorithms, not so much for end users.
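The modular design described above can be illustrated with a minimal sketch. The `Distance` interface, `ManhattanDistance`, and `NearestNeighbor` are hypothetical names invented for this illustration and are not ELKI's actual API; they only show how an algorithm written against an interface can be reused with a custom distance function.

```java
import java.util.List;

/** Hypothetical plug-in point for illustration; NOT ELKI's real interface. */
interface Distance {
  double distance(double[] a, double[] b);
}

/** A custom component: Manhattan (L1) distance between equal-length vectors. */
final class ManhattanDistance implements Distance {
  @Override
  public double distance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      sum += Math.abs(a[i] - b[i]);
    }
    return sum;
  }
}

/** A reusable part: a linear-scan nearest-neighbor search for any Distance. */
final class NearestNeighbor {
  /** Returns the index of the closest vector in data, or -1 if data is empty. */
  static int nearest(List<double[]> data, double[] query, Distance dist) {
    int best = -1;
    double bestDist = Double.POSITIVE_INFINITY;
    for (int i = 0; i < data.size(); i++) {
      double d = dist.distance(data.get(i), query);
      if (d < bestDist) {
        bestDist = d;
        best = i;
      }
    }
    return best;
  }
}
```

Swapping in another `Distance` implementation changes the result of `nearest` without touching the search code, which is the kind of reuse the description refers to.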
- Changes to previous version:
This is mostly a bug fix release. Many small issues have been fixed; among other things, the fixes improve performance, make error reporting much better, and ease the use of sparse vectors and external precomputed distances.
This will be the last ELKI release to support Java 6. The next ELKI release will require Java 7.
Algorithms
- Some new LOF variants (LDF, SimpleLOF, SimpleKernelDensityLOF)
- Correlation Outlier Probabilities (ICDM 2012)
- A naive mean-shift clustering
- Single-link clustering (SLINK algorithm) should be significantly faster due to optimized data structures
- "Benchmarking" algorithms for measuring the performance of index structures
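As a rough illustration of what a naive mean-shift clustering does, here is a minimal sketch. It is not ELKI's implementation; the class name `NaiveMeanShift`, the flat kernel, and the parameters `h` (bandwidth), `maxIter`, and `eps` are assumptions chosen for this example. Each point is repeatedly moved to the mean of all data points within distance `h` until it stops moving, and points that end at the same mode belong to the same cluster.

```java
/** Illustrative naive mean-shift with a flat kernel; not ELKI's code. */
final class NaiveMeanShift {
  /** Returns, for every input point, the mode it converges to. */
  static double[][] modes(double[][] data, double h, int maxIter, double eps) {
    int n = data.length;
    double[][] out = new double[n][];
    for (int p = 0; p < n; p++) {
      double[] cur = data[p].clone();
      for (int it = 0; it < maxIter; it++) {
        double[] mean = new double[cur.length];
        int count = 0;
        // Naive step: scan all points, O(n) per shift and no index support.
        for (double[] x : data) {
          if (sqDist(cur, x) <= h * h) {
            for (int d = 0; d < mean.length; d++) {
              mean[d] += x[d];
            }
            count++;
          }
        }
        if (count == 0) {
          break; // defensive guard; no neighbors within the bandwidth
        }
        for (int d = 0; d < mean.length; d++) {
          mean[d] /= count;
        }
        double shift = sqDist(mean, cur);
        cur = mean;
        if (shift < eps * eps) {
          break; // converged: the point moved less than eps
        }
      }
      out[p] = cur;
    }
    return out;
  }

  /** Squared Euclidean distance. */
  static double sqDist(double[] a, double[] b) {
    double s = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      s += diff * diff;
    }
    return s;
  }
}
```

The quadratic cost of the inner scan is what makes this variant "naive"; a range query on an index structure could replace it.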
Index layer
- Bulk loading of R-Trees should be faster; Sort Tile Recursive (STR) in particular can work very well.
- M-Trees have been refactored and optimized for double distances
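To give an idea of how Sort Tile Recursive bulk loading groups data, the following sketch packs 2D points into leaf pages. It is a simplified illustration under stated assumptions, not ELKI's implementation: the class `StrPacking`, the method `packLeaves`, and the `capacity` parameter are hypothetical, only the leaf level is built, and the code uses Java 8 syntax for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative STR leaf packing for 2D points; not ELKI's code. */
final class StrPacking {
  /** Groups points into leaves of at most capacity entries (capacity >= 1). */
  static List<double[][]> packLeaves(double[][] points, int capacity) {
    double[][] pts = points.clone(); // leave the caller's array order untouched
    int n = pts.length;
    int pages = (n + capacity - 1) / capacity;       // number of leaves needed
    int slices = (int) Math.ceil(Math.sqrt(pages));  // vertical slices
    int sliceSize = slices * capacity;               // points per slice
    // Step 1: sort all points by x and cut them into vertical slices.
    Arrays.sort(pts, Comparator.comparingDouble((double[] p) -> p[0]));
    List<double[][]> leaves = new ArrayList<>();
    for (int start = 0; start < n; start += sliceSize) {
      int end = Math.min(start + sliceSize, n);
      // Step 2: sort each slice by y and pack consecutive runs into leaves.
      Arrays.sort(pts, start, end, Comparator.comparingDouble((double[] p) -> p[1]));
      for (int i = start; i < end; i += capacity) {
        leaves.add(Arrays.copyOfRange(pts, i, Math.min(i + capacity, end)));
      }
    }
    return leaves;
  }
}
```

A full bulk load would apply the same tiling again to the bounding boxes of the leaves to build the upper tree levels.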
Database layer
- Bundle format (work in progress): low-level binary format for fast data exchange
- DBID and DataStore layer received some additional classes for further performance improvements
- KNN heap structures were revisited. The code is less clean now, but performs better in benchmarks.
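For readers unfamiliar with KNN heaps, the basic idea can be sketched as a bounded max-heap that keeps the k smallest distances seen so far. This is the simple, "clean" variant built on `java.util.PriorityQueue`, not ELKI's optimized code; the class `KnnHeap` and its methods are hypothetical names, and a real implementation would also store object IDs alongside the distances.

```java
import java.util.Collections;
import java.util.PriorityQueue;

/** Illustrative bounded max-heap of the k smallest distances; not ELKI's code. */
final class KnnHeap {
  private final int k;
  private final PriorityQueue<Double> heap;

  /** @param k number of neighbors to keep, must be at least 1 */
  KnnHeap(int k) {
    this.k = k;
    // Reverse ordering turns the min-heap into a max-heap: peek() is the worst kept distance.
    this.heap = new PriorityQueue<Double>(k, Collections.<Double>reverseOrder());
  }

  /** Offers a candidate distance; it is kept only if it is among the k smallest. */
  void add(double distance) {
    if (heap.size() < k) {
      heap.add(distance);
    } else if (distance < heap.peek()) {
      heap.poll();
      heap.add(distance);
    }
  }

  /** Current k-th smallest distance, or infinity while fewer than k were added. */
  double kthDistance() {
    return heap.size() < k ? Double.POSITIVE_INFINITY : heap.peek();
  }
}
```

Boxing every distance into a `Double` is convenient but costly, which hints at why a primitive-based heap can be less clean yet faster in benchmarks.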
Visualizations
- General clean up and API simplifications
- Some additional modules and improvements
Various
- There is a new parameter class, RandomParameter
- Some new distributions were added; they are also available in the data set generator.
Tutorials
- The website has new tutorials, including one on a k-means variation that produces equal-sized clusters.
- Supported Operating Systems: Platform Independent
- Data Formats: ARFF, CSV, Parser Extension API, Other
- Tags: Clustering, Visualization, Algorithms, Evaluation, Anomaly Detection, Outlier Detection, Index Structures