Projects supporting the other data format.

Logo ELKI 0.7.0

by erich - November 27, 2015, 18:23:16 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 15669 views, 2853 downloads, 4 subscriptions

About: ELKI is a framework for implementing data-mining algorithms with support for index structures, that includes a wide variety of clustering and outlier detection methods.


Additions and Improvements from ELKI 0.6.0:

ELKI is now available on Maven:|de.lmu.ifi.dbs.elki|elki|0.7.0|jar

Please clone for a minimal project example.

Uncertain data types, and clustering algorithms for uncertain data.

Major refactoring of distances - removal of Distance values and removed support for non-double-valued distance functions (in particular DoubleDistance was removed). While this reduces the generality of ELKI, we could remove about 2.5% of the codebase by not having to have optimized codepaths for double-distance anymore. Generics for distances were present in almost any distance-based algorithm, and we were also happy to reduce the use of generics this way. Support for non-double-valued distances can trivially be added again, e.g. by adding the specialization one level higher: at the query instead of the distance level, for example. In this process, we also removed the Generics from NumberVector. The object-based get was deprecated for a good reason long ago, and e.g. doubleValue are more efficient (even for non-DoubleVectors).

Dropped some long-deprecated classes.


  • speedups for some initialization heuristics.

  • K-means++ initialization no longer squares distances (again).

  • farthest-point heuristics now uses minimum instead of sum (renamed).

  • additional evaluation criteria.

  • Elkan's and Hamerly's faster k-means variants.

CLARA clustering.


Hierarchical clustering:

  • Renamed naive algorithm to AGNES.

  • Anderbergs algorithm (faster than AGNES, slower than SLINK).

  • CLINK for complete linkage clustering in O(n²) time, O(n) memory.

  • Simple extraction from HDBSCAN.

  • "Optimal" extraction from HDBSCAN.

  • HDBSCAN, in two variants.

LSDBC clustering.

EM clustering was refactored and moved into its own package. The new version is much more extensible.

OPTICS clustering:

  • Added a list-based variant of OPTICS to our heap-based.

  • FastOPTICS (contributed by Johannes Schneider).

  • Improved OPTICS Xi cluster extraction.

Outlier detection:

  • KDEOS outlier detection (SDM14).

  • k-means based outlier detection (distance to centroid) and Silhouette coefficient based approach (which does not work too well on the toy data sets - the lowest silhouette are usually where two clusters touch).

  • bug fix in kNN weight, when distances are tied and kNN yields more than k results.

  • kNN and kNN weight outlier have their k parameter changed: old 2NN outlier is now 1NN outlier, as commonly understood in classification literature (1 nearest neighbor other than the query object; whereas in database literature the 1NN is usually the query object itself). You can get the old result back by decreasing k by one easily.

  • LOCI implementation is now only O(n^3 log n) instead of O(n^4).

  • Local Isolation Coefficient (LIC).

  • IDOS outlier detection with intrinsic dimensionality.

  • Baseline intrinsic dimensionality outlier detection.

  • Variance-of-Volumes outlier detection (VOV).

Parallel computation framework, and some parallelized algorithms

  • Parallel k-means.

  • Parallel LOF and variants.

LibSVM format parser.

kNN classification (with index acceleration).

Internal cluster evaluation:

  • Silhouette index.

  • Simplified Silhouette index (faster).

  • Davis-Bouldin index.

  • PBM index.

  • Variance-Ratio-Criteria.

  • Sum of squared errors.

  • C-Index.

  • Concordant pair indexes (Gamma, Tau).

  • Different noise handling strategies for internal indexes.

Statistical dependence measures:

  • Distance correlation dCor.

  • Hoeffings D.

  • Some divergence / mutual information measures.

Distance functions:

  • Big refactoring.

  • Time series distances refactored, allow variable length series now.

  • Hellinger distance and kernel function.


  • Faster MDS implementation using power iterations.

Indexing improvements:

  • Precomputed distance matrix "index".

  • iDistance index (static only).

  • Inverted-list index for sparse data and cosine/arccosine distance.

  • Cover tree index (static only).

  • Additional LSH hash functions.

Frequent Itemset Mining:

  • Improved APRIORI implementation.

  • FP-Growth added.

  • Eclat (basic version only) added.

Uncertain clustering:

  • Discrete and continuous data models.

  • FDBSCAN clustering.

  • UKMeans clustering.

  • CKMeans clustering.

  • Representative Uncertain Clustering (Meta-algorithm).

  • Center-of-mass meta Clustering (allows using other clustering algorithms on uncertain objects).


  • Several estimators for intrinsic dimensionality.

MiniGUI has two "secret" new options: -minigui.last -minigui.autorun to load the last saved configuration and run it, for convenience.

Logging API has been extended, to make logging more convenient in a number of places (saving some lines for progress logging and timing).

Logo lomo feature extraction and xqda metric learning for person reidentification 1.0

by openpr_nlpr - May 6, 2015, 11:38:32 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 1394 views, 223 downloads, 3 subscriptions

Rating Empty StarEmpty StarEmpty StarEmpty StarEmpty Star
(based on 1 vote)

About: This MATLAB package provides the LOMO feature extraction and the XQDA metric learning algorithms proposed in our CVPR 2015 paper. It is fast, and effective for person re-identification. For more details, please visit


Initial Announcement on

Logo factorie 1.0.0-M7

by apassos - October 7, 2013, 23:10:37 CET [ Project Homepage BibTeX Download ] 2568 views, 554 downloads, 1 subscription

About: [FACTORIE]( is a toolkit for deployable probabilistic modeling, implemented as a software library in [Scala]( It provides its users with a succinct language for creating [factor graphs](, estimating parameters and performing inference. It also has implementations of many machine learning tools and a full NLP pipeline.


Initial Announcement on

Logo MROGH 1.0

by openpr_nlpr - October 16, 2012, 04:41:51 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 2886 views, 629 downloads, 1 subscription

About: An implementation of MROGH descriptor. For more information, please refer to: “Bin Fan, Fuchao Wu and Zhanyi Hu, Aggregating Gradient Distributions into Intensity Orders: A Novel Local Image Descriptor, CVPR 2011, pp.2377-2384.” The most up-to-date information can be found at :


Initial Announcement on

Logo Isoline Retrieval SVN rev. 7

by Petey - February 21, 2012, 16:56:09 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 3165 views, 665 downloads, 1 subscription

About: Software to perform isoline retrieval, retrieve isolines of an atmospheric parameter from a nadir-looking satellite.


Added screenshot, keywords

Logo Gird Soccer Simulator 1.0

by sina_iravanian - April 27, 2011, 16:47:38 CET [ Project Homepage BibTeX Download ] 3443 views, 1007 downloads, 1 subscription

About: Grid-Soccer Simulator is a multi-agent soccer simulator in a grid-world environment. The environment provides a test-bed for machine-learning, and control algorithms, especially multi-agent reinforcement learning.


Initial Announcement on

Logo svmPRAT 1.0

by rangwala - December 28, 2009, 00:27:03 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 4631 views, 1196 downloads, 1 subscription

About: BACKGROUND:Over the last decade several prediction methods have been developed for determining the structural and functional properties of individual protein residues using sequence and sequence-derived information. Most of these methods are based on support vector machines as they provide accurate and generalizable prediction models. RESULTS:We present a general purpose protein residue annotation toolkit (svmPRAT) to allow biologists to formulate residue-wise prediction problems. svmPRAT formulates the annotation problem as a classification or regression problem using support vector machines. One of the key features of svmPRAT is its ease of use in incorporating any user-provided information in the form of feature matrices. For every residue svmPRAT captures local information around the reside to create fixed length feature vectors. svmPRAT implements accurate and fast kernel functions, and also introduces a flexible window-based encoding scheme that accurately captures signals and pattern for training eective predictive models. CONCLUSIONS:In this work we evaluate svmPRAT on several classification and regression problems including disorder prediction, residue-wise contact order estimation, DNA-binding site prediction, and local structure alphabet prediction. svmPRAT has also been used for the development of state-of-the-art transmembrane helix prediction method called TOPTMH, and secondary structure prediction method called YASSPP. This toolkit developed provides practitioners an efficient and easy-to-use tool for a wide variety of annotation problems. Availability:


Initial Announcement on