Projects that are tagged with clustering.
Showing Items 1-20 of 45 on page 1 of 3: 1 2 3 Next

Logo Apache Mahout 0.11.1

by gsingers - November 9, 2015, 16:12:06 CET [ Project Homepage BibTeX Download ] 18833 views, 4952 downloads, 3 subscriptions

About: Apache Mahout is an Apache Software Foundation project with the goal of creating both a community of users and a scalable, Java-based framework consisting of many machine learning algorithm [...]


Apache Mahout introduces a new math environment we call Samsara, for its theme of universal renewal. It reflects a fundamental rethinking of how scalable machine learning algorithms are built and customized. Mahout-Samsara is here to help people create their own math while providing some off-the-shelf algorithm implementations. At its core are general linear algebra and statistical operations along with the data structures to support them. You can use is as a library or customize it in Scala with Mahout-specific extensions that look something like R. Mahout-Samsara comes with an interactive shell that runs distributed operations on a Spark cluster. This make prototyping or task submission much easier and allows users to customize algorithms with a whole new degree of freedom. Mahout Algorithms include many new implementations built for speed on Mahout-Samsara. They run on Spark 1.3+ and some on H2O, which means as much as a 10x speed increase. You’ll find robust matrix decomposition algorithms as well as a Naive Bayes classifier and collaborative filtering. The new spark-itemsimilarity enables the next generation of cooccurrence recommenders that can use entire user click streams and context in making recommendations.

Logo Cognitive Foundry 3.4.2

by Baz - October 30, 2015, 06:53:03 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 23214 views, 3899 downloads, 4 subscriptions

About: The Cognitive Foundry is a modular Java software library of machine learning components and algorithms designed for research and applications.

  • General:
    • Upgraded MTJ to 1.0.3.
  • Common:
    • Added package for hash function computation including Eva, FNV-1a, MD5, Murmur2, Prime, SHA1, SHA2
    • Added callback-based forEach implementations to Vector and InfiniteVector, which can be faster for iterating through some vector types.
    • Optimized DenseVector by removing a layer of indirection.
    • Added method to compute set of percentiles in UnivariateStatisticsUtil and fixed issue with percentile interpolation.
    • Added utility class for enumerating combinations.
    • Adjusted ScalarMap implementation hierarchy.
    • Added method for copying a map to VectorFactory and moved createVectorCapacity up from SparseVectorFactory.
    • Added method for creating square identity matrix to MatrixFactory.
    • Added Random implementation that uses a cached set of values.
  • Learning:
    • Implemented feature hashing.
    • Added factory for random forests.
    • Implemented uniform distribution over integer values.
    • Added Chi-squared similarity.
    • Added KL divergence.
    • Added general conditional probability distribution.
    • Added interfaces for Regression, UnivariateRegression, and MultivariateRegression.
    • Fixed null pointer exception that can happen in K-means with an empty cluster.
    • Fixed name of maxClusters property on AgglomerativeClusterer (was called maxMinDistance).
  • Text:
    • Improvements to LDA Gibbs sampler.

Logo JMLR dlib ml 18.18

by davis685 - October 29, 2015, 01:48:44 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 119856 views, 19970 downloads, 4 subscriptions

About: This project is a C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real world problems.


This release has focused on build system improvements, both for the Python API and C++ builds using CMake. This includes adding a script for installing the dlib Python API as well as a make install target for installing a C++ shared library for non-Python use.

Logo MLweb 0.1.2

by lauerfab - October 9, 2015, 11:55:52 CET [ Project Homepage BibTeX Download ] 1453 views, 397 downloads, 3 subscriptions

About: MLweb is an open source project that aims at bringing machine learning capabilities into web pages and web applications, while maintaining all computations on the client side. It includes (i) a javascript library to enable scientific computing within web pages, (ii) a javascript library implementing machine learning algorithms for classification, regression, clustering and dimensionality reduction, (iii) a web application providing a matlab-like development environment.

  • Add Regression:AutoReg method
  • Add KernelRidgeRegression tuning function
  • More efficient predictions for KRR, SVM, SVR
  • Add BFGS optimization method
  • Faster QR, SVD and eigendecomposition
  • Better support for sparse vectors and matrices
  • Add linear algebra benchmark at
  • Fix plots in LALOlib/ML.js
  • Fix cross-origin issues in new MLlab()
  • Small bug fixes

Logo SALSA.jl 0.0.5

by jumutc - September 28, 2015, 17:28:56 CET [ Project Homepage BibTeX Download ] 530 views, 82 downloads, 1 subscription

About: SALSA (Software lab for Advanced machine Learning with Stochastic Algorithms) is an implementation of the well-known stochastic algorithms for Machine Learning developed in the high-level technical computing language Julia. The SALSA software package is designed to address challenges in sparse linear modelling, linear and non-linear Support Vector Machines applied to large data samples with user-centric and user-friendly emphasis.


Initial Announcement on

Logo ELKI 0.7.0-20150828

by erich - September 17, 2015, 10:20:30 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 15566 views, 2841 downloads, 4 subscriptions

About: ELKI is a framework for implementing data-mining algorithms with support for index structures, that includes a wide variety of clustering and outlier detection methods.


Additions and Improvements from ELKI 0.6.0:

  • Uncertain data types, and clustering algorithms for uncertain data.

  • Major refactoring of distances - removal of Distance values and removed support for non-double-valued distance functions. While this reduces the generality of ELKI, we could remove about 2.5% of the codebase by not having to have optimized codepaths for double-distance anymore. Generics for distances were present in almost any distance-based algorithm, and we were also happy to reduce the use of generics this way. Support for non-double-valued distances can trivially be added again, e.g. by adding the specialization one level higher: at the query instead of the distance level, for example.

  • In this process, we also removed the Generics from NumberVector. The object-based get was deprecated for a good reason long ago, and e.g. doubleValue are more efficient (even for non-DoubleVectors).

  • Dropped some long-deprecated classes

Clustering algorithms:


  • speedups for some initialization heuristics
  • K-means++ initialization no longer squares distances (again)
  • farthest-point heuristics now uses minimum instead of sum (renamed)
  • additional evaluation criteria
  • Elkan's and Hamerly's faster k-means variants

CLARA clustering


Hierarchical clustering

  • Renamed naive algorithm to AGNES
  • Anderbergs algorithm (faster than AGNES, slower than SLINK)
  • CLINK for complete linkage clustering in O(n²) time, O(n) memory
  • Simple extraction from HDBSCAN
  • "Optimal" extraction from HDBSCAN
  • HDBSCAN, in two variants

LSDBC clustering

EM clustering was refactored and moved into its own package. The new version is much more extensible.

Parallel computation framework, and some parallelized algorithms

  • Parallel k-means
  • Parallel LOF and variants


  • LibSVM format parser


  • kNN classification (with index acceleration)

Evaluation: Internal cluster evaluation:

  • Silhouette index
  • Simplified Silhouette index (faster)
  • Davis-Bouldin index
  • PBM index
  • Variance-Ratio-Criteria
  • Sum of squared errors
  • C-Index
  • Concordant pair indexes (Gamma, Tau)
  • Different noise handling strategies for internal indexes

Statistical dependence measures:

  • Distance correlation dCor.
  • Hoeffings D.
  • Some divergence / mutual information measures.

Distance functions:

  • Big refactoring.
  • Time series distances refactored, allow variable length series now.
  • Hellinger distance and kernel function.


  • Faster MDS implementation using power iterations.

Indexing improvements:

  • Precomputed distance matrix "index".
  • iDistance index (static only).
  • Inverted-list index for sparse data and cosine/arccosine distance.
  • cover tree index (static only).

Frequent Itemset Mining:

  • Improved APRIORI implementation.
  • FP-Growth added.
  • Eclat (basic version only) added.

Uncertain clustering:

  • Discrete and continuous data models
  • FDBSCAN clustering
  • UKMeans clustering
  • CKMeans clustering
  • Representative Uncertain Clustering (Meta-algorithm)
  • Center-of-mass meta Clustering (allows using other clustering algorithms on uncertain objects) (KDD'14)

Outlier detection changes / smaller improvements:

  • KDEOS outlier detection (SDM14)
  • k-means based outlier detection (distance to centroid) and Silhouette coefficient based approach (which does not work too well on the toy data sets - the lowest silhouette are usually where two clusters touch).
  • bug fix in kNN weight, when distances are tied and kNN yields more than k results.
  • kNN and kNN weight outlier have their k parameter changed: old 2NN outlier is now 1NN outlier, as commonly understood in classification literature (1 nearest neighbor ''other than the query object''; whereas in database literature the 1NN is usually the query object itself). You can get the old result back by decreasing k by one easily.
  • LOCI implementation is now only O(n^3 log n) instead of O(n^4).


  • MiniGUI has two "secret" new options: -minigui.last -minigui.autorun to load the last saved configuration and run it, for convenience.

  • Logging API has been extended, to make logging more convenient in a number of places (saving some lines for progress logging and timing).

Logo WEKA 3.7.13

by mhall - September 11, 2015, 04:55:02 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 51311 views, 7618 downloads, 4 subscriptions

Rating Whole StarWhole StarWhole StarWhole StarEmpty Star
(based on 6 votes)

About: The Weka workbench contains a collection of visualization tools and algorithms for data analysis and predictive modelling, together with graphical user interfaces for easy access to this [...]


In core weka:

  • Numerically stable implementation of variance calculation in core Weka classes - thanks to Benjamin Weber
  • Unified expression parsing framework (with compiled expressions) is now employed by filters and tools that use mathematical/logical expressions - thanks to Benjamin Weber
  • Developers can now specify GUI and command-line options for their Weka schemes via a new unified annotation-based mechanism
  • ClassConditionalProbabilities filter - replaces the value of a nominal attribute in a given instance with its probability given each of the possible class values
  • GUI package manager's available list now shows both packages that are not currently installed, and those installed packages for which there is a more recent version available that is compatible with the base version of Weka being used
  • ReplaceWithMissingValue filter - allows values to be randomly (with a user-specified probability) replaced with missing values. Useful for experimenting with methods for imputing missing values
  • WrapperSubsetEval can now use plugin evaluation metrics

In packages:

  • alternatingModelTrees package - alternating trees for regression
  • timeSeriesFilters package, contributed by Benjamin Weber
  • distributedWekaSpark package - wrapper for distributed Weka on Spark
  • wekaPython package - execution of CPython scripts and wrapper classifier/clusterer for Scikit Learn schemes
  • MLRClassifier in RPlugin now provides access to almost all classification and regression learners in MLR 2.4

Logo KeLP 1.2.1

by kelpadmin - July 24, 2015, 15:43:13 CET [ Project Homepage BibTeX Download ] 3815 views, 965 downloads, 3 subscriptions

About: Kernel-based Learning Platform (KeLP) is Java framework that supports the implementation of kernel-based learning algorithms, as well as an agile definition of kernel functions over generic data representation, e.g. vectorial data or discrete structures. The framework has been designed to decouple kernel functions and learning algorithms, through the definition of specific interfaces. Once a new kernel function has been implemented, it can be automatically adopted in all the available kernel-machine algorithms. KeLP includes different Online and Batch Learning algorithms for Classification, Regression and Clustering, as well as several Kernel functions, ranging from vector-based to structural kernels. It allows to build complex kernel machine based systems, leveraging on JSON/XML interfaces to instantiate classifiers without writing a single line of code.


The code for learning relations between pairs of short texts has been released, and includes the approach described in:

Simone Filice, Giovanni Da San Martino and Alessandro Moschitti. Relational Information for Learning from Structured Text Pairs. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics, ACL 2015.

In particular this new release includes:

  • TreePairRelTagger: a manipulator that establishes relations between two tree representations (available in the maven project discreterepresentation)

  • 5 new kernels on pairs: released in the maven project standard-kernel

Check out this new version from our repositories. API Javadoc is already available. Your suggestions will be very precious for us, so download and try KeLP 1.2.1!

Logo LMW Tree 1.0

by cdevries - May 30, 2015, 11:42:23 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 1209 views, 245 downloads, 2 subscriptions

About: Learning M-Way Tree - Web Scale Clustering - EM-tree, K-tree, k-means, TSVQ, repeated k-means, clustering, random projections, random indexing, hashing, bit signatures


Initial Announcement on

Logo ClusterEval 1.1

by cdevries - May 18, 2015, 22:01:01 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 3909 views, 1013 downloads, 2 subscriptions

About: Cluster quality Evaluation software. Implements cluster quality metrics based on ground truths such as Purity, Entropy, Negentropy, F1 and NMI. It includes a novel approach to correct for pathological or ineffective clusterings called 'Divergence from a Random Baseline'.


Moved project to GitHub.

Logo Auto encoder Based Data Clustering Toolkit 1.0

by openpr_nlpr - February 10, 2015, 08:30:55 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 1553 views, 288 downloads, 2 subscriptions

About: The auto-encoder based data clustering toolkit provides a quick start of clustering based on deep auto-encoder nets. This toolkit can cluster data in feature space with a deep nonlinear nets.


Initial Announcement on

Logo Hub Miner 1.1

by nenadtomasev - January 22, 2015, 16:33:51 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 2556 views, 528 downloads, 2 subscriptions

About: Hubness-aware Machine Learning for High-dimensional Data

  • BibTex support for all algorithm implementations, making all of them easy to reference (via algref package).

  • Two more hubness-aware approaches (meta-metric-learning and feature construction)

  • An implementation of Hit-Miss networks for analysis.

  • Several minor bug fixes.

  • The following instance selection methods were added: HMScore, Carving, Iterative Case Filtering, ENRBF.

  • The following clustering quality indexes were added: Folkes-Mallows, Calinski-Harabasz, PBM, G+, Tau, Point-Biserial, Hubert's statistic, McClain-Rao, C-root-k.

  • Some more experimental scripts have been included.

  • Extensions in the estimation of hubness risk.

  • Alias and weighted reservoir methods for weight-proportional random selection.

Logo APCluster 1.4.1

by UBod - December 10, 2014, 12:58:29 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 25869 views, 4605 downloads, 3 subscriptions

Rating Whole StarWhole StarWhole StarWhole Star1/2 Star
(based on 2 votes)

About: The apcluster package implements Frey's and Dueck's Affinity Propagation clustering in R. The package further provides leveraged affinity propagation, exemplar-based agglomerative clustering, and various tools for visual analysis of clustering results.

  • fixes in C++ code of sparse affinity propagation

Logo Accord.NET Framework 2.14.0

by cesarsouza - December 9, 2014, 23:04:04 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 23421 views, 4724 downloads, 2 subscriptions

About: The Accord.NET Framework is a .NET machine learning framework combined with audio and image processing libraries completely written in C#. It is a complete framework for building production-grade computer vision, computer audition, signal processing and statistics applications even for commercial use. A comprehensive set of sample applications provide a fast start to get up and running quickly, and an extensive online documentation helps fill in the details.


Adding a large number of new distributions, such as Anderson-Daring, Shapiro-Wilk, Inverse Chi-Square, Lévy, Folded Normal, Shifted Log-Logistic, Kumaraswamy, Trapezoidal, U-quadratic and BetaPrime distributions, Birnbaum-Saunders, Generalized Normal, Gumbel, Power Lognormal, Power Normal, Triangular, Tukey Lambda, Logistic, Hyperbolic Secant, Degenerate and General Continuous distributions.

Other additions include new statistical hypothesis tests such as Anderson-Daring and Shapiro-Wilk; as well as support for all of LIBLINEAR's support vector machine algorithms; and format reading support for MATLAB/Octave matrices, LibSVM models, sparse LibSVM data files, and many others.

For a complete list of changes, please see the full release notes at the release details page at:

Logo libAGF 0.9.8

by Petey - December 6, 2014, 02:35:39 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 12603 views, 2466 downloads, 2 subscriptions

About: C++ software for statistical classification, probability estimation and interpolation/non-linear regression using variable bandwidth kernel estimation.


New in Version 0.9.8:

  • bug fixes: svm file conversion works properly and is more general

  • non-hierarchical multi-borders has 3 options for solving for the conditional probabilities: matrix inversion, voting, and matrix inversion over-ridden by voting, with re-normalization

  • multi-borders now works with external binary classifiers

  • random numbers resolve a tie when selecting classes based on probabilities

  • pair of routines, sort_discrete_vectors and search_discrete_vectors, for classification based on n-d binning (still experimental)

  • command options have been changed with many new additions, see QUICKSTART file or run the relevant commands for details

Logo The Statistical ToolKit 0.8.4

by joblion - December 5, 2014, 13:21:47 CET [ Project Homepage BibTeX Download ] 1829 views, 585 downloads, 2 subscriptions

About: STK++: A Statistical Toolkit Framework in C++


Inegrating openmp to the current release. Many enhancement in the clustering project. bug fix

Logo libcluster 2.1

by dsteinberg - October 31, 2014, 23:27:57 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 1654 views, 417 downloads, 2 subscriptions

About: An extensible C++ library of Hierarchical Bayesian clustering algorithms, such as Bayesian Gaussian mixture models, variational Dirichlet processes, Gaussian latent Dirichlet allocation and more.


Initial Announcement on

Logo pSpectralClustering 1.1

by tbuehler - July 30, 2014, 19:44:52 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 7046 views, 1606 downloads, 2 subscriptions

About: A generalized version of spectral clustering using the graph p-Laplacian.

  • fixed compatibility issue with Matlab R2013a+
  • several internal optimizations

Logo DRVQ 1.0.1-beta

by iavr - January 18, 2014, 17:26:34 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 2083 views, 490 downloads, 1 subscription

About: DRVQ is a C++ library implementation of dimensionality-recursive vector quantization, a fast vector quantization method in high-dimensional Euclidean spaces under arbitrary data distributions. It is an approximation of k-means that is practically constant in data size and applies to arbitrarily high dimensions but can only scale to a few thousands of centroids. As a by-product of training, a tree structure performs either exact or approximate quantization on trained centroids, the latter being not very precise but extremely fast.


Initial Announcement on

Logo Malheur 0.5.4

by konrad - December 25, 2013, 13:20:31 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 16179 views, 3121 downloads, 1 subscription

About: Automatic Analysis of Malware Behavior using Machine Learning


Support for new version of libarchive. Minor bug fixes.

Showing Items 1-20 of 45 on page 1 of 3: 1 2 3 Next