Projects that are tagged with clustering.
Showing Items 1-20 of 41 on page 1 of 3: 1 2 3 Next

Logo Cognitive Foundry 3.4.0

by Baz - April 3, 2015, 08:28:14 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 18686 views, 3039 downloads, 2 subscriptions

About: The Cognitive Foundry is a modular Java software library of machine learning components and algorithms designed for research and applications.

Changes:
  • General:
    • Now requires Java 1.7 or higher.
    • Improved compatibility with Java 1.8 functions by removing the CloneableSerializable requirement from many function-style interfaces.
  • Common Core:
    • Improved iteration speed over sparse MTJ vectors.
    • Added utility methods for more stable log(1+x), exp(1-x), log(1 - exp(x)), and log(1 + exp(x)) to LogMath.
    • Added method for creating partial permutations to Permutation.
    • Added methods for computing standard deviation to UnivariateStatisticsUtil.
    • Added increment, decrement, and list view methods to Vector and Matrix.
    • Added shorter versions of get and set for Vector and Matrix getElement and setElement.
    • Added aliases of dot for dotProduct in VectorSpace.
    • Added utility methods for divideByNorm2 to VectorUtil.
  • Learning:
    • Added a learner for a Factorization Machine using SGD.
    • Added an iterative reporter for validation set performance.
    • Added new methods to statistical distribution classes to allow for faster sampling without boxing, in batches, or without creating extra memory.
    • Made generics for performance evaluators more permissive.
    • ParameterGradientEvaluator changed to not require input, output, and gradient types to be the same. This allows more sane gradient definitions for scalar functions.
    • Added parameter to enforce a minimum size in a leaf node for decision tree learning. It is configured through the splitting function.
    • Added ability to filter which dimensions to use in the random subspace and variance tree node splitter.
    • Added ReLU, leaky ReLU, and soft plus activation functions for neural networks.
    • Added IntegerDistribution interface for distributions over natural numbers.
    • Added a method to get the mean of a numeric distribution without boxing.
    • Fixed an issue in DefaultDataDistribution that caused the total to be off when a value was set to less than or equal to 0.
    • Added property for rate to GammaDistribution.
    • Added method to get standard deviation from a UnivariateGaussian.
    • Added clone operations for decision tree classes.
    • Fixed an issue in the TukeyKramerConfidence interval computation.
    • Fixed serialization issue with SMO output.
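
The numerically stable helpers mentioned under Common Core can be illustrated with a short sketch. This is not the Cognitive Foundry's LogMath code; the function names `log1p_exp` and `log1m_exp` are hypothetical, and the branch points are common choices rather than anything taken from the library. Note that log(1 + exp(x)) is also the soft plus activation listed under Learning.

```python
import math


def log1p_exp(x: float) -> float:
    """Stable log(1 + exp(x)), which is also the soft plus function."""
    if x > 0:
        # exp(x) would overflow for large x, so factor it out:
        # log(1 + exp(x)) = x + log(1 + exp(-x))
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def log1m_exp(x: float) -> float:
    """Stable log(1 - exp(x)), defined for x < 0."""
    if x >= 0:
        raise ValueError("x must be negative")
    if x > -math.log(2):
        # exp(x) is close to 1 here, so 1 - exp(x) loses digits; use expm1.
        return math.log(-math.expm1(x))
    # exp(x) is small here, so log1p keeps full precision.
    return math.log1p(-math.exp(x))
```

The naive expression `math.log(1 + math.exp(1000.0))` raises OverflowError, while `log1p_exp(1000.0)` simply returns 1000.0.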

Logo JMLR dlib ml 18.14

by davis685 - March 1, 2015, 23:51:06 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 96746 views, 16767 downloads, 3 subscriptions

About: This project is a C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real world problems.

Changes:

This release adds an implementation of spectral clustering as well as a few bug fixes and usability improvements.


Logo Auto encoder Based Data Clustering Toolkit 1.0

by openpr_nlpr - February 10, 2015, 08:30:55 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 623 views, 122 downloads, 2 subscriptions

About: The auto-encoder based data clustering toolkit provides a quick start for clustering based on deep auto-encoder nets. This toolkit can cluster data in feature space with deep nonlinear nets.

Changes:

Initial Announcement on mloss.org.


Logo Hub Miner 1.1

by nenadtomasev - January 22, 2015, 16:33:51 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 1445 views, 245 downloads, 2 subscriptions

About: Hubness-aware Machine Learning for High-dimensional Data

Changes:
  • BibTeX support for all algorithm implementations, making all of them easy to reference (via algref package).

  • Two more hubness-aware approaches (meta-metric-learning and feature construction)

  • An implementation of Hit-Miss networks for analysis.

  • Several minor bug fixes.

  • The following instance selection methods were added: HMScore, Carving, Iterative Case Filtering, ENRBF.

  • The following clustering quality indexes were added: Fowlkes-Mallows, Calinski-Harabasz, PBM, G+, Tau, Point-Biserial, Hubert's statistic, McClain-Rao, C-root-k.

  • Some more experimental scripts have been included.

  • Extensions in the estimation of hubness risk.

  • Alias and weighted reservoir methods for weight-proportional random selection.
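
As an illustration of weight-proportional selection with a reservoir, here is a minimal sketch of the Efraimidis-Spirakis key-based scheme. It is one standard way to do weighted reservoir sampling and is not taken from Hub Miner; the function name and signature are hypothetical.

```python
import heapq
import random


def weighted_reservoir_sample(items, weights, k, rng=None):
    """Draw up to k items without replacement, favouring larger weights.

    Each item gets the key u ** (1 / w) with u uniform in [0, 1); the k
    items with the largest keys form the sample. One pass, O(k) memory.
    """
    rng = rng or random.Random()
    heap = []  # min-heap of (key, index, item); index breaks ties
    for i, (item, w) in enumerate(zip(items, weights)):
        if w <= 0:
            continue  # non-positive weights can never be selected
        key = rng.random() ** (1.0 / w)
        if len(heap) < k:
            heapq.heappush(heap, (key, i, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, i, item))
    return [item for _, _, item in heap]
```

An alias table is usually the better fit when many draws with replacement are needed from a fixed set of weights, since each draw then costs constant time after linear setup; the reservoir approach suits streams whose length is not known in advance.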


Logo WEKA 3.7.12

by mhall - December 17, 2014, 03:04:17 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 44783 views, 6659 downloads, 3 subscriptions

Rating: 4 out of 5 stars (based on 6 votes)

About: The Weka workbench contains a collection of visualization tools and algorithms for data analysis and predictive modelling, together with graphical user interfaces for easy access to this [...]

Changes:

In core weka:

  • GUIChooser now has a plugin extension point that allows implementations of GUIChooser.GUIChooserMenuPlugin to appear as entries in either the Tools or Visualization menus
  • SubsetByExpression filter now has support for regexp matching
  • weka.classifiers.IterativeClassifierOptimizer - a classifier that can efficiently optimize the number of iterations for a base classifier that implements IterativeClassifier
  • Speedup for LogitBoost in the two class case
  • weka.filters.supervised.instance.ClassBalancer - a simple filter to balance the weight of classes
  • New class hierarchy for stopwords algorithms. Includes new methods to read custom stopwords from a file and apply multiple stopwords algorithms
  • Ability to turn off capabilities checking in Weka algorithms. Improves runtime for ensemble methods that create a lot of simple base classifiers
  • Memory savings in weka.core.Attribute
  • Improvements in runtime for SimpleKMeans and EM
  • weka.estimators.UnivariateMixtureEstimator - new mixture estimator

In packages:

  • New discriminantAnalysis package. Provides an implementation of Fisher's linear discriminant analysis
  • Quartile estimators, correlation matrix heat map and k-means++ clustering in distributed Weka
  • Support for default settings for GridSearch via a properties file
  • Improvements in scripting with the addition of the official Groovy console (kfGroovy package) from the Groovy project and TigerJython (new tigerjython package) as the Jython console via the GUIChooser
  • Support for the latest version of MLR in the RPlugin package
  • EAR4 package contributed by Vahid Jalali
  • StudentFilters package contributed by Chris Gearhart
  • graphgram package contributed by Johannes Schneider

Logo APCluster 1.4.1

by UBod - December 10, 2014, 12:58:29 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 21033 views, 3812 downloads, 3 subscriptions

Rating: 4.5 out of 5 stars (based on 2 votes)

About: The apcluster package implements Frey and Dueck's Affinity Propagation clustering in R. The package further provides leveraged affinity propagation, exemplar-based agglomerative clustering, and various tools for visual analysis of clustering results.

Changes:
  • fixes in C++ code of sparse affinity propagation

Logo Accord.NET Framework 2.14.0

by cesarsouza - December 9, 2014, 23:04:04 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 19211 views, 3976 downloads, 2 subscriptions

About: The Accord.NET Framework is a .NET machine learning framework combined with audio and image processing libraries completely written in C#. It is a complete framework for building production-grade computer vision, computer audition, signal processing and statistics applications even for commercial use. A comprehensive set of sample applications provide a fast start to get up and running quickly, and an extensive online documentation helps fill in the details.

Changes:

Adds a large number of new distributions, such as the Anderson-Darling, Shapiro-Wilk, Inverse Chi-Square, Lévy, Folded Normal, Shifted Log-Logistic, Kumaraswamy, Trapezoidal, U-quadratic, BetaPrime, Birnbaum-Saunders, Generalized Normal, Gumbel, Power Lognormal, Power Normal, Triangular, Tukey Lambda, Logistic, Hyperbolic Secant, Degenerate and General Continuous distributions.

Other additions include new statistical hypothesis tests such as Anderson-Darling and Shapiro-Wilk, support for all of LIBLINEAR's support vector machine algorithms, and format reading support for MATLAB/Octave matrices, LibSVM models, sparse LibSVM data files, and many others.

For a complete list of changes, please see the full release notes on the release details page:

https://github.com/accord-net/framework/releases


Logo libAGF 0.9.8

by Petey - December 6, 2014, 02:35:39 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 10459 views, 2081 downloads, 2 subscriptions

About: C++ software for statistical classification, probability estimation and interpolation/non-linear regression using variable bandwidth kernel estimation.

Changes:

New in Version 0.9.8:

  • bug fixes: svm file conversion works properly and is more general

  • non-hierarchical multi-borders has 3 options for solving for the conditional probabilities: matrix inversion, voting, and matrix inversion over-ridden by voting, with re-normalization

  • multi-borders now works with external binary classifiers

  • random numbers resolve a tie when selecting classes based on probabilities

  • pair of routines, sort_discrete_vectors and search_discrete_vectors, for classification based on n-d binning (still experimental)

  • command options have been changed with many new additions, see QUICKSTART file or run the relevant commands for details


Logo The Statistical ToolKit 0.8.4

by joblion - December 5, 2014, 13:21:47 CET [ Project Homepage BibTeX Download ] 1159 views, 435 downloads, 2 subscriptions

About: STK++: A Statistical Toolkit Framework in C++

Changes:

Integrated OpenMP into the current release. Many enhancements in the clustering project. Bug fixes.


Logo libcluster 2.1

by dsteinberg - October 31, 2014, 23:27:57 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 899 views, 197 downloads, 2 subscriptions

About: An extensible C++ library of Hierarchical Bayesian clustering algorithms, such as Bayesian Gaussian mixture models, variational Dirichlet processes, Gaussian latent Dirichlet allocation and more.

Changes:

Initial Announcement on mloss.org.


Logo pSpectralClustering 1.1

by tbuehler - July 30, 2014, 19:44:52 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 6011 views, 1348 downloads, 2 subscriptions

About: A generalized version of spectral clustering using the graph p-Laplacian.

Changes:
  • fixed compatibility issue with Matlab R2013a+
  • several internal optimizations

Logo DRVQ 1.0.1-beta

by iavr - January 18, 2014, 17:26:34 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 1473 views, 385 downloads, 1 subscription

About: DRVQ is a C++ library implementation of dimensionality-recursive vector quantization, a fast vector quantization method in high-dimensional Euclidean spaces under arbitrary data distributions. It is an approximation of k-means that is practically constant in data size and applies to arbitrarily high dimensions but can only scale to a few thousands of centroids. As a by-product of training, a tree structure performs either exact or approximate quantization on trained centroids, the latter being not very precise but extremely fast.

Changes:

Initial Announcement on mloss.org.


Logo ELKI 0.6.0

by erich - January 10, 2014, 18:32:28 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 12350 views, 2257 downloads, 3 subscriptions

About: ELKI is a framework for implementing data-mining algorithms with support for index structures, that includes a wide variety of clustering and outlier detection methods.

Changes:

Additions and Improvements from ELKI 0.5.5:

Algorithms

Clustering:

  • Hierarchical Clustering - the slower naive variants were added, and the code was refactored
  • Partition extraction from hierarchical clusterings - different linkage strategies (e.g. Ward)
  • Canopy pre-Clustering
  • Naive Mean-Shift Clustering
  • Affinity propagation clustering (both with distances and similarities / kernel functions)
  • K-means variations: Best-of-multiple-runs, bisecting k-means
  • New k-means initialization: farthest points, sample initialization
  • Cheng and Church Biclustering
  • P3C Subspace Clustering
  • One-dimensional clustering algorithm based on kernel density estimation
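
The "farthest points" k-means initialization listed above can be sketched as follows. This is a generic illustration of the idea under the assumption of Euclidean distance and a random first center; it is not ELKI's implementation, and the function name is hypothetical.

```python
import random


def farthest_point_init(points, k, rng=None):
    """Pick k initial centers: a random first point, then repeatedly the
    point whose distance to its nearest chosen center is largest."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = rng or random.Random()

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [points[rng.randrange(len(points))]]
    # Squared distance from every point to its nearest chosen center.
    min_d = [sqdist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=min_d.__getitem__)
        centers.append(points[idx])
        min_d = [min(d, sqdist(p, points[idx])) for d, p in zip(min_d, points)]
    return centers
```

Because the procedure always picks the most distant point, it tends to select outliers as centers, which is a known weakness compared with randomized schemes such as k-means++.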

Outlier detection

  • COP - correlation outlier probabilities
  • LDF - a kernel density based LOF variant
  • Simplified LOF - a simpler version of LOF (not using reachability distance)
  • Simple Kernel Density LOF - a simple LOF using kernel density (more consistent than LDF)
  • Simple outlier ensemble algorithm
  • PINN - projection indexed nearest neighbors, via projected indexes.
  • ODIN - kNN graph based outlier detection
  • DWOF - Dynamic-Window Outlier Factor (contributed by Omar Yousry)
  • ABOD refactored, into ABOD, FastABOD and LBABOD

Distances

  • Geodetic distances now support different world models (WGS84 etc.) and are substantially faster.
  • Levenshtein distances for processing strings, e.g. for analyzing phonemes (contributed code, see "Word segmentation through cross-lingual word-to-phoneme alignment", SLT2013, Stahlberg et al.)
  • Bray-Curtis, Clark, Kulczynski1 and Lorentzian distances with R-tree indexing support
  • Histogram matching distances
  • Probabilistic divergence distances (Jeffrey, Jensen-Shannon, Chi2, Kullback-Leibler)
  • Kulczynski2 similarity
  • Kernel similarity code has been refactored, and additional kernel functions have been added

Database Layer and Data Types

  • Projection layer
  • Parser for simple textual data (for use with Levenshtein distance)
  • Various random projection families (including Feature Bagging, Achlioptas, and p-stable)
  • Latitude+Longitude to ECEF
  • Sparse vector improvements and bug fixes
  • New filter: remove NaN values and missing values
  • New filter: add histogram-based jitter
  • New filter: normalize using statistical distributions
  • New filter: robust standardization using Median and MAD
  • New filter: Linear discriminant analysis (LDA)

Index Layer

  • Another speed up in R-trees
  • Refactoring of M- and R-trees:
    • Support for different strategies in M-tree
    • New strategies for M-tree splits
    • Speedups in M-tree
  • New index structure: in-memory k-d-tree
  • New index structure: in-memory Locality Sensitive Hashing (LSH)
  • New index structure: approximate projected indexes, such as PINN
  • Index support for geodetic data - (Details: Geodetic Distance Queries on R-Trees for Indexing Geographic Data, SSTD13)
  • Sampled k nearest neighbors: reference KDD13 "Subsampling for Efficient and Effective Unsupervised Outlier Detection Ensembles"
  • Cached (precomputed) k-nearest neighbors to share across multiple runs
  • Benchmarking "algorithms" for indexes

Mathematics and Statistics

  • Many new distributions have been added; 28 different distributions are now supported
  • Additional estimation methods (using advanced statistics such as L-Moments); 44 estimators are now available
  • Trimming and Winsorizing
  • Automatic best-fit distribution estimation
  • Preprocessor using these distributions for rescaling data sets
  • API changes related to the new distributions support
  • More kernel density functions
  • RANSAC covariance matrix builder (unfortunately rather slow)

Visualization

  • 3D projected coordinates (Details: Interactive Data Mining with 3D-Parallel-Coordinate-Trees, SIGMOD2013)
  • Convex hulls now also include nested hierarchical clusters

Other

  • Parser speedups
  • Sparse vector bug fixes and improvements
  • Various bug fixes
  • PCA, MDS and LDA filters
  • Text output was slightly improved (but still needs to be redesigned from scratch - please contribute!)
  • Refactoring of hierarchy classes
  • New heap classes and infrastructure enhancements
  • Classes can have aliases, e.g. "l2" for euclidean distance.
  • Some error messages were made more informative.
  • Benchmarking classes, also for approximate nearest neighbor search.

Logo Malheur 0.5.4

by konrad - December 25, 2013, 13:20:31 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 14171 views, 2732 downloads, 1 subscription

About: Automatic Analysis of Malware Behavior using Machine Learning

Changes:

Support for new version of libarchive. Minor bug fixes.


Logo Gesture Recognition Toolkit 0.1 Revision 289

by ngillian - December 13, 2013, 22:59:53 CET [ Project Homepage BibTeX Download ] 4549 views, 870 downloads, 1 subscription

About: The Gesture Recognition Toolkit (GRT) is a cross-platform, open-source, C++ machine learning library that has been specifically designed for real-time gesture recognition. It features a large number of machine-learning algorithms for both classification and regression in addition to a wide range of supporting algorithms for pre-processing, feature extraction and dataset management. The GRT has been designed for real-time gesture recognition, but it can also be applied to more general machine-learning tasks.

Changes:

Added Decision Tree and Random Forests.


Logo FABIA 2.8.0

by hochreit - October 18, 2013, 10:14:57 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 10484 views, 2198 downloads, 1 subscription

Rating: 4.5 out of 5 stars (based on 1 vote)

About: FABIA is a biclustering algorithm that clusters rows and columns of a matrix simultaneously. Consequently, members of a row cluster are similar to each other on a subset of columns and, analogously, members of a column cluster are similar to each other on a subset of rows. Biclusters are found by factor analysis where both the factors and the loading matrix are sparse. FABIA is a multiplicative model that extracts linear dependencies between samples and feature patterns. Applications include detection of transcriptional modules in gene expression data and identification of haplotypes/"identity by descent" consisting of rare variants obtained by next generation sequencing.

Changes:

CHANGES IN VERSION 2.8.0

NEW FEATURES

o rescaling of lapla
o extractPlot does not plot sorted matrices

CHANGES IN VERSION 2.4.0

o spfabia bugfixes

CHANGES IN VERSION 2.3.1

NEW FEATURES

o Getters and setters for class Factorization

2.0.0:

  • spfabia: fabia for a sparse data matrix (in sparse matrix format) and sparse vector/matrix computations in the code to speed up computations. spfabia applications: (a) detecting "identity by descent" in next generation sequencing data with rare variants, (b) detecting "shared haplotypes" in disease studies based on next generation sequencing data with rare variants;
  • fabia for non-negative factorization (parameter: non_negative);
  • changed to C and removed dependencies to Rcpp;
  • improved update for lambda (alpha should be smaller, e.g. 0.03);
  • introduced maximal number of row elements (lL);
  • introduced cycle bL when upper bounds nL or lL are effective;
  • reduced computational complexity;
  • bug fixes: (a) update formula for lambda: tighter approximation, (b) corrected inverse of the conditional covariance matrix of z;

1.4.0:

  • New option nL: maximal number of biclusters per row element;
  • Sort biclusters according to information content;
  • Improved and extended preprocessing;
  • Update to R2.13

Logo Apache Mahout 0.8

by gsingers - July 27, 2013, 15:52:32 CET [ Project Homepage BibTeX Download ] 16721 views, 4544 downloads, 2 subscriptions

About: Apache Mahout is an Apache Software Foundation project with the goal of creating both a community of users and a scalable, Java-based framework consisting of many machine learning algorithm [...]

Changes:

Apache Mahout 0.8 contains, amongst a variety of performance improvements and bug fixes, an implementation of Streaming K-Means, deeper Lucene/Solr integration and new scalable recommender algorithms. For a full description of the newest release, see http://mahout.apache.org/.


About: This toolbox implements a novel visualization technique called Sectors on Sectors (SonS), and an extended version called Multidimensional Sectors on Sectors (MDSonS), for improving the interpretation of several data mining algorithms. The MDSonS method uses Multidimensional Scaling (MDS) to address the main drawback of the previous method, namely that it does not represent distances between pairs of clusters. These methods have been applied to visualize the results of hierarchical clustering, Growing Hierarchical Self-Organizing Maps (GHSOM), classification trees and several manifolds. They make it possible to extract all the existing relationships among centroids’ attributes at any hierarchy level.

Changes:

Initial Announcement on mloss.org.


Logo ClusterEval 1.0

by cdevries - June 16, 2013, 04:15:30 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 2008 views, 591 downloads, 1 subscription

About: Cluster quality Evaluation software. Implements cluster quality metrics based on ground truths such as Purity, Entropy, Negentropy, F1 and NMI. It includes a novel approach to correct for pathological or ineffective clusterings called 'Divergence from a Random Baseline'.

Changes:

Initial Announcement on mloss.org.
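
To make the ground-truth metrics named in the ClusterEval description concrete, here is a minimal sketch of Purity: each cluster is credited with the size of its largest ground-truth class, and the credits are divided by the number of points. This is the textbook definition, not ClusterEval's own code, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def purity(cluster_labels, truth_labels):
    """Fraction of points that belong to the majority class of their cluster."""
    if len(cluster_labels) != len(truth_labels):
        raise ValueError("label sequences must have the same length")
    if not cluster_labels:
        raise ValueError("at least one point is required")
    by_cluster = defaultdict(Counter)
    for c, t in zip(cluster_labels, truth_labels):
        by_cluster[c][t] += 1
    majority = sum(max(counts.values()) for counts in by_cluster.values())
    return majority / len(cluster_labels)
```

Placing every point in its own cluster yields a purity of 1.0, which shows why an uncorrected score can reward a useless clustering and why comparing against a baseline is attractive.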


Logo MLDemos 0.5.1

by basilio - March 2, 2013, 16:06:13 CET [ Project Homepage BibTeX Download ] 20637 views, 4837 downloads, 2 subscriptions

About: MLDemos is a user-friendly visualization interface for various machine learning algorithms for classification, regression, clustering, projection, dynamical systems, reward maximisation and reinforcement learning.

Changes:

New Visualization and Dataset Features

  • Added 3D visualization of samples and classification, regression and maximization results
  • Added Visualization panel with individual plots, correlations, density, etc.
  • Added Editing tools to drag/magnet data, change class, increase or decrease dimensions of the dataset
  • Added categorical dimensions (indexed dimensions with non-numerical values)
  • Added Dataset Editing panel to swap, delete and rename dimensions, classes or categorical values
  • Several bug-fixes for display, import/export of data, classification performance

New Algorithms and methodologies

  • Added Projections to pre-process data (which can then be classified/regressed/clustered), with LDA, PCA, KernelPCA, ICA, CCA
  • Added Grid-Search panel for batch-testing ranges of values for up to two parameters at a time
  • Added One-vs-All multi-class classification for non-multi-class algorithms
  • Trained models can now be kept and tested on new data (training on one dataset, testing on another)
  • Added a dataset generator panel for standard toy datasets (e.g. swissroll, checkerboard,...)
  • Added a number of clustering, regression and classification algorithms (FLAME, DBSCAN, LOWESS, CCA, KMEANS++, GP Classification, Random Forests)
  • Added Save/Load Model option for GMMs and SVMs
  • Added Growing Hierarchical Self Organizing Maps (original code by Michael Dittenbach)
  • Added Automatic Relevance Determination for SVM with RBF kernel (Thanks to Ashwini Shukla!)

