Projects supporting the plain ASCII data format.


MLPACK 1.0.5

by rcurtin - May 2, 2013, 07:24:32 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 20250 views, 3566 downloads, 4 subscriptions

About: A scalable, fast C++ machine learning library, with emphasis on usability.

Changes:

Speedups of cover tree traversers; addition of rank-approximate nearest neighbor (RANN); addition of fast exact max-kernel search (FastMKS); fix for EM covariance estimation; more parameters for GMM estimation; force GMM and GaussianDistribution covariance matrices to be positive definite during training; add a tolerance parameter to the Baum-Welch algorithm for HMM training; fix for compilation with clang; fix for k-furthest neighbor search.


Tapkee 1.0rc1

by blackburn - March 18, 2013, 13:04:41 CET [ Project Homepage BibTeX Download ] 1797 views, 329 downloads, 0 subscriptions

About: Tapkee is an efficient and flexible C++ template library for dimensionality reduction.

Changes:

Initial Announcement on mloss.org.


SHOGUN 2.1.0

by sonne - March 17, 2013, 13:59:34 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 41043 views, 8596 downloads, 4 subscriptions

Rating: 4 out of 5 stars
(based on 5 votes)

About: The SHOGUN machine learning toolbox focuses on large-scale learning methods, in particular Support Vector Machines (SVM), and provides interfaces to Python, Octave, MATLAB, R and the command line.

Changes:

This release contains several enhancements, cleanups and bugfixes:

Features

  • Linear Time MMD two-sample test now works on streaming features, which makes it possible to run tests on arbitrarily large amounts of data. A block size may be specified for fast processing. The features listed below were also added. By Heiko Strathmann.
  • It is now possible to ask streaming features to produce an instance of streamed features that is stored in memory and returned as a CFeatures* object of the corresponding type. See CStreamingFeatures::get_streamed_features().
  • New concept of artificial data generator classes, based on streaming features. The first implemented instances are CMeanShiftDataGenerator and CGaussianBlobsDataGenerator. Use the streamed-features mechanism described above to get non-streaming data if desired.
  • Accelerated projected gradient multiclass logistic regression classifier by Sergey Lisitsyn.
  • New CCSOSVM based structured output solver by Viktor Gal.
  • A collection of kernel selection methods for MMD-based kernel two-sample tests, including optimal kernel choice for single and combined kernels for the linear time MMD. This finishes the kernel MMD framework and also comes with new, more illustrative examples and tests. By Heiko Strathmann.
  • Alpha version of Perl modular interface developed by Christian Montanari.
  • New framework for unit tests based on googletest and googlemock by Viktor Gal. A (growing) number of unit tests now ensures the basic functionality of our framework. Since the examples no longer have to take this role, they should become more illustrative in the future.
  • Changed the core of dimension reduction algorithms to the Tapkee library.
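
To illustrate the first feature above, here is a minimal sketch of a linear-time MMD estimate computed over two data streams in blocks. It is not SHOGUN's CLinearTimeMMD code: the function names, the one-dimensional data, the Gaussian kernel and its bandwidth are all assumptions made for the example. The point it shows is that each block is read, used and discarded, so memory use depends on the block size and not on the total number of samples.

```python
import math
import random


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two real numbers (assumed kernel choice)."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def linear_time_mmd(stream_x, stream_y, num_pairs, block_size=100,
                    kernel=gaussian_kernel):
    """Estimate the squared MMD in a single pass over two iterators.

    Each term uses two fresh samples from each stream, so only one block
    of samples is held in memory at any time.
    """
    total = 0.0
    done = 0
    while done < num_pairs:
        m = min(block_size, num_pairs - done)
        # Pull 2*m samples per stream for this block only.
        xs = [next(stream_x) for _ in range(2 * m)]
        ys = [next(stream_y) for _ in range(2 * m)]
        for i in range(m):
            x1, x2 = xs[2 * i], xs[2 * i + 1]
            y1, y2 = ys[2 * i], ys[2 * i + 1]
            total += (kernel(x1, x2) + kernel(y1, y2)
                      - kernel(x1, y2) - kernel(x2, y1))
        done += m
    return total / num_pairs


def gaussian_stream(mean, seed):
    """Hypothetical endless data source: unit-variance Gaussian samples."""
    rng = random.Random(seed)
    while True:
        yield rng.gauss(mean, 1.0)


same = linear_time_mmd(gaussian_stream(0.0, 1), gaussian_stream(0.0, 2), 5000)
different = linear_time_mmd(gaussian_stream(0.0, 1), gaussian_stream(3.0, 2), 5000)
```

With these toy streams, `same` should come out close to zero and `different` clearly positive. Changing `block_size` only changes how many samples are buffered at once, not the value of the estimate.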

Bugfixes

  • Fix for shallow copy of gaussian kernel by Matt Aasted.
  • Fixed a bug when using StringFeatures along with kernel machines in cross-validation which caused an assertion error. Thanks to Eric (yoo)!
  • Fix for 3-class case training of MulticlassLibSVM reported by Arya Iranmehr that was suggested by Oksana Bayda.
  • Fix for wrong Spectrum mismatch RBF construction in static interfaces reported by Nona Kermani.
  • Fix for wrong include in SGMatrix causing build failure on Mac OS X (thanks to @bianjiang).
  • Fixed a bug that caused kernel machines to return nonsense when using custom kernel matrices with subsets attached to them.
  • Fix for parameter dictionary creation causing null pointer dereferences with Gaussian process parameter selection.
  • Fixed a bug in exact GP regression that caused wrong results.
  • Fixed a bug in exact GP regression that produced memory errors/crashes.
  • Fix for a bug with static interfaces causing all outputs to be -1/+1 instead of real scores (reported by Kamikawa Masahisa).

Cleanup and API Changes

  • SGStringList is now based on SGReferencedData.
  • "confidences" in context of CLabel and subclasses are now "values".
  • CLinearTimeMMD constructor changes, only streaming features allowed.
  • CDataGenerator will soon be removed and replaced by new streaming-based classes.
  • SGVector, SGMatrix, SGSparseVector, SGSparseMatrix refactoring: these now contain load/save routines and relevant functions from CMath, and implementations went to .cpp files.

MLDemos 0.5.1

by basilio - March 2, 2013, 16:06:13 CET [ Project Homepage BibTeX Download ] 12953 views, 2924 downloads, 2 subscriptions

About: MLDemos is a user-friendly visualization interface for various machine learning algorithms for classification, regression, clustering, projection, dynamical systems, reward maximisation and reinforcement learning.

Changes:

New Visualization and Dataset Features

  • Added 3D visualization of samples and classification, regression and maximization results
  • Added Visualization panel with individual plots, correlations, density, etc.
  • Added Editing tools to drag/magnet data, change class, increase or decrease dimensions of the dataset
  • Added categorical dimensions (indexed dimensions with non-numerical values)
  • Added Dataset Editing panel to swap, delete and rename dimensions, classes or categorical values
  • Several bug-fixes for display, import/export of data, classification performance

New Algorithms and methodologies

  • Added Projections to pre-process data (which can then be classified/regressed/clustered), with LDA, PCA, KernelPCA, ICA, CCA
  • Added Grid-Search panel for batch-testing ranges of values for up to two parameters at a time
  • Added One-vs-All multi-class classification for non-multi-class algorithms
  • Trained models can now be kept and tested on new data (training on one dataset, testing on another)
  • Added a dataset generator panel for standard toy datasets (e.g. swissroll, checkerboard,...)
  • Added a number of clustering, regression and classification algorithms (FLAME, DBSCAN, LOWESS, CCA, KMEANS++, GP Classification, Random Forests)
  • Added Save/Load Model option for GMMs and SVMs
  • Added Growing Hierarchical Self Organizing Maps (original code by Michael Dittenbach)
  • Added Automatic Relevance Determination for SVM with RBF kernel (Thanks to Ashwini Shukla!)
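
The grid search over two parameters mentioned above can be sketched in a few lines. This is a generic illustration, not MLDemos code: `grid_search`, `toy_score` and the parameter names `c` and `gamma` are hypothetical, and the scoring function is a made-up stand-in for "train a model with these settings and measure its performance".

```python
from itertools import product


def grid_search(evaluate, values_a, values_b):
    """Try every (a, b) combination and return (best_score, best_a, best_b)."""
    best = None
    for a, b in product(values_a, values_b):
        score = evaluate(a, b)
        # Keep the first combination that reaches the highest score.
        if best is None or score > best[0]:
            best = (score, a, b)
    return best


def toy_score(c, gamma):
    """Made-up score surface that peaks at c=10, gamma=0.1."""
    return -((c - 10) ** 2) - (gamma - 0.1) ** 2


best_score, best_c, best_gamma = grid_search(
    toy_score, [1, 10, 100], [0.01, 0.1, 1.0])
```

The cost grows with the product of the two range sizes, which is one practical reason to limit a batch test to a small number of parameters at a time.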


Jstacs 2.0

by keili - July 30, 2012, 13:31:02 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 9520 views, 2094 downloads, 2 subscriptions

About: A Java framework for statistical analysis and classification of biological sequences

Changes:

February 2, 2012: Jstacs 2.0 released

Jstacs 2.0 changes many names and the structure of several packages. It is not code-compatible with Jstacs 1.5 and earlier.

RESTRUCTURING and RENAMING:

former ScoringFunction, NormalizableScoringFunction, Model

  • new base-interface SequenceScore
  • new sub-interface StatisticalModel of SequenceScore for all statistical models with sub-interfaces DifferentiableStatisticalModel and TrainableStatisticalModel
  • new interface DifferentiableSequenceScore replaces ScoringFunction
  • new interface DifferentiableStatisticalModel replaces NormalizableScoringFunction
  • new interface TrainableStatisticalModel replaces Model
  • new abstract class AbstractDifferentiableSequenceScore
  • new abstract class AbstractDifferentiableStatisticalModel replaces AbstractNormalizableScoringFunction
  • new abstract class AbstractTrainableStatisticalModel replaces AbstractModel
  • former Models renamed to TrainSM
  • former ScoringFunction renamed to DiffSS or DiffSM
  • getProbFor removed from TrainableStatisticalModel (former Model) and conceptually replaced by getLogProbFor
  • getLogScore(Sequence,int,int) with changed meaning of arguments: getLogScore(Sequence,start,end) instead of getLogScore(Sequence,start,length)
  • isTrained() replaced by common method isInitialized()
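
Because the third argument of getLogScore changed from a length to an end position, call sites written for the old convention need their last argument converted. The helper below is a hypothetical sketch of that arithmetic, not part of Jstacs. The list above does not say whether the new end position is inclusive or exclusive, so the sketch takes that as an explicit flag; check the Jstacs Javadocs before relying on either setting.

```python
def length_to_end(start, length, end_inclusive=True):
    """Convert an old-style (start, length) pair to the matching end index.

    end_inclusive is an assumption to be checked against the library docs:
    True means the end index is the last position of the subsequence,
    False means it is one past the last position.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    end = start + length
    return end - 1 if end_inclusive else end


# A subsequence of 3 symbols starting at position 5 covers positions 5, 6, 7.
inclusive_end = length_to_end(5, 3)
exclusive_end = length_to_end(5, 3, end_inclusive=False)
```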

Parameters and Results

  • new super-class of Parameters and Results: AnnotatedEntity
  • common list-type for Parameters and Results: AnnotatedEntityList
  • Renaming: CollectionParameter -> SelectionParameter, MultiSelectionCollectionParameter -> MultiSelectionParameter, new super-class AbstractSelectionParameter
  • major refactoring due to common hierarchy and code-cleanup
  • lazy evaluation of Parameter/ParameterSet hierarchies moved from ParameterSet (loadParameters()) to ParameterSetContainer (constructor on class)
  • SubclassFinder adapted to lazy evaluation

performance measures

  • new abstract super-class AbstractPerformanceMeasure of all performance measures
  • new interface NumericalPerformanceMeasure for all performance measures that return a single number (as opposed, e.g., to curves)
  • new class PerformanceMeasureParameterSet for a collection of general performance measures
  • new class NumericalPerformanceMeasureParameterSet for a collection of NumericalPerformanceMeasures
  • used in evaluate-method of AbstractClassifier and in ClassifierAssessments

further changes

  • Sample renamed to DataSet
  • evaluate and evaluateAll in AbstractClassifier joined
  • new class IndependentProductDiffSS as super-class of IndependentProductDiffSM (former IndependentProductScoringFunction)
  • new class UniformDiffSS as super-class of UniformDiffSM (former UniformScoringFunction)

NEW FUNCTIONALITY:

  • multi-threaded implementation of Baum-Welch and Viterbi training of hidden Markov models
  • new Interface Singleton that can be used for singleton instances to save memory, current examples: DNAAlphabet, DNAAlphabetContainer, ProteinAlphabet
  • added ProteinAlphabet
  • added possibility to use NaN-values with ContinuousAlphabets
  • added ArbitraryFloatSequence including static methods for DataSet creation for cases where double-precision is not needed
  • new performance measure MaximumFMeasure
  • access to Parameters in ParameterSets and Results in ResultSets by name
  • emitDataSet in BayesianNetworkDiffSM
  • new static method Time.getTimeInstance that returns UserTime or RealTime depending on availability of shared lib
  • SubclassFinder allows for adding own base packages
  • new method overlaps() in LocatedSequenceAnnotationWithLength
  • AbstractTerminationCondition used in ScoreClassifier and sub-classes
  • public method propagateESS in HMMFactory
  • new method generateLog in DirichletMRG for drawing log-values
  • added DifferentiableStatisticalModelFactory
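
As an illustration of what a maximum F-measure computes, the sketch below sweeps a decision threshold over classifier scores and reports the largest F1 value reached. It is a generic version written for this listing, not the Jstacs MaximumFMeasure class, whose exact definition (for example the weighting of precision and recall) may differ. The function name and the "score >= threshold means positive" rule are assumptions.

```python
def maximum_f_measure(pos_scores, neg_scores):
    """Return (best_f1, threshold) over all thresholds taken from the scores.

    pos_scores / neg_scores are the classifier scores of the truly positive
    and truly negative samples. A sample counts as predicted positive when
    its score is >= threshold.
    """
    if not pos_scores:
        raise ValueError("need at least one positive sample")
    best_f1, best_threshold = 0.0, None
    for t in sorted(set(pos_scores) | set(neg_scores)):
        tp = sum(1 for s in pos_scores if s >= t)  # true positives
        fp = sum(1 for s in neg_scores if s >= t)  # false positives
        fn = len(pos_scores) - tp                  # false negatives
        if tp == 0:
            continue
        f1 = 2.0 * tp / (2.0 * tp + fp + fn)
        if f1 > best_f1:
            best_f1, best_threshold = f1, t
    return best_f1, best_threshold


f1, threshold = maximum_f_measure([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

For this toy input the best trade-off is at threshold 0.4, where all three positives and one negative are predicted positive, giving F1 = 6/7.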

BUGFIXES/IMPROVEMENTS:

  • bugfix in propagation of equivalent sample size in HMMFactory
  • bugfix in random initialization of BasicHigherOrderTransition
  • improved Alignment implementation
  • SafeOutputStream with new static factory method getSafeOutputStream, write methods now work on Objects

DOCUMENTATION:

  • improved Javadocs in many classes and packages
  • new Cookbook with extensive documentation and explanation

MISC:

  • output of NonParsableException more verbose
  • Exceptions in multi-threaded code now lead to exit of program instead of only stopping the thread
  • update of RServe/RClient

Aciqra 1.2.1

by Caglow - June 25, 2009, 23:30:22 CET [ BibTeX Download ] 1958 views, 969 downloads, 1 subscription

About: A desktop planetarium and sky map program which shows the sky from anywhere on Earth at any time.

Changes:

Removed erroneous topocentric code. Increased maximum zoom for detail on planets.


CPLVE 0.1

by wannesm - June 5, 2009, 13:06:42 CET [ BibTeX Download ] 1965 views, 603 downloads, 1 subscription

About: Preparing

Changes:

Initial Announcement on mloss.org.


Ohmm 0.02

by hillbig - May 21, 2009, 10:07:53 CET [ Project Homepage BibTeX Download ] 2682 views, 716 downloads, 1 subscription

About: Ohmm is a library for learning hidden Markov models using the online EM algorithm. The library is specialized for large-scale data, e.g. 1 million words. The output includes parameters and estimation results.

Changes:

Initial Announcement on mloss.org.


CRFsuite 0.8

by chokkan - March 18, 2009, 15:19:02 CET [ Project Homepage BibTeX Download ] 4173 views, 944 downloads, 1 subscription

Rating: 4.5 out of 5 stars
(based on 1 vote)

About: CRFsuite is a speed-oriented implementation of Conditional Random Fields (CRFs). This software features parameter estimation using SGD and L-BFGS, L1/L2 regularization, a simple data I/O format, etc.

Changes:

Initial Announcement on mloss.org.