- Description:
The Cognitive Foundry is a modular Java software library for the research and development of cognitive systems. It is designed primarily for research and development, and to be easy to plug into applications to provide adaptive behaviors.
The main part of the Foundry is the Machine Learning package, which contains reusable components and algorithms for machine learning and statistics. It contains many algorithms for supervised and unsupervised learning as well as statistical modeling. It is interface-centric and uses generics to make it easy to customize to the needs of individual applications.
The Cognitive Foundry's development is led by Sandia National Laboratories and is released under the open source BSD License. It requires Java 1.7.
- Changes to previous version:
-
General:
- Upgraded MTJ to 1.0.3.
-
Common:
- Added package for hash function computation, including Eva, FNV-1a, MD5, Murmur2, Prime, SHA1, and SHA2.
- Added callback-based forEach implementations to Vector and InfiniteVector, which can be faster for iterating through some vector types.
- Optimized DenseVector by removing a layer of indirection.
- Added method to compute set of percentiles in UnivariateStatisticsUtil and fixed issue with percentile interpolation.
- Added utility class for enumerating combinations.
- Adjusted ScalarMap implementation hierarchy.
- Added method for copying a map to VectorFactory and moved createVectorCapacity up from SparseVectorFactory.
- Added method for creating square identity matrix to MatrixFactory.
- Added Random implementation that uses a cached set of values.
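As an illustration of one of the hash functions named in the list above, here is a minimal standalone sketch of 32-bit FNV-1a. It does not use the Foundry's hash package, whose class names are not given here; the class name `Fnv1aSketch` and the method `hash32` are hypothetical. The offset basis and prime are the standard published FNV constants.

```java
import java.nio.charset.StandardCharsets;

/** Standalone sketch of 32-bit FNV-1a; not the Foundry's own implementation. */
final class Fnv1aSketch {
    // Standard 32-bit FNV parameters.
    private static final int OFFSET_BASIS = 0x811C9DC5;
    private static final int PRIME = 0x01000193;

    /** Hashes the bytes by XOR-ing each byte in, then multiplying by the prime. */
    static int hash32(byte[] data) {
        int hash = OFFSET_BASIS;
        for (byte b : data) {
            hash ^= (b & 0xFF); // treat the byte as unsigned
            hash *= PRIME;      // int overflow wraps, which is the intended mod 2^32
        }
        return hash;
    }

    /** Convenience overload that hashes the UTF-8 encoding of a string. */
    static int hash32(String text) {
        return hash32(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(hash32("a"))); // prints e40c292c
    }
}
```

The XOR-then-multiply order is what distinguishes FNV-1a from FNV-1, which multiplies first.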
-
Learning:
- Implemented feature hashing.
- Added factory for random forests.
- Implemented uniform distribution over integer values.
- Added Chi-squared similarity.
- Added KL divergence.
- Added general conditional probability distribution.
- Added interfaces for Regression, UnivariateRegression, and MultivariateRegression.
- Fixed null pointer exception that can happen in K-means with an empty cluster.
- Fixed name of maxClusters property on AgglomerativeClusterer (was called maxMinDistance).
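For readers unfamiliar with feature hashing, the idea can be sketched in a few lines of plain Java. This is an assumption-laden illustration, not the Foundry's implementation: the class `FeatureHashingSketch`, the method `hashFeatures`, the use of `String.hashCode()` for the bucket index, and the parity-based sign are all choices made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative feature hashing ("hashing trick"); not the Foundry's API. */
final class FeatureHashingSketch {

    /**
     * Maps named features into a fixed-size vector. Each feature name is
     * hashed to a bucket index, and a sign derived from the hash is applied
     * so that colliding features tend to cancel rather than accumulate.
     */
    static double[] hashFeatures(Map<String, Double> features, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        double[] hashed = new double[numBuckets];
        for (Map.Entry<String, Double> entry : features.entrySet()) {
            int hash = entry.getKey().hashCode();
            int index = (hash & 0x7FFFFFFF) % numBuckets;                 // non-negative bucket
            double sign = (Integer.bitCount(hash) & 1) == 0 ? 1.0 : -1.0; // parity of set bits
            hashed[index] += sign * entry.getValue();
        }
        return hashed;
    }

    public static void main(String[] args) {
        Map<String, Double> features = new LinkedHashMap<String, Double>();
        features.put("word=foundry", 1.0);
        features.put("word=learning", 2.0);
        double[] vector = hashFeatures(features, 16);
        System.out.println(java.util.Arrays.toString(vector));
    }
}
```

The output dimensionality is fixed by `numBuckets` regardless of how many distinct feature names appear, which is the main appeal of the technique for large or open-ended vocabularies.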
-
Text:
- Improvements to LDA Gibbs sampler.
-
General:
- Supported Operating Systems: Agnostic, Platform Independent
- Data Formats: Matlab, Csv, Xml, Xstream
- Tags: Classification, Clustering, Adaboost, Decision Tree Learning, Algorithms, Gaussian Mixture Models, Bagging, Ensemble Methods, Gaussian Processes, Affinity Propagation, Bfgs, Generics, Genetic Algorith
Other available revisions
-
Version Changelog Date

3.4.2 -
October 30, 2015, 06:53:03

3.4.1 -
General:
- Updated MTJ to version 1.0.2 and netlib-java to 1.1.2.
- Updated XStream to version 1.4.8.
-
Common:
- Fixed issue in VectorUnionIterator.
-
Learning:
- Added Alternating Least Squares (ALS) Factorization Machine training implementation.
- Fixed performance issue in Factorization Machine where linear component was not making use of sparsity.
- Added utility function to sigmoid unit.
May 13, 2015, 06:55:24

3.4.0 -
General:
- Now requires Java 1.7 or higher.
- Improved compatibility with Java 1.8 functions by removing ClonableSerializable requirement from many function-style interfaces.
-
Common Core:
- Improved iteration speed over sparse MTJ vectors.
- Added utility methods for more stable log(1+x), exp(1-x), log(1 - exp(x)), and log(1 + exp(x)) to LogMath.
- Added method for creating partial permutations to Permutation.
- Added methods for computing standard deviation to UnivariateStatisticsUtil.
- Added increment, decrement, and list view methods to Vector and Matrix.
- Added shorter versions of get and set for Vector and Matrix getElement and setElement.
- Added aliases of dot for dotProduct in VectorSpace.
- Added utility methods for divideByNorm2 to VectorUtil.
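To show why dedicated methods for expressions such as log(1 + exp(x)) and log(1 - exp(x)) are worthwhile, here is a small standalone sketch built on `Math.log1p` and `Math.expm1`. The class `LogMathSketch` and the method names `log1PlusExp` and `log1MinusExp` are hypothetical; the actual LogMath method names and implementations are not given in this changelog.

```java
/** Standalone sketch of numerically stable log-space helpers; not the Foundry's LogMath. */
final class LogMathSketch {

    /** Computes log(1 + exp(x)) without overflowing for large positive x. */
    static double log1PlusExp(double x) {
        if (x > 0.0) {
            // log(1 + e^x) = x + log(1 + e^-x); e^-x cannot overflow here.
            return x + Math.log1p(Math.exp(-x));
        }
        return Math.log1p(Math.exp(x));
    }

    /** Computes log(1 - exp(x)) for x < 0 without losing precision near 0. */
    static double log1MinusExp(double x) {
        if (x > -Math.log(2.0)) {
            // exp(x) is close to 1, so use expm1 to keep the small difference.
            return Math.log(-Math.expm1(x));
        }
        // exp(x) is small, so log1p keeps the precision.
        return Math.log1p(-Math.exp(x));
    }

    public static void main(String[] args) {
        System.out.println(Math.log(1.0 + Math.exp(1000.0))); // prints Infinity
        System.out.println(log1PlusExp(1000.0));              // prints 1000.0
    }
}
```

The naive formula overflows because exp(1000) is not representable as a double, whereas the rearranged form only ever exponentiates a non-positive number.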
-
Learning:
- Added a learner for a Factorization Machine using SGD.
- Added an iterative reporter for validation set performance.
- Added new methods to statistical distribution classes to allow for faster sampling without boxing, in batches, or without creating extra memory.
- Made generics for performance evaluators more permissive.
- ParameterGradientEvaluator changed to not require input, output, and gradient types to be the same. This allows more sane gradient definitions for scalar functions.
- Added parameter to enforce a minimum size in a leaf node for decision tree learning. It is configured through the splitting function.
- Added ability to filter which dimensions to use in the random subspace and variance tree node splitter.
- Added ReLU, leaky ReLU, and soft plus activation functions for neural networks.
- Added IntegerDistribution interface for distributions over natural numbers.
- Added a method to get the mean of a numeric distribution without boxing.
- Fixed an issue in DefaultDataDistribution that caused the total to be off when a value was set to less than or equal to 0.
- Added property for rate to GammaDistribution.
- Added method to get standard deviation from a UnivariateGaussian.
- Added clone operations for decision tree classes.
- Fixed issue in TukeyKramerConfidence interval computation.
- Fixed serialization issue with SMO output.
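For reference, the three activation functions mentioned in the list above have simple closed forms. The following is a plain-Java sketch under assumed names (`ActivationSketch`, `relu`, `leakyRelu`, `softPlus`, and the `leakage` parameter are hypothetical); it does not reflect the Foundry's class structure.

```java
/** Standalone sketch of ReLU-family activation functions; names are illustrative. */
final class ActivationSketch {

    /** Rectified linear unit: max(0, x). */
    static double relu(double x) {
        return Math.max(0.0, x);
    }

    /** Leaky ReLU: x for x >= 0, otherwise a small slope times x. */
    static double leakyRelu(double x, double leakage) {
        return x >= 0.0 ? x : leakage * x;
    }

    /** Soft plus: log(1 + exp(x)), a smooth approximation of ReLU. */
    static double softPlus(double x) {
        // max(x, 0) + log(1 + exp(-|x|)) equals log(1 + exp(x)) but avoids overflow.
        return Math.max(x, 0.0) + Math.log1p(Math.exp(-Math.abs(x)));
    }

    public static void main(String[] args) {
        System.out.println(relu(-3.0));            // prints 0.0
        System.out.println(leakyRelu(-2.0, 0.01)); // prints -0.02
        System.out.println(softPlus(0.0));         // log(2)
    }
}
```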
April 3, 2015, 08:28:14

3.3.3 -
General:
- Made code able to compile under both Java 1.6 and 1.7. This required removing some potentially unsafe methods that used varargs with generics.
- Upgraded XStream dependency to 1.4.4.
- Improved support for regression algorithms in learning.
- Added general-purpose adapters to make it easier to compose learning algorithms and adapt their input or output.
-
Common Core:
- Added isSparse, toArray, dotDivide, and dotDivideEquals methods for Vector and Matrix.
- Added scaledPlus, scaledPlusEquals, scaledMinus, and scaledMinusEquals to Ring (and thus Vector and Matrix) for potentially faster such operations.
- Fixed issue where matrix and dense vector equals was not checking for equal dimensionality.
- Added transform, transformEquals, transformNonZeros, and transformNonZerosEquals to Vector.
- Made LogNumber into a signed version of a log number and moved the prior unsigned implementation into UnsignedLogNumber.
- Added EuclideanRing interface that provides methods for times, timesEquals, divide, and divideEquals. Also added Field interface that provides methods for inverse and inverseEquals. These interfaces are now implemented by the appropriate number classes such as ComplexNumber, MutableInteger, MutableLong, MutableDouble, LogNumber, and UnsignedLogNumber.
- Added interface for Indexer and DefaultIndexer implementation for creating a zero-based indexing of values.
- Added interfaces for MatrixFactoryContainer and DivergenceFunctionContainer.
- Added ReversibleEvaluator, which various identity functions implement as well as a new utility class ForwardReverseEvaluatorPair to create a reversible evaluator from a pair of other evaluators.
- Added method to create an ArrayList from a pair of values in CollectionUtil.
- ArgumentChecker now properly throws assertion errors for NaN values. Also added checks for long types.
- Fixed handling of Infinity in subtraction for LogMath.
- Fixed issue with angle method that would cause a NaN if cosine had a rounding error.
- Added new createMatrix methods to MatrixFactory that initializes the Matrix with the given value.
- Added copy, reverse, and isEmpty methods for several array types to ArrayUtil.
- Added utility methods for creating a HashMap, LinkedHashMap, HashSet, or LinkedHashSet with an expected size to CollectionUtil.
- Added getFirst and getLast methods for List types to CollectionUtil.
- Removed some calls to System.out and Exception.printStackTrace.
-
Common Data:
- Added create method for IdentityDataConverter.
- ReversibleDataConverter now is an extension of ReversibleEvaluator.
-
Learning Core:
- Added general learner transformation capability to make it easier to adapt and compose algorithms. InputOutputTransformedBatchLearner provides this capability for supervised learning algorithms by composing together a triplet. CompositeBatchLearnerPair does it for a pair of algorithms.
- Added constant and identity learners.
- Added Chebyshev, Identity, and Minkowski distance metrics.
- Added methods to DatasetUtil to get the output values for a dataset and to compute the sum of weights.
- Made generics more permissive for supervised cost functions.
- Added ClusterDistanceEvaluator, which takes a clustering, computes the distance from an input value to all clusters, and returns the result as a vector.
- Fixed potential round-off issue in decision tree splitter.
- Added random subspace technique, implemented in RandomSubspace.
- Separated functionality from LinearFunction into IdentityScalarFunction. LinearFunction by default is the same, but has parameters that can change the slope and offset of the function.
- Default squashing function for GeneralizedLinearModel and DifferentiableGeneralizedLinearModel is now a linear function instead of an atan function.
- Added a weighted estimator for the Poisson distribution.
- Added Regressor interface for evaluators that are the output of (single-output) regression learning algorithms. Existing such evaluators have been updated to implement this interface.
- Added support for regression ensembles including additive and averaging ensembles with and without weights. Added a learner for regression bagging in BaggingRegressionLearner.
- Added a simple univariate regression class in UnivariateLinearRegression.
- MultivariateDecorrelator now is a VectorInputEvaluator and VectorOutputEvaluator.
- Added bias term to PrimalEstimatedSubGradient.
-
Text Core:
- Fixed issue with the start position for tokens from LetterNumberTokenizer being off by one except for the first one.
May 21, 2013, 05:59:37

3.3.2 -
Common Core:
- Added checkedAdd and checkedMultiply functions to MathUtil, providing a means for conducting Integer addition and multiplication with explicit checking for overflow and underflow, and throwing an ArithmeticException if they occur. Java fails silently in integer over(under)flow situations.
- Added explicit integer overflow checks to DenseMatrix. The underlying MTJ library stores dense matrices as single-dimensional arrays indexed by integers, which in Java are 32-bit. When creating a matrix with numRows rows and numColumns columns, if numRows * numColumns is more than 2^31 - 1, a silent integer overflow would occur, resulting in later ArrayIndexOutOfBoundsExceptions when attempting to access matrix elements that didn't get allocated.
- Added new methods to DiagonalMatrix interface for multiplying diagonal matrices together and for inverting a DiagonalMatrix.
- Optimized operations on diagonal matrices in DiagonalMatrixMTJ.
- Added checks to norm method in AbstractVectorSpace and DefaultInfiniteVector for power set to NaN, throwing an ArithmeticException if encountered.
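The overflow problem described above is easy to reproduce in plain Java. The sketch below shows one way such checks could be written; the method names `checkedAdd` and `checkedMultiply` come from the changelog, but the signatures, the enclosing class `CheckedMathSketch`, and the widening-to-long approach are assumptions made for this example rather than the MathUtil implementation.

```java
/** Standalone sketch of overflow-checked int arithmetic; not the Foundry's MathUtil. */
final class CheckedMathSketch {

    /** Adds two ints, throwing instead of wrapping around on overflow. */
    static int checkedAdd(int a, int b) {
        long result = (long) a + (long) b; // cannot overflow in 64 bits
        return narrow(result, "addition");
    }

    /** Multiplies two ints, throwing instead of wrapping around on overflow. */
    static int checkedMultiply(int a, int b) {
        long result = (long) a * (long) b; // cannot overflow in 64 bits
        return narrow(result, "multiplication");
    }

    private static int narrow(long value, String operation) {
        if (value > Integer.MAX_VALUE || value < Integer.MIN_VALUE) {
            throw new ArithmeticException("Integer overflow in " + operation + ": " + value);
        }
        return (int) value;
    }

    public static void main(String[] args) {
        int numRows = 50000;
        int numColumns = 50000;
        // Plain int multiplication wraps silently to a negative value.
        System.out.println(numRows * numColumns); // prints -1794967296
        try {
            checkedMultiply(numRows, numColumns);
        } catch (ArithmeticException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

On Java 8 and later, `Math.addExact` and `Math.multiplyExact` provide the same behavior in the standard library.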
-
Learning Core:
- Optimized matrix multiplies in LogisticRegression to avoid creating dense matrices unnecessarily and to reduce computation time using improved DiagonalMatrix interfaces.
- Added regularization and explicit bias estimation to MultivariateLinearRegression.
- Added ConvexReceiverOperatingCharacteristic, which computes the convex hull of the ROC.
- Fixed rare corner-case bug in ReceiverOperatingCharacteristic and added optional trapezoidal AUC computation.
- Cleaned up constant in MultivariateCumulativeDistributionFunction and added publication references.
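As a reminder of what a trapezoidal AUC computation involves, the sketch below integrates an ROC curve given as points sorted by false positive rate. It is a generic illustration; the class `TrapezoidalAucSketch`, the method `trapezoidalAuc`, and the array-based input are assumptions and do not reflect the ReceiverOperatingCharacteristic API.

```java
/** Standalone sketch of trapezoidal area under an ROC curve; names are illustrative. */
final class TrapezoidalAucSketch {

    /**
     * Integrates the ROC curve with the trapezoid rule.
     *
     * @param falsePositiveRates x-coordinates, sorted in non-decreasing order
     * @param truePositiveRates  y-coordinates, aligned with the x-coordinates
     */
    static double trapezoidalAuc(double[] falsePositiveRates, double[] truePositiveRates) {
        if (falsePositiveRates.length != truePositiveRates.length) {
            throw new IllegalArgumentException("Arrays must have the same length");
        }
        double area = 0.0;
        for (int i = 1; i < falsePositiveRates.length; i++) {
            double width = falsePositiveRates[i] - falsePositiveRates[i - 1];
            double meanHeight = (truePositiveRates[i] + truePositiveRates[i - 1]) / 2.0;
            area += width * meanHeight;
        }
        return area;
    }

    public static void main(String[] args) {
        double[] fpr = {0.0, 0.25, 1.0};
        double[] tpr = {0.0, 0.5, 1.0};
        System.out.println(trapezoidalAuc(fpr, tpr)); // prints 0.625
    }
}
```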
November 8, 2011, 05:14:19

3.3.1 -
Common Core:
- Added NumericMap interface, which provides a mapping of keys to numeric values.
- Added ScalarMap interface, which extends NumericMap to provide a mapping of objects to scalar values represented as doubles.
- Added AbstractScalarMap and AbstractMutableDoubleMap to provide abstract, partial implementations of the ScalarMap interface.
- Added VectorSpace interface, where a VectorSpace is a type of Ring that you can perform Vector-like operations on such as norm, distances, etc.
- Added AbstractVectorSpace, which provides an abstract, partial implementation of the VectorSpace interface.
- Updated Vector, AbstractVector, VectorEntry to build on new VectorSpace interface and AbstractVectorSpace class.
- Added InfiniteVector interface, which has a potentially infinite number of indices, but contains only a countable number in any given instance.
- Added DefaultInfiniteVector, an implementation of the InfiniteVector interface backed by a LinkedHashMap.
- Rewrote FiniteCapacityBuffer from the ground up, now with backing from a fixed-size array to minimize memory allocation.
- Renamed IntegerCollection to IntegerSpan.
-
Learning Core:
- Updated ReceiverOperatingCharacteristic to improve calculation.
- Added PriorWeightedNodeLearner interface, which provides for configuring the prior weights on the learning algorithm that searches for a decision function inside a decision tree.
- Updated AbstractDecisionTreeNode to fix off by one error in computing node's depth.
- Updated CategorizationTreeLearner to add ability to specify class priors for decision tree algorithm.
- Updated VectorThresholdInformationGainLearner to add class priors to information gain calculation.
- Updated SequentialMinimalOptimization to improve speed.
- Added LinearBasisRegression, which uses a basis function to generate vectors before performing a LinearRegression.
- Added MultivariateLinearRegression, which performs multivariate regression; does not explicitly estimate a bias term or perform regularization.
- Added LinearDiscriminantWithBias, which provides a LinearDiscriminant with an additional bias term that gets added to the output of the dot product.
- Updated LinearRegression and LogisticRegression to provide for bias term estimation and use of L2 regularization.
- Renamed SquashedMatrixMultiplyVectorFunction to GeneralizedLinearModel.
- Renamed DifferentiableSquashedMatrixMultiplyVectorFunction to DifferentiableGeneralizedLinearModel.
- Renamed MatrixMultiplyVectorFunction to MultivariateDiscriminant.
- Added MultivariateDiscriminantWithBias, which provides a multivariate discriminant with a bias term.
- Renamed DataHistogram to DataDistribution.
- Renamed AbstractDataHistogram to AbstractDataDistribution.
- Added DefaultDataDistribution, a default implementation of the DataDistribution interface that uses a backing map.
- Added LogisticDistribution, an implementation of the scalar logistic distribution.
- Updated MultivariateGaussian to provide for incremental estimation of covariance-matrix inverse without a single matrix inversion.
- Removed DecoupledVectorFunction.
- Removed DecoupledVectorLinearRegression.
- Removed PointMassDistribution.
- Removed MapBasedDataHistogram.
- Removed MapBasedPointDistribution.
- Removed MapBasedSortedDataHistogram.
- Removed AbstractBayseianRegression.
- Additional general reworking and clean up of distribution code, impacting classes in gov.sandia.cognition.statistics.distribution package.
-
Text Core:
- Renamed LatentDirichetAllocationVectorGibbsSampler to LatentDirichletAllocationVectorGibbsSampler to fix misspelling.
- Added ParallelLatentDirichletAllocationVectorGibbsSampler, a parallelized version of Latent Dirichlet Allocation.
October 7, 2011, 07:18:27

3.3.0 -
Common Core:
- Added LogMath and LogNumber as utilities for computation involving numbers represented in log-space.
- Added MutableInteger and MutableLong, which are like Integer and Long but with a mutable value, similar to MutableDouble.
- UnivariateSummaryStatistics added.
- Added DefaultIdentifiedValue.
-
Learning Core:
- VectorNaiveBayesCategorizer: evaluateWithDiscriminant now properly normalizes the discriminant, which improves results for things like AUC.
- PartitionalClusterer: Fixed corner-case bug.
- Added confidence-weighted online learners based on variance, standard deviation, and AROW. They produce ConfidenceWeightedBinaryCategorizer objects with either diagonal or full covariance matrices.
- Added balanced versions of bagging and IVoting in CategoryBalancedBaggingLearner and CategoryBalancedIVotingLearner.
- Added online learners based on ROMMA, AROMMA, Ballseptron, Ramp-loss Passive Aggressive Perceptron, Shifting Perceptron, Forgetron, Projectron, Randomized Budget Perceptron, and Stoptron.
- Added KernelizableBinaryCategorizerOnlineLearner interface for an online linear binary categorizer that can also be used with a kernel. Also provided abstract class AbstractKernelizableBinaryCategorizerOnlineLearner and AbstractLinearCombinationOnlineLearner for common functionality.
- Moved KernelPerceptron and KernelAdatron to the new learning.perceptron.kernel package.
- Added KernelUtil utility class for dealing with kernels and kernel binary categorizers.
- Refactored statistics to remove getMean() from Distribution due to some distributions not having a meaningful mean and confusion for what the mean meant in some classes.
- Classes and interfaces named Scalar renamed to Univariate for clarity.
- Added getTestStatistic() method to ConfidenceStatistic interface and implemented in existing classes.
- Added support for multiple comparisons tests, including Bonferroni, Holm, Nemenyi, Shaffer, Sidak, and Tukey-Kramer and updated AnalysisOfVarianceOneWay (ANOVA) and FriedmanConfidence.
- Added multiple-comparisons experiment classes BlockExperimentComparison and MultipleComparisonExperiment.
-
Text Core:
- Renamed ProbabilisticLatentSemanticAnalysis.Transform to ProbabilisticLatentSemanticAnalysis.Result for consistency.
June 22, 2011, 19:52:06

3.2.0 -
- Upgraded to mtj-0.9.14 and added the netlib-java-0.9.3 library, which MTJ now depends on.
- Common Core:
- ParallelUtil: Added executeInParallel(tasks, algorithm).
- CollectionUtil: Added removeElement.
- Added MutableDouble class, which is like Double but with a mutable value. It also implements Ring and Vectorizable.
- UnivariateStatisticsUtil: Improved stability of computeMean with large values.
- VectorReader: Changed to use a Collection of tokens instead of requiring an ArrayList.
- AbstractSingularValueDecomposition: Fixed a bug for certain types of rectangular matrices.
- DiagonalMatrixMTJ: Made inverse faster.
- ArgumentChecker: Added assertIsInRangeExclusive and fixed a formatting issue with some exception messages.
- Added Identified interface.
- Learning Core:
- Added BatchAndIncrementalLearner interface, which is for algorithms that can be used in batch and incremental modes. Note that it defines that the BatchLearner interface uses a Collection of data for consistency with existing batch (non-incremental) algorithms. However, it does also include another learn method that takes an Iterable. AbstractBatchAndIncrementalLearner now implements it.
- Added SupervisedIncrementalLearner interface, the supervised version of BatchAndIncrementalLearner, SupervisedBatchAndIncrementalLearner, and abstract class AbstractSupervisedBatchAndIncrementalLearner. Many incremental learning algorithms now implement these interfaces and extend this abstract class, including OnlineBaggingCategorizerLearner, AbstractOnlineLinearBinaryCategorizerLearner, OnlinePassiveAggressivePerceptron, OnlinePerceptron, OnlineVotedPerceptron, and Winnow.
- VectorNaiveBayesCategorizer: Now has generic for the distribution type representing each feature.
- BaggingCategorizerLearner: Added a protected fillBag method, which can be overridden to implement a different sampling approach.
- Added BatchMultiPerceptron, which is an implementation of a multi-class Perceptron that keeps one Perceptron per class.
- Added MultiCategoryAdaBoost, which is an implementation of the AdaBoost.M1 algorithm.
- DecisionTree: Added findTerminalNode methods.
- CrossFoldCreator: Added constructor that takes just a number of folds.
- KernelBinaryCategorizer is now an interface with a default implementation in DefaultKernelBinaryCategorizer. KernelPerceptron, KernelAdatron, SequentialMinimalOptimization, and SuccessiveOverrelaxation now all use the new interface or default class.
- Added LinearMultiCategorizer class, which keeps a LinearBinaryCategorizer for each class and
- Removed GeneralizedScalarRadialBasisKernel class.
- GaussianContextRecognizer: Now requires MixtureOfGaussians.PDF.
- DefaultConfusionMatrix: Added copy constructor.
- DiscreteDistribution: Added getDomainSize and implemented in subclasses.
- DiscreteSamplingUtil: Added sampleWithReplacementInto method.
- ProbabilityMassFunctionUtil: Added sampleMultiple and sampleSingle.
- ScalarProbabilityDensityFunction: Added logEvaluate method and implemented in subclasses.
- BayesianLinearRegression: Added incremental learner.
- BayesianRobustLinearRegression: Added incremental learner.
- Refactored LinearMixtureModel, MixtureOfGaussians, and ScalarMixtureDensityModel to be more consistent with other Foundry statistics classes.
- MultivariateGaussian: Added incremental estimator.
- Added MultivariateMixtureDensityModel class.
- UnivariateGaussian: Added incremental estimator.
- Added FriedmanConfidence, NemenyiConfidence, and TukeyRangeConfidence, which implement methods for computing confidence intervals.
- Text Core:
- LatentSemanticAnalysis: Added better handling of low-rank matrices.
May 20, 2011, 00:46:20

3.1.1 -
- All projects now depend on JUnit 4.8.2 instead of 3.8.2 or 4.6.
- Upgraded XStream to 1.3.1 (and its dependency xpp3_min to 1.1.4c).
- Common Core:
- KDTree: Fixed two bugs that caused the KDTree to not always return the nearest points in rare cases.
- Learning Core:
- PartitionalClusterer: Added an implementation of a partitional clustering algorithm.
- IncrementalClusterCreator: New interface for a cluster creator that can incrementally update clusters. DefaultIncrementalClusterCreator is a default implementation of the interface that just updates the cluster memberships.
- VectorMeanCentroidClusterCreator: Modified to implement the new IncrementalClusterCreator interface.
- BinaryBaggingLearner: Generalized generics in constructor.
- OnlineBaggingCategorizerLearner: Added an implementation of an online version of the bagging algorithm for building ensembles.
- VotingCategorizerEnsemble: Added an unweighted voting categorization ensemble as a counterpart to WeightedVotingCategorizerEnsemble.
- OnlinePassiveAggressivePerceptron: Added an implementation of the Passive-Aggressive algorithm for binary classification. Also contains PA-I (LinearSoftMargin) and PA-II (QuadraticSoftMargin) variants.
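The three Passive-Aggressive variants named above differ only in how the step size is computed from the hinge loss. The following standalone sketch follows the standard formulation of the algorithm for binary labels in {-1, +1}; the class `PassiveAggressiveSketch`, the `Variant` enum, and the `aggressiveness` parameter name are hypothetical and are not taken from OnlinePassiveAggressivePerceptron.

```java
/** Standalone sketch of the Passive-Aggressive update rule; not the Foundry's class. */
final class PassiveAggressiveSketch {

    /** PA is the hard-margin rule; PA_I and PA_II are the soft-margin variants. */
    enum Variant { PA, PA_I, PA_II }

    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /**
     * Updates the weights in place for one example.
     *
     * @param label          +1 or -1
     * @param aggressiveness the trade-off parameter C (ignored by plain PA)
     */
    static void update(double[] weights, double[] input, int label,
                       Variant variant, double aggressiveness) {
        double loss = Math.max(0.0, 1.0 - label * dot(weights, input)); // hinge loss
        double normSquared = dot(input, input);
        if (loss == 0.0 || normSquared == 0.0) {
            return; // passive: the margin is already satisfied (or the input is zero)
        }
        double tau;
        switch (variant) {
            case PA_I:  // linear soft margin: cap the step at C
                tau = Math.min(aggressiveness, loss / normSquared);
                break;
            case PA_II: // quadratic soft margin: damp the step
                tau = loss / (normSquared + 1.0 / (2.0 * aggressiveness));
                break;
            default:    // hard margin: move just far enough to reach margin 1
                tau = loss / normSquared;
                break;
        }
        for (int i = 0; i < weights.length; i++) {
            weights[i] += tau * label * input[i];
        }
    }

    public static void main(String[] args) {
        double[] weights = {0.0, 0.0};
        update(weights, new double[] {1.0, 0.0}, +1, Variant.PA, 1.0);
        System.out.println(java.util.Arrays.toString(weights)); // prints [1.0, 0.0]
    }
}
```

Plain PA always corrects the current example fully, which makes it sensitive to label noise; the two soft-margin variants trade some of that correction for robustness.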
March 24, 2011, 00:39:46

3.1.0 -
- Text: New package in this release for doing statistical text analysis and information retrieval. It is especially useful for doing machine learning over text data.
- Common Core:
- CollectionUtil: Added toStringDelimited method.
- VectorFactoryContainer is now an interface, existing implementation moved to DefaultVectorFactoryContainer.
- ObjectUtil: Added getBytes method.
- Learning Core:
- OnlineLearner interface renamed to IncrementalLearner, which is a more appropriate name. Other similar classes also renamed: AbstractBatchAndIncrementalLearner.
- Added BatchLearnerContainer interface and renamed AbstractBatchLearnerWrapper to AbstractBatchLearnerContainer.
- Added DiscriminantCategorizer and BinaryDiscriminantCategorizer interfaces. Almost all Categorizers now implement one of these interfaces; some existing methods were renamed to match these standard interfaces.
- Added DistanceSamplingClusterInitializer, which is an implementation of the k-means++ initialization algorithm.
- HiddenMarkovModel and BaumWelchAlgorithm had some performance improvements.
- Added KernelPrincipalComponentsAnalysis, which is an implementation of kernel PCA using eigen-decomposition.
- DatasetUtil: Added countOutputValues.
- TargetEstimatePair and WeightedTargetEstimatePair are now interfaces; existing implementations moved to DefaultTargetEstimatePair and DefaultWeightedTargetEstimatePair.
- ConstantEvaluator: Now extends AbstractCloneableSerializable.
- ThresholdBinaryCategorizer is now an interface. Existing implementation moved to ScalarThresholdBinaryCategorizer.
- AbstractThresholdBinaryCategorizer no longer contains the threshold directly; instead, it assumes the threshold is applied in evaluateAsDouble and categorizes based on zero. This is to remove the confusion of having both a bias and a threshold in some classes and to clarify the meaning of evaluateAsDouble so that it matches the new BinaryDiscriminantCategorizer interface.
- DataHistogram: Added isEmpty method and implemented in AbstractDataHistogram.
- Made new ConfusionMatrix and BinaryConfusionMatrix interfaces and put them in the gov.sandia.cognition.learning.performance.categorization package. Refactored existing confusion matrix code to implement these interfaces; the existing class ended up in DefaultBinaryConfusionMatrix. The new confusion matrix interface uses generics to be useful for more than just binary categorization.
- Now uses JUnit 4.6 for tests.
February 24, 2011, 02:41:06

3.0.3 - Initial Announcement on mloss.org.
December 23, 2010, 06:33:30