- Description:
Sequence analysis is one of the major subjects of bioinformatics. Several existing libraries combine the representation of biological sequences with exact and approximate pattern matching as well as alignment algorithms. We present Jstacs, an open source Java library, which focuses on the statistical analysis of biological sequences instead. Jstacs comprises an efficient representation of sequence data and provides implementations of many statistical models with generative and discriminative approaches for parameter learning. Using Jstacs, classifiers can be assessed and compared on test datasets or by cross-validation experiments evaluating several performance measures. Due to its strictly object-oriented design, Jstacs is easy to use and readily extensible.
- Changes to previous version:
February 2, 2012: Jstacs 2.0 released
Jstacs 2.0 changes many names and the structure of several packages. It is not code-compatible with Jstacs 1.5 and earlier.
RESTRUCTURING and RENAMING:
former ScoringFunction, NormalizableScoringFunction, Model
- new base-interface SequenceScore
- new sub-interface StatisticalModel of SequenceScore for all statistical models with sub-interfaces DifferentiableStatisticalModel and TrainableStatisticalModel
- new interface DifferentiableSequenceScore replaces ScoringFunction
- new interface DifferentiableStatisticalModel replaces NormalizableScoringFunction
- new interface TrainableStatisticalModel replaces Model
- new abstract class AbstractDifferentiableSequenceScore
- new abstract class AbstractDifferentiableStatisticalModel replaces AbstractNormalizableScoringFunction
- new abstract class AbstractTrainableStatisticalModel replaces AbstractModel
- former Models renamed to TrainSM
- former ScoringFunction renamed to DiffSS or DiffSM
- getProbFor removed from TrainableStatisticalModel (former Model) and conceptually replaced by getLogProbFor
- getLogScore(Sequence,int,int) with changed meaning of arguments: getLogScore(Sequence,start,end) instead of getLogScore(Sequence,start,length)
- isTrained() replaced by common method isInitialized()
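Two of the changes above affect calling code directly: probabilities are now reported in log-space, and the third argument of getLogScore is an end position rather than a length. The following is a minimal sketch of both points and does not use the Jstacs API. The class LogScoreSketch, its toy symbol probabilities, and the helper endFromLength are hypothetical, and the end index is assumed to be inclusive, which should be checked against the Jstacs Javadoc.

```java
final class LogScoreSketch {

    // Toy i.i.d. model over DNA symbols (illustrative values, not from Jstacs).
    static double symbolProb(char c) {
        switch (c) {
            case 'A': case 'T': return 0.3;
            case 'C': case 'G': return 0.2;
            default: throw new IllegalArgumentException("not a DNA symbol: " + c);
        }
    }

    // Log-probability of seq[start..end]; end is assumed to be inclusive.
    static double logProb(String seq, int start, int end) {
        double sum = 0.0;
        for (int i = start; i <= end; i++) {
            sum += Math.log(symbolProb(seq.charAt(i)));
        }
        return sum;
    }

    // Plain probability of the same region; underflows to 0.0 for long sequences.
    static double prob(String seq, int start, int end) {
        double p = 1.0;
        for (int i = start; i <= end; i++) {
            p *= symbolProb(seq.charAt(i));
        }
        return p;
    }

    // Converts an old-style (start, length) pair to an inclusive end position.
    static int endFromLength(int start, int length) {
        return start + length - 1;
    }

    // Builds a long test sequence by repeating a unit.
    static String repeat(String unit, int times) {
        StringBuilder sb = new StringBuilder(unit.length() * times);
        for (int i = 0; i < times; i++) {
            sb.append(unit);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String seq = repeat("ACGT", 300); // 1200 symbols
        int end = endFromLength(0, seq.length());
        System.out.println("prob    = " + prob(seq, 0, end));    // underflows to 0.0
        System.out.println("logProb = " + logProb(seq, 0, end)); // stays finite
    }
}
```

Code that previously passed a length would, under the inclusive-end assumption, pass start + length - 1 instead.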
Parameters and Results
- new super-class of Parameters and Results: AnnotatedEntity
- common list-type for Parameters and Results: AnnotatedEntityList
- Renaming: CollectionParameter -> SelectionParameter, MultiSelectionCollectionParameter -> MultiSelectionParameter, new super-class AbstractSelectionParameter
- major refactoring due to common hierarchy and code-cleanup
- lazy evaluation of Parameter/ParameterSet hierarchies moved from ParameterSet (loadParameters()) to ParameterSetContainer (constructor on class)
- SubclassFinder adapted to lazy evaluation
performance measures
- new abstract super-class AbstractPerformanceMeasure of all performance measures
- new interface NumericalPerformanceMeasure for all performance measures that return a single number (as opposed, e.g., to curves)
- new class PerformanceMeasureParameterSet for a collection of general performance measures
- new class NumericalPerformanceMeasureParameterSet for a collection of NumericalPerformanceMeasures
- used in evaluate-method of AbstractClassifier and in ClassifierAssessments
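The distinction between measures that return a single number and measures that return a curve can be illustrated with the ROC curve and the area under it. The sketch below is plain Java and not the Jstacs implementation. The class RocSketch and its method names are hypothetical, scores greater than or equal to a threshold are assumed to be predicted positive, and both score arrays are assumed to be non-empty.

```java
import java.util.TreeSet;

final class RocSketch {

    // Curve-valued measure: ROC points {false positive rate, true positive rate},
    // running from (0,0) to (1,1) as the threshold is lowered.
    static double[][] rocCurve(double[] posScores, double[] negScores) {
        TreeSet<Double> thresholds = new TreeSet<>();
        for (double s : posScores) thresholds.add(s);
        for (double s : negScores) thresholds.add(s);

        double[][] points = new double[thresholds.size() + 1][];
        points[0] = new double[] {0.0, 0.0};
        int i = 1;
        for (double t : thresholds.descendingSet()) {
            points[i++] = new double[] {
                fractionAtLeast(negScores, t), // false positive rate
                fractionAtLeast(posScores, t)  // true positive rate
            };
        }
        return points;
    }

    // Fraction of scores that are greater than or equal to the threshold t.
    static double fractionAtLeast(double[] scores, double t) {
        int count = 0;
        for (double s : scores) {
            if (s >= t) count++;
        }
        return (double) count / scores.length;
    }

    // Single-number measure: area under the curve by the trapezoidal rule.
    static double areaUnder(double[][] curve) {
        double area = 0.0;
        for (int i = 1; i < curve.length; i++) {
            double width = curve[i][0] - curve[i - 1][0];
            area += width * (curve[i][1] + curve[i - 1][1]) / 2.0;
        }
        return area;
    }

    public static void main(String[] args) {
        double[] pos = {0.9, 0.8, 0.4};
        double[] neg = {0.7, 0.3, 0.1};
        double[][] curve = rocCurve(pos, neg);
        System.out.println("points on curve: " + curve.length);
        System.out.println("area under curve: " + areaUnder(curve));
    }
}
```

A numerical measure can be averaged across cross-validation folds directly, whereas a curve has to be stored or summarized first, which is one plausible reason for separating the two kinds of measures.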
further changes
- Sample renamed to DataSet
- evaluate and evaluateAll in AbstractClassifier joined
- new class IndependentProductDiffSS as super-class of IndependentProductDiffSM (former IndependentProductScoringFunction)
- new class UniformDiffSS as super-class of UniformDiffSM (former UniformScoringFunction)
NEW FUNCTIONALITY:
- multi-threaded implementation of Baum-Welch and Viterbi training of hidden Markov models
- new interface Singleton that can be used for singleton instances to save memory, current examples: DNAAlphabet, DNAAlphabetContainer, ProteinAlphabet
- added ProteinAlphabet
- added possibility to use NaN-values with ContinuousAlphabets
- added ArbitraryFloatSequence including static methods for DataSet creation for cases where double-precision is not needed
- new performance measure MaximumFMeasure
- access to Parameters in ParameterSets and Results in ResultSets by name
- emitDataSet in BayesianNetworkDiffSM
- new static method Time.getTimeInstance that returns UserTime or RealTime depending on availability of shared lib
- SubclassFinder allows for adding own base packages
- new method overlaps() in LocatedSequenceAnnotationWithLength
- AbstractTerminationCondition used in ScoreClassifier and sub-classes
- public method propagateESS in HMMFactory
- new method generateLog in DirichletMRG for drawing log-values
- added DifferentiableStatisticalModelFactory
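As an illustration of the new performance measure named in the list above, a maximum F-measure can be computed by trying every useful score threshold and keeping the best F1 value. This is a sketch under one common definition (F1 with equal weight on precision and recall, scores greater than or equal to the threshold predicted positive). It is not the Jstacs MaximumFMeasure class, whose exact definition and parameters may differ, and the class MaxFMeasureSketch is hypothetical.

```java
final class MaxFMeasureSketch {

    // Maximum F1 over all thresholds t, where a score >= t is predicted positive.
    // Only scores of positive examples are tried as thresholds: lowering t to a
    // negative example's score adds false positives without adding true positives.
    static double maxF1(double[] posScores, double[] negScores) {
        double best = 0.0;
        for (double t : posScores) {
            int tp = countAtLeast(posScores, t);
            int fp = countAtLeast(negScores, t);
            int fn = posScores.length - tp;
            // F1 = 2*TP / (2*TP + FP + FN); tp >= 1 here, so the denominator is positive.
            double f1 = 2.0 * tp / (2.0 * tp + fp + fn);
            best = Math.max(best, f1);
        }
        return best;
    }

    // Number of scores that are greater than or equal to the threshold t.
    static int countAtLeast(double[] scores, double t) {
        int count = 0;
        for (double s : scores) {
            if (s >= t) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        double[] pos = {0.9, 0.8, 0.4};
        double[] neg = {0.7, 0.3, 0.1};
        // Best threshold is 0.4: TP = 3, FP = 1, FN = 0, so F1 = 6/7.
        System.out.println(maxF1(pos, neg));
    }
}
```

The quadratic running time keeps the sketch short; sorting the scores once would reduce it to O(n log n) for larger data sets.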
BUGFIXES/IMPROVEMENTS:
- bugfix in propagation of equivalent sample size in HMMFactory
- bugfix in random initialization of BasicHigherOrderTransition
- improved Alignment implementation
- SafeOutputStream with new static factory method getSafeOutputStream, write methods now work on Objects
DOCUMENTATION:
- improved Javadocs in many classes and packages
- new Cookbook with extensive documentation and explanation
MISC:
- output of NonParsableException more verbose
- Exceptions in multi-threaded code now lead to exit of program instead of only stopping the thread
- update of RServe/RClient
- URL: Project Homepage
- JMLR MLOSS Paper URL: JMLR-MLOSS Paper Homepage
- Supported Operating Systems: Cygwin, Linux, Mac OS X, Windows, Unix, Agnostic, Solaris, FreeBSD, Platform Independent
- Data Formats: Plain ASCII, FASTA
- Tags: Bioinformatics, R, Classification, Machine Learning, Bayesian Networks, Markov Random Fields, Supervised Learning, EM, Mixture Models, Java, Learning Principles, Probabilistic Models, Motif Discovery
Other available revisions
Version 2.0 (February 2, 2012, 17:14:02)
February 2, 2012: Jstacs 2.0 released
Changelog as listed under "Changes to previous version" above.
Version 1.5 (June 6, 2011, 10:48:08)
June 1st, 2011: Jstacs 1.5 released
- new package de.jstacs.algorithms.alignment for sequence alignment algorithms
- new class de.jstacs.models.ModelFactory with static classes to construct many standard models
- new class de.jstacs.utils.galaxy.GalaxyAdaptor, an adaptor to Galaxy that allows for creating Galaxy applications using Jstacs ParameterSets; it also requires the new interface GalaxyConvertible
- new package de.jstacs.models.hmm for a variety of hidden Markov models, which can be learned by different learning principles, including generative and discriminative learning principles, maximization, and sampling methods
- new package de.jstacs.sampling that contains general infrastructure for parameter sampling
- new class de.jstacs.scoringFunctions.MappingScoringFunction that allows for internal mapping of symbols from the alphabet
- new package de.jstacs.classifier.scoringFunctionBases.sampling containing classifiers that sample their parameters by the Metropolis-Hastings algorithm
- new interface de.jstacs.scoringFunctions.SamplingScoringFunction for NormalizableScoringFunctions that can be used in Metropolis-Hastings sampling of parameters
- bugfix in XMLParser for cases where the tag of interest also occurs within other, nested tags
Version 1.4 (January 2, 2011, 11:07:10)
December 31, 2010: Jstacs 1.4 released
- added DinucleotideProperty for computing properties like melting temperature, twist, or G/C content
- support for multidimensional sequence data
- more widespread use of TerminationConditions
- completely rewritten XMLParser
- extension of motif discovery to weighted data
- OneSampleLogGenDisMixFunction for using the same Sample with different weights for the different classes
- Jstacs requires Java 1.6 now
Version 1.3.1 (March 2, 2010, 14:15:46)
March 2, 2010: Jstacs 1.3.1 released
- Partitioning of Samples including weights
- Release of Dispom (de-novo discovery of differentially abundant transcription factor binding sites including their positional preference)
- Several bugfixes
Version 1.3 (December 4, 2009, 11:27:55)
Initial Announcement on mloss.org.