- Description:
Sequence analysis is one of the major subjects of bioinformatics. Several existing libraries combine the representation of biological sequences with exact and approximate pattern matching as well as alignment algorithms. We present Jstacs, an open-source Java library that focuses instead on the statistical analysis of biological sequences. Jstacs comprises an efficient representation of sequence data and provides implementations of many statistical models with generative and discriminative approaches for parameter learning. Using Jstacs, classifiers can be assessed and compared on test datasets or by cross-validation experiments evaluating several performance measures. Due to its strictly object-oriented design, Jstacs is easy to use and readily extensible.
- Changes to previous version:
New classes and packages:
- Jstacs 2.3 is the first release to be accompanied by JstacsFX, a library for building JavaFX-based graphical user interfaces based on JstacsTools
- new interface MultiThreadedFunction
- new class LargeSequenceReader for reading large sequence files in chunks
- new interface QuickScanningSequenceScore
- new class RegExpValidator for checking String inputs against a regular expression
- new class IUPACDNAAlphabet
New features and improvements:
- Alignments may now handle different costs for insert and delete gaps
- ListResults may now be constructed from Collections of ResultSets
- Several minor improvements and bugfixes in many classes
- Improvements of documentation of several classes
- Supported Operating Systems: Cygwin, Linux, Macosx, Windows, Unix, Agnostic, Solaris, Freebsd, Platform Independent
- Data Formats: Plain Ascii, Fasta
- Tags: Bioinformatics, R, Classification, Machine Learning, Bayesian Networks, Markov Random Fields, Supervised Learning, Em, Mixture Models, Java, Learning Principles, Probabilistic Models, Motif Discovery
Other available revisions
Version | Changelog | Date
2.3
New classes and packages:
- Jstacs 2.3 is the first release to be accompanied by JstacsFX, a library for building JavaFX-based graphical user interfaces based on JstacsTools
- new interface MultiThreadedFunction
- new class LargeSequenceReader for reading large sequence files in chunks
- new interface QuickScanningSequenceScore
- new class RegExpValidator for checking String inputs against a regular expression
- new class IUPACDNAAlphabet
New features and improvements:
- Alignments may now handle different costs for insert and delete gaps
- ListResults may now be constructed from Collections of ResultSets
- Several minor improvements and bugfixes in many classes
- Improvements of documentation of several classes
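The idea of separate insert and delete gap costs can be illustrated with a small, self-contained dynamic program. This is only a sketch under simple assumptions (global alignment, linear gap costs, integer costs); the class and method names are hypothetical and it does not use or reproduce the Jstacs Alignment classes.

```java
final class GapCostAlignmentSketch {

    /**
     * Minimal cost of globally aligning a to b.
     * "delete" is paid for a symbol of a that has no partner in b,
     * "insert" is paid for a symbol of b that has no partner in a.
     */
    static int cost(String a, String b, int mismatch, int insert, int delete) {
        int n = a.length();
        int m = b.length();
        int[][] d = new int[n + 1][m + 1];
        // First column: every symbol of the prefix of a is deleted.
        for (int i = 1; i <= n; i++) {
            d[i][0] = i * delete;
        }
        // First row: every symbol of the prefix of b is inserted.
        for (int j = 1; j <= m; j++) {
            d[0][j] = j * insert;
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int sub = d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : mismatch);
                int del = d[i - 1][j] + delete;
                int ins = d[i][j - 1] + insert;
                d[i][j] = Math.min(sub, Math.min(del, ins));
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        // Removing the C from ACGT is a deletion (cost 3).
        System.out.println(cost("ACGT", "AGT", 1, 2, 3)); // prints 3
        // Adding the C to AGT is an insertion (cost 2).
        System.out.println(cost("AGT", "ACGT", 1, 2, 3)); // prints 2
    }
}
```

With unequal gap costs the alignment cost is no longer symmetric in its two arguments, which is what the two calls in `main` show.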
September 13, 2017, 14:25:38
2.2
New classes and packages:
- CorreationCoefficient: PerformanceMeasure
- de.jstacs.clustering: package with classes for hierarchical clustering
- DeBruijnGraphSequenceGenerator and DeBruijnSequenceGenerator for generating De Bruijn sequences
- CyclicSequenceAdaptor for representing cyclic sequences
- PlotGeneratorResult for representing results that plot images to a Graphics2D object
- TextResult for results that may be stored as text files
- package de.jstacs.results.savers for generic classes that store results to disk
- LimitedSparseLocalInhomogeneousMixtureDiffSM_higherOrder for sparse local inhomogeneous mixture (Slim) models
- PFMWrapperTrainSM for representing position frequency matrices and position weight matrices from databases
- package de.jstacs.tools with classes for generic Jstacs tools that may be used in different user interfaces (command line, Galaxy, JavaFX)
- Compression for ZIP compression of Strings
- package de.jstacs.utils.graphics with generic GraphicsAdaptor using Apache XML commons
- projects: Dimont, GeMoMa, Slim, TALEN, motif comparison
New features and improvements:
- Major restructuring of Alignment for better efficiency
- Alignment Costs and StringAlignment now Storable
- New constructor of DataSet allowing a specified percentage of sequences to mismatch the given alphabet
- BioJavaAdapter ported to BioJava 1.9
- XMLParser now also allows for storing Sequences
- New method for parsing HMMer profile HMMs in HMMFactory
- Several minor improvements and bugfixes in many classes
- Improvements of documentation of several classes
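For readers unfamiliar with De Bruijn sequences, mentioned in the 2.2 changes: a De Bruijn sequence of order n over an alphabet is a cyclic sequence in which every word of length n occurs exactly once. The following sketch uses the standard construction that concatenates Lyndon words; it is an illustration with hypothetical class and method names, not the code of the Jstacs generator classes.

```java
import java.util.HashSet;
import java.util.Set;

final class DeBruijnSketch {

    /** Builds a De Bruijn sequence of order n over the given alphabet. */
    static String generate(String alphabet, int n) {
        int[] a = new int[n + 1]; // a[0] stays 0 and seeds the recursion
        StringBuilder out = new StringBuilder();
        extend(1, 1, n, alphabet, a, out);
        return out.toString();
    }

    // Recursively enumerates necklaces; each Lyndon word whose length
    // divides n is appended to the output in lexicographic order.
    private static void extend(int t, int p, int n, String alphabet, int[] a, StringBuilder out) {
        if (t > n) {
            if (n % p == 0) {
                for (int i = 1; i <= p; i++) {
                    out.append(alphabet.charAt(a[i]));
                }
            }
            return;
        }
        a[t] = a[t - p];
        extend(t + 1, p, n, alphabet, a, out);
        for (int j = a[t - p] + 1; j < alphabet.length(); j++) {
            a[t] = j;
            extend(t + 1, t, n, alphabet, a, out);
        }
    }

    /** Checks that each word of length n occurs exactly once when seq is read cyclically. */
    static boolean coversAllWordsOnce(String seq, int alphabetSize, int n) {
        Set<String> seen = new HashSet<>();
        String cyclic = seq + seq.substring(0, n - 1);
        for (int i = 0; i < seq.length(); i++) {
            if (!seen.add(cyclic.substring(i, i + n))) {
                return false;
            }
        }
        return seen.size() == (int) Math.pow(alphabetSize, n);
    }

    public static void main(String[] args) {
        String seq = generate("ACGT", 2);
        System.out.println(seq); // prints AACAGATCCGCTGGTT
        System.out.println(coversAllWordsOnce(seq, 4, 2)); // prints true
    }
}
```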
February 17, 2016, 11:57:56
2.1
New classes:
- MultipleIterationsCondition: Requires another TerminationCondition to fail a contiguous, specified number of times
- ClassifierFactory: Allows for creating standard classifiers
- SeqLogoPlotter: Plot PNG sequence logos from within Jstacs
- MultivariateGaussianEmission: Multivariate Gaussian emission density for a Hidden Markov Model
- MEManager: Maximum entropy model
New features and improvements:
- Alignment: Added free shift alignment
- PerformanceMeasure and sub-classes: Extension to weighted test data
- AbstractClassifier, ClassifierAssessment and sub-classes: Adaption to weighted PerformanceMeasures
- DNAAlphabet: Parser speed-up
- PFMComparator: Extension to PFM from other sources/databases
- ToolBox: New convenience methods for computing several statistics (e.g., median, correlation)
- SignificantMotifOccurrencesFinder: New methods for computing PWMs and statistics from predictions
- SequenceScore and sub-classes: New method toString(NumberFormat)
- DataSet: Adaption to weighted data, e.g., partitioning
- REnvironment: Changed several methods from String to CharSequence
Restructuring:
- changed MultiDimensionalSequenceWrapperDiffSM to MultiDimensionalSequenceWrapperDiffSS
Several minor new features, bug fixes, and code cleanups
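The behaviour described for MultipleIterationsCondition in the 2.1 changes, stopping only after an inner condition has failed a given number of times in a row, might be sketched as follows. All names here are hypothetical, and the inner condition is reduced to a predicate on the last improvement of the objective; the actual Jstacs TerminationCondition interface is not reproduced.

```java
import java.util.function.DoublePredicate;

final class ConsecutiveFailureCondition {

    private final DoublePredicate inner; // true means "another iteration is worthwhile"
    private final int requiredFailures;  // consecutive failures needed to stop
    private int failures = 0;

    ConsecutiveFailureCondition(DoublePredicate inner, int requiredFailures) {
        if (requiredFailures < 1) {
            throw new IllegalArgumentException("requiredFailures must be at least 1");
        }
        this.inner = inner;
        this.requiredFailures = requiredFailures;
    }

    /** Returns true while optimization should continue. */
    boolean doNextIteration(double improvement) {
        if (inner.test(improvement)) {
            failures = 0; // a success interrupts the run of failures
        } else {
            failures++;
        }
        return failures < requiredFailures;
    }

    public static void main(String[] args) {
        ConsecutiveFailureCondition c = new ConsecutiveFailureCondition(d -> d > 1e-3, 2);
        double[] improvements = {0.5, 1e-4, 0.2, 1e-5, 1e-6};
        for (double imp : improvements) {
            // Only the last call returns false: two failures in a row.
            System.out.println(imp + " -> continue: " + c.doNextIteration(imp));
        }
    }
}
```

A wrapper of this kind keeps a single small step from ending an optimization prematurely.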
June 3, 2013, 07:32:55
2.0
February 2, 2012: Jstacs 2.0 released
Jstacs 2.0 changes many names and the structure of several packages. It is not code-compatible with Jstacs 1.5 and earlier
RESTRUCTURING and RENAMING:
former ScoringFunction, NormalizableScoringFunction, Model
- new base-interface SequenceScore
- new sub-interface StatisticalModel of SequenceScore for all statistical models with sub-interfaces DifferentiableStatisticalModel and TrainableStatisticalModel
- new interface DifferentiableSequenceScore replaces ScoringFunction
- new interface DifferentiableStatisticalModel replaces NormalizableScoringFunction
- new interface TrainableStatisticalModel replaces Model
- new abstract class AbstractDifferentiableSequenceScore
- new abstract class AbstractDifferentiableStatisticalModel replaces AbstractNormalizableScoringFunction
- new abstract class AbstractTrainableStatisticalModel replaces AbstractModel
- former Models renamed to TrainSM
- former ScoringFunction renamed to DiffSS or DiffSM
- getProbFor removed from TrainableStatisticalModel (former Model) and conceptually replaced by getLogProbFor
- getLogScore(Sequence,int,int) with changed meaning of arguments: getLogScore(Sequence,start,end) instead of getLogScore(Sequence,start,length)
- isTrained() replaced by common method isInitialized()
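The changelog gives no reason for moving from probabilities to log-probabilities, but a common motivation is numerical: the likelihood of a long sequence is a product of many small factors and underflows in double precision, whereas the sum of their logarithms remains representable. The sketch below illustrates this with a hypothetical uniform model over four symbols; it does not call any Jstacs method.

```java
final class LogProbabilitySketch {

    /** Probability of a sequence of the given length under a uniform 4-letter model. */
    static double prob(int length) {
        double p = 1.0;
        for (int i = 0; i < length; i++) {
            p *= 0.25; // underflows to 0.0 for long sequences
        }
        return p;
    }

    /** Log-probability of the same sequence, accumulated as a sum. */
    static double logProb(int length) {
        double lp = 0.0;
        double logQuarter = Math.log(0.25);
        for (int i = 0; i < length; i++) {
            lp += logQuarter;
        }
        return lp;
    }

    public static void main(String[] args) {
        System.out.println(prob(2000));    // prints 0.0 because of underflow
        System.out.println(logProb(2000)); // roughly -2772.59
    }
}
```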
Parameters and Results
- new super-class of Parameters and Results: AnnotatedEntity
- common list-type for Parameters and Results: AnnotatedEntityList
- Renaming: CollectionParameter -> SelectionParameter, MultiSelectionCollectionParameter -> MultiSelectionParameter, new super-class AbstractSelectionParameter
- major refactoring due to common hierarchy and code-cleanup
- lazy evaluation of Parameter/ParameterSet hierarchies moved from ParameterSet (loadParameters()) to ParameterSetContainer (constructor on class)
- SubclassFinder adapted to lazy evaluation
performance measures
- new abstract super-class AbstractPerformanceMeasure of all performance measures
- new interface NumericalPerformanceMeasure for all performance measures that return a single number (as opposed, e.g., to curves)
- new class PerformanceMeasureParameterSet for a collection of general performance measures
- new class NumericalPerformanceMeasureParameterSet for a collection of NumericalPerformanceMeasures
- used in evaluate-method of AbstractClassifier and in ClassifierAssessments
further changes
- Sample renamed to DataSet
- evaluate and evaluateAll in AbstractClassifier joined
- new class IndependentProductDiffSS as super-class of IndependentProductDiffSM (former IndependentProductScoringFunction)
- new class UniformDiffSS as super-class of UniformDiffSM (former UniformScoringFunction)
NEW FUNCTIONALITY:
- multi-threaded implementation of Baum-Welch and Viterbi training of hidden Markov models
- new Interface Singleton that can be used for singleton instances to save memory, current examples: DNAAlphabet, DNAAlphabetContainer, ProteinAlphabet
- added ProteinAlphabet
- added possibility to use NaN-values with ContinuousAlphabets
- added ArbitraryFloatSequence including static methods for DataSet creation for cases where double-precision is not needed
- new performance measure MaximumFMeasure
- access to Parameters in ParameterSets and Results in ResultSets by name
- emitDataSet in BayesianNetworkDiffSM
- new static method Time.getTimeInstance that returns UserTime or RealTime depending on availability of shared lib
- SubclassFinder allows for adding own base packages
- new method overlaps() in LocatedSequenceAnnotationWithLength
- AbstractTerminationCondition used in ScoreClassifier and sub-classes
- public method propagateESS in HMMFactory
- new method generateLog in DirichletMRG for drawing log-values
- added DifferentiableStatisticalModelFactory
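As an illustration of what a maximum F-measure computes, the following sketch scans all observed scores as classification thresholds and reports the largest F1 value. It assumes that higher scores indicate the positive class and uses the unweighted F1; the class is hypothetical and is not the Jstacs MaximumFMeasure implementation, which may differ in parameters and tie handling.

```java
final class MaxFMeasureSketch {

    /** Largest F1 over all thresholds t, predicting "positive" when score >= t. */
    static double maxF1(double[] positives, double[] negatives) {
        double best = 0.0;
        for (double t : concat(positives, negatives)) {
            int tp = countAtLeast(positives, t);
            int fp = countAtLeast(negatives, t);
            int fn = positives.length - tp;
            if (tp > 0) {
                // F1 = 2TP / (2TP + FP + FN)
                best = Math.max(best, 2.0 * tp / (2.0 * tp + fp + fn));
            }
        }
        return best;
    }

    private static int countAtLeast(double[] scores, double threshold) {
        int count = 0;
        for (double s : scores) {
            if (s >= threshold) {
                count++;
            }
        }
        return count;
    }

    private static double[] concat(double[] a, double[] b) {
        double[] all = new double[a.length + b.length];
        System.arraycopy(a, 0, all, 0, a.length);
        System.arraycopy(b, 0, all, a.length, b.length);
        return all;
    }

    public static void main(String[] args) {
        double[] pos = {0.9, 0.8, 0.4};
        double[] neg = {0.7, 0.3, 0.1};
        // Best threshold is 0.4: TP = 3, FP = 1, FN = 0, so F1 = 6/7.
        System.out.println(maxF1(pos, neg));
    }
}
```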
BUGFIXES/IMPROVEMENTS:
- bugfix in propagation of equivalent sample size in HMMFactory
- bugfix in random initialization of BasicHigherOrderTransition
- improved Alignment implementation
- SafeOutputStream with new static factory method getSafeOutputStream, write methods now work on Objects
DOCUMENTATION:
- improved Javadocs in many classes and packages
- new Cookbook with extensive documentation and explanation
MISC:
- output of NonParsableException more verbose
- Exceptions in multi-threaded code now lead to exit of program instead of only stopping the thread
- update of RServe/RClient
February 2, 2012, 17:14:02
1.5
June 1st, 2011: Jstacs 1.5 released
new package de.jstacs.algorithms.alignment for sequence alignment algorithms
new class de.jstacs.models.ModelFactory with static classes to construct many standard models
de.jstacs.utils.galaxy.GalaxyAdaptor, an adaptor to Galaxy, which allows for creating Galaxy applications using Jstacs ParameterSets, also requires new interface GalaxyConvertible
new package de.jstacs.models.hmm for a variety of hidden Markov models, which can be learned by different learning principles including generative and discriminative learning principles, maximization and sampling methods
new package de.jstacs.sampling that contains general infrastructure for parameter sampling
new class de.jstacs.scoringFunctions.MappingScoringFunction that allows for internal mapping of symbols from the alphabet
new package de.jstacs.classifier.scoringFunctionBases.sampling containing classifiers that sample their parameters by the Metropolis-Hastings algorithm
new interface de.jstacs.scoringFunctions.SamplingScoringFunction for NormalizableScoringFunctions that can be used in Metropolis-Hastings sampling of parameters
bugfix in XMLParser for cases where the tag of interest also occurs within other, nested tags
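The Metropolis-Hastings algorithm named in the 1.5 changes can be sketched in a few lines for a single real-valued parameter. This is a generic random-walk sampler with a symmetric Gaussian proposal and a fixed seed; the names are hypothetical and nothing here reflects the Jstacs sampling classes or their interfaces.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

final class MetropolisHastingsSketch {

    /** Draws n correlated samples from the density whose logarithm (up to a constant) is given. */
    static double[] sample(DoubleUnaryOperator logDensity, double start, double stepSize, int n, long seed) {
        Random rng = new Random(seed);
        double[] chain = new double[n];
        double current = start;
        double currentLog = logDensity.applyAsDouble(current);
        for (int i = 0; i < n; i++) {
            // Symmetric proposal, so the acceptance ratio is just the density ratio.
            double proposal = current + stepSize * rng.nextGaussian();
            double proposalLog = logDensity.applyAsDouble(proposal);
            if (Math.log(rng.nextDouble()) < proposalLog - currentLog) {
                current = proposal;
                currentLog = proposalLog;
            }
            chain[i] = current; // a rejected proposal repeats the current state
        }
        return chain;
    }

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    static double variance(double[] x) {
        double m = mean(x);
        double sum = 0.0;
        for (double v : x) {
            sum += (v - m) * (v - m);
        }
        return sum / x.length;
    }

    public static void main(String[] args) {
        // Target: standard normal, log-density -x^2/2 up to a constant.
        double[] chain = sample(x -> -0.5 * x * x, 0.0, 2.0, 200_000, 42L);
        System.out.println("mean     ~ " + mean(chain));     // close to 0
        System.out.println("variance ~ " + variance(chain)); // close to 1
    }
}
```

Working with log-densities keeps the acceptance test stable even when the densities themselves are very small.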
June 6, 2011, 10:48:08
1.4
December 31, 2010: Jstacs 1.4 released
added DinucleotideProperty for computing properties like melting temperature, twist, or G/C content
support for multidimensional sequence data
more widespread use of TerminationConditions
completely rewritten XMLParser
extension of motif discovery to weighted data
OneSampleLogGenDisMixFunction for using the same Sample with different weights for the different classes
Jstacs requires Java 1.6 now
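A dinucleotide property assigns a value to each pair of adjacent bases and is evaluated along a sequence with a sliding window of width two. The sketch below does this for G/C content only, with hypothetical names; it is not the Jstacs class for dinucleotide properties, and properties such as melting temperature or twist would need published parameter tables that are not reproduced here.

```java
final class DinucleotideGcSketch {

    /** G/C fraction (0, 0.5 or 1) of each overlapping dinucleotide in the sequence. */
    static double[] profile(String dna) {
        if (dna.length() < 2) {
            throw new IllegalArgumentException("need at least two bases");
        }
        String s = dna.toUpperCase();
        double[] values = new double[s.length() - 1];
        for (int i = 0; i < values.length; i++) {
            values[i] = (isGc(s.charAt(i)) + isGc(s.charAt(i + 1))) / 2.0;
        }
        return values;
    }

    private static int isGc(char base) {
        return (base == 'G' || base == 'C') ? 1 : 0;
    }

    public static void main(String[] args) {
        // Dinucleotides of ACGT are AC, CG and GT.
        for (double v : profile("ACGT")) {
            System.out.println(v); // prints 0.5, 1.0, 0.5 on separate lines
        }
    }
}
```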
January 2, 2011, 11:07:10
1.3.1
March 2, 2010: Jstacs 1.3.1 released
Partitioning of Samples including weights
Release of Dispom (de-novo discovery of differentially abundant transcription factor binding sites including their positional preference)
Several bugfixes
March 2, 2010, 14:15:46
1.3
Initial Announcement on mloss.org.
December 4, 2009, 11:27:55