Project details for Jstacs

Logo JMLR Jstacs 2.0

by keili - July 30, 2012, 13:31:02 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ]

view ( today), download ( today ), 0 subscriptions

Description:

Sequence analysis is one of the major subjects of bioinformatics. Several existing libraries combine the representation of biological sequences with exact and approximate pattern matching as well as alignment algorithms. We present Jstacs, an open source Java library, which focuses on the statistical analysis of biological sequences instead. Jstacs comprises an efficient representation of sequence data and provides implementations of many statistical models with generative and discriminative approaches for parameter learning. Using Jstacs, classifiers can be assessed and compared on test datasets or by cross-validation experiments evaluating several performance measures. Due to its strictly object-oriented design Jstacs is easy to use and readily extensible.

Changes to previous version:

February 2, 2012: Jstacs 2.0 released

Jstacs 2.0 changes many names and the structure of several packages. It is not code-compatible with Jstacs 1.5 and earlier

RESTRUCTURING and RENAMING:

former ScoringFunction, NormalizableScoringFunction, Model

  • new base-interface SequenceScore
  • new sub-interface StatisticalModel of SequenceScore for all statistical models with sub-iterfaces DifferentiableStatisticalModel and TrainableStatisticalModel
  • new interface DifferentiableSequenceScore replaces ScoringFunction
  • new interface DifferentiableStatisticalModel replaces NormalizableScoringFunction
  • new interface TrainableStatisticalModel replaces Model
  • new abstract class AbstractDifferentiableSequenceScore
  • new abstract class AbstractDifferentiableStatisticalModel replaces AbstractNormalizableScoringFunction
  • new abstract class AbstractTrainableStatisticalModel replaces AbstractModel
  • former Models renamed to TrainSM
  • former ScoringFunction renamed to DiffSS or DiffSM
  • getProbFor removed from TrainableStatisticalModel (former Model) and conceptually replaced by getLogProbFor
  • getLogScore(Sequence,int,int) with changed meaning of arguments: getLogScore(Sequence,start,end) instead of getLogScore(Sequence,start,length)
  • isTrained() replaced by common method isInitialized()

Parameters and Results

  • new super-class of Parameters and Results: AnnotatedEntity
  • common list-type for Parameters and Results: AnnotatedEntityList
  • Renaming: CollectionParameter -> SelectionParameter, MultiSelectionCollectionParameter -> MultiSelectionParameter, new super-class AbstractSelectionParameter
  • major refactoring due to common hierarchy and code-cleanup
  • lazy evaluation of Parameter/ParameterSet hierarchies moved from ParameterSet (loadParameters()) to ParameterSetContainer (constructor on class)
  • SubclassFinder adapted to lazy evaluation

performance measures

  • new abstract super-class AbstractPerformanceMeasure of all performance measures
  • new interface NumericalPerformanceMeasure for all performance measures that return a single number (as opposed, e.g., to curves)
  • new class PerformanceMeasureParameterSet for a collection of general performance measures
  • new class NumericalPerformanceMeasureParameterSet for a collection of NumericalPerformanceMeasures
  • used in evaluate-method of AbstractClassifier and in ClassifierAssessments

further changes

  • Sample renamed to DataSet
  • evaluate and evaluateAll in AbstractClassifier joined
  • new class IndependentProductDiffSS as super-class of IndepedentProductDiffSM (former IndependentProductScoringFunction)
  • new class UniformDiffSS as super-class of UniformDiffSM (former UniformScoringFunction)

NEW FUNCTIONALITY:

  • multi-threaded implementation of Baum-Welch and Viterbi training of hidden Markov models
  • new Interface Singleton that can be used for singleton instances to save memory, current examples: DNAAlphabet, DNAAlphabetContainer, ProteinAlphabet
  • added ProteinAlphabet
  • added possibility to use NaN-values with ContinuousAlphabets
  • added ArbitraryFloatSequence including static methods for DataSet creation for cases where double-precision is not needed
  • new performance measure MaximumFMeasure
  • access to Parameters in ParameterSets and Results in ResultSets by name
  • emitDataSet in BayesianNetworkDiffSM
  • new static method Time.getTimeInstance that returns UserTime or RealTime depending on availability of shared lib
  • SubclassFinder allows for adding own base packages
  • new method overlaps() in LocatedSequenceAnnotationWithLength
  • AbstractTerminationCondition used in ScoreClassifier and sub-classes
  • public method propagateESS in HMMFactory
  • new method generateLog in DirichletMRG for drawing log-values
  • added DifferentiableStatisticalModelFactory

BUGFIXES/IMPROVEMENTS:

  • bugfix in propagation of equivalent sample size in HMMFactory
  • bugfix in random initialization of BasicHigherOrderTransition
  • improved Alignment implementation
  • SafeOutputStream with new static factory method getSafeOutputStream, write methods now work on Objects

DOCUMENTATION:

  • improved Javadocs in many classes and packages
  • new Cookbook with extensive documentation and explanation

MISC:

  • output of NonParsableException more verbose
  • Exceptions in multi-threaded code now lead to exit of program instead of only stopping the thread
  • update of RServe/RClient
BibTeX Entry: Download
Corresponding Paper BibTeX Entry: Download
Supported Operating Systems: Cygwin, Linux, Macosx, Windows, Unix, Agnostic, Solaris, Freebsd, Platform Independent
Data Formats: Plain Ascii, Fasta
Tags: Bioinformatics, R, Classification, Machine Learning, Bayesian Networks, Markov Random Fields, Supervised Learning, Em, Mixture Models, Java, Learning Principles, Probabilistic Models, Motif Discovery
Archive: download here

Comments

No one has posted any comments yet. Perhaps you'd like to be the first?

Leave a comment

You must be logged in to post comments.