Projects supporting the fasta data format.


Logo JMLR SHOGUN 2.1.0

by sonne - March 17, 2013, 13:59:34 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 41163 views, 8616 downloads, 4 subscriptions

Rating Whole StarWhole StarWhole StarWhole StarEmpty Star
(based on 5 votes)

About: The SHOGUN machine learning toolbox's focus is on large scale learning methods with focus on Support Vector Machines (SVM), providing interfaces to python, octave, matlab, r and the command line.

Changes:

This release also contains several enhancements, cleanups and bugfixes:

Features

  • Linear Time MMD two-sample test now works on streaming-features, which allows to perform tests on infinite amounts of data. A block size may be specified for fast processing. The below features were also added. By Heiko Strathmann.
  • It is now possible to ask streaming features to produce an instance of streamed features that are stored in memory and returned as a CFeatures* object of corresponding type. See CStreamingFeatures::get_streamed_features().
  • New concept of artificial data generator classes: Based on streaming features. First implemented instances are CMeanShiftDataGenerator and CGaussianBlobsDataGenerator. Use above new concepts to get non-streaming data if desired.
  • Accelerated projected gradient multiclass logistic regression classifier by Sergey Lisitsyn.
  • New CCSOSVM based structured output solver by Viktor Gal
  • A collection of kernel selection methods for MMD-based kernel two- sample tests, including optimal kernel choice for single and combined kernels for the linear time MMD. This finishes the kernel MMD framework and also comes with new, more illustrative examples and tests. By Heiko Strathmann.
  • Alpha version of Perl modular interface developed by Christian Montanari.
  • New framework for unit-tests based on googletest and googlemock by Viktor Gal. A (growing) number of unit-tests from now on ensures basic funcionality of our framework. Since the examples do not have to take this role anymore, they should become more ilustrative in the future.
  • Changed the core of dimension reduction algorithms to the Tapkee library.

Bugfixes

  • Fix for shallow copy of gaussian kernel by Matt Aasted.
  • Fixed a bug when using StringFeatures along with kernel machines in cross-validation which cause an assertion error. Thanks to Eric (yoo)!
  • Fix for 3-class case training of MulticlassLibSVM reported by Arya Iranmehr that was suggested by Oksana Bayda.
  • Fix for wrong Spectrum mismatch RBF construction in static interfaces reported by Nona Kermani.
  • Fix for wrong include in SGMatrix causing build fail on Mac OS X (thanks to @bianjiang).
  • Fixed a bug that caused kernel machines to return non-sense when using custom kernel matrices with subsets attached to them.
  • Fix for parameter dictionary creationg causing dereferencing null pointers with gaussian processes parameter selection.
  • Fixed a bug in exact GP regression that caused wrong results.
  • Fixed a bug in exact GP regression that produced memory errors/crashes.
  • Fix for a bug with static interfaces causing all outputs to be -1/+1 instead of real scores (reported by Kamikawa Masahisa).

Cleanup and API Changes

  • SGStringList is now based on SGReferencedData.
  • "confidences" in context of CLabel and subclasses are now "values".
  • CLinearTimeMMD constructor changes, only streaming features allowed.
  • CDataGenerator will soon be removed and replaced by new streaming- based classes.
  • SGVector, SGMatrix, SGSparseVector, SGSparseVector, SGSparseMatrix refactoring: Now contains load/save routines, relevant functions from CMath, and implementations went to .cpp file.

Logo JMLR Sally 0.8.1

by konrad - December 27, 2012, 14:34:31 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 11896 views, 2253 downloads, 2 subscriptions

About: A Tool for Embedding Strings in Vector Spaces

Changes:

Support for positional n-grams with shift (similar to weighted-degree kernel with shift) has been added. Several minor bugs have been fixed.


Logo JMLR Jstacs 2.0

by keili - July 30, 2012, 13:31:02 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 9559 views, 2098 downloads, 2 subscriptions

About: A Java framework for statistical analysis and classification of biological sequences

Changes:

February 2, 2012: Jstacs 2.0 released

Jstacs 2.0 changes many names and the structure of several packages. It is not code-compatible with Jstacs 1.5 and earlier

RESTRUCTURING and RENAMING:

former ScoringFunction, NormalizableScoringFunction, Model

  • new base-interface SequenceScore
  • new sub-interface StatisticalModel of SequenceScore for all statistical models with sub-iterfaces DifferentiableStatisticalModel and TrainableStatisticalModel
  • new interface DifferentiableSequenceScore replaces ScoringFunction
  • new interface DifferentiableStatisticalModel replaces NormalizableScoringFunction
  • new interface TrainableStatisticalModel replaces Model
  • new abstract class AbstractDifferentiableSequenceScore
  • new abstract class AbstractDifferentiableStatisticalModel replaces AbstractNormalizableScoringFunction
  • new abstract class AbstractTrainableStatisticalModel replaces AbstractModel
  • former Models renamed to TrainSM
  • former ScoringFunction renamed to DiffSS or DiffSM
  • getProbFor removed from TrainableStatisticalModel (former Model) and conceptually replaced by getLogProbFor
  • getLogScore(Sequence,int,int) with changed meaning of arguments: getLogScore(Sequence,start,end) instead of getLogScore(Sequence,start,length)
  • isTrained() replaced by common method isInitialized()

Parameters and Results

  • new super-class of Parameters and Results: AnnotatedEntity
  • common list-type for Parameters and Results: AnnotatedEntityList
  • Renaming: CollectionParameter -> SelectionParameter, MultiSelectionCollectionParameter -> MultiSelectionParameter, new super-class AbstractSelectionParameter
  • major refactoring due to common hierarchy and code-cleanup
  • lazy evaluation of Parameter/ParameterSet hierarchies moved from ParameterSet (loadParameters()) to ParameterSetContainer (constructor on class)
  • SubclassFinder adapted to lazy evaluation

performance measures

  • new abstract super-class AbstractPerformanceMeasure of all performance measures
  • new interface NumericalPerformanceMeasure for all performance measures that return a single number (as opposed, e.g., to curves)
  • new class PerformanceMeasureParameterSet for a collection of general performance measures
  • new class NumericalPerformanceMeasureParameterSet for a collection of NumericalPerformanceMeasures
  • used in evaluate-method of AbstractClassifier and in ClassifierAssessments

further changes

  • Sample renamed to DataSet
  • evaluate and evaluateAll in AbstractClassifier joined
  • new class IndependentProductDiffSS as super-class of IndepedentProductDiffSM (former IndependentProductScoringFunction)
  • new class UniformDiffSS as super-class of UniformDiffSM (former UniformScoringFunction)

NEW FUNCTIONALITY:

  • multi-threaded implementation of Baum-Welch and Viterbi training of hidden Markov models
  • new Interface Singleton that can be used for singleton instances to save memory, current examples: DNAAlphabet, DNAAlphabetContainer, ProteinAlphabet
  • added ProteinAlphabet
  • added possibility to use NaN-values with ContinuousAlphabets
  • added ArbitraryFloatSequence including static methods for DataSet creation for cases where double-precision is not needed
  • new performance measure MaximumFMeasure
  • access to Parameters in ParameterSets and Results in ResultSets by name
  • emitDataSet in BayesianNetworkDiffSM
  • new static method Time.getTimeInstance that returns UserTime or RealTime depending on availability of shared lib
  • SubclassFinder allows for adding own base packages
  • new method overlaps() in LocatedSequenceAnnotationWithLength
  • AbstractTerminationCondition used in ScoreClassifier and sub-classes
  • public method propagateESS in HMMFactory
  • new method generateLog in DirichletMRG for drawing log-values
  • added DifferentiableStatisticalModelFactory

BUGFIXES/IMPROVEMENTS:

  • bugfix in propagation of equivalent sample size in HMMFactory
  • bugfix in random initialization of BasicHigherOrderTransition
  • improved Alignment implementation
  • SafeOutputStream with new static factory method getSafeOutputStream, write methods now work on Objects

DOCUMENTATION:

  • improved Javadocs in many classes and packages
  • new Cookbook with extensive documentation and explanation

MISC:

  • output of NonParsableException more verbose
  • Exceptions in multi-threaded code now lead to exit of program instead of only stopping the thread
  • update of RServe/RClient

Logo LSTM for biological sequence analysis 1.0

by mhex - July 28, 2010, 16:32:29 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 3756 views, 823 downloads, 1 subscription

Rating Whole StarWhole StarWhole StarWhole StarWhole Star
(based on 1 vote)

About: Implementation of LSTM for biological sequence analysis (classification, regression, motif discovery, remote homology detection). Additionally a LSTM as logistic regression with spectrum kernel is included.

Changes:

Spectrum LSTM package included


Logo asp 0.3

by sonne - May 7, 2010, 10:25:39 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 5992 views, 1114 downloads, 1 subscription

About: Accurate splice site predictor for a variety of genomes.

Changes:

Asp now supports three formats:

-g fname for gff format

-s fname for spf format

-b dir for a binary format compatible with mGene.

And a new switch

-t which switches on a sigmoid-based transformation of the svm scores to get scores between 0 and 1.


Logo arts 0.2

by sonne - May 25, 2009, 09:56:31 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 3424 views, 703 downloads, 1 subscription

About: ARTS is an accurate predictor for Transcription Start Sites (TSS).

Changes:

Initial Announcement on mloss.org.