Projects that are tagged with bioinformatics.


Logo JMLR SHOGUN 2.1.0

by sonne - March 17, 2013, 13:59:34 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 41048 views, 8596 downloads, 4 subscriptions

Rating Whole StarWhole StarWhole StarWhole StarEmpty Star
(based on 5 votes)

About: The SHOGUN machine learning toolbox's focus is on large scale learning methods with focus on Support Vector Machines (SVM), providing interfaces to python, octave, matlab, r and the command line.

Changes:

This release also contains several enhancements, cleanups and bugfixes:

Features

  • Linear Time MMD two-sample test now works on streaming-features, which allows to perform tests on infinite amounts of data. A block size may be specified for fast processing. The below features were also added. By Heiko Strathmann.
  • It is now possible to ask streaming features to produce an instance of streamed features that are stored in memory and returned as a CFeatures* object of corresponding type. See CStreamingFeatures::get_streamed_features().
  • New concept of artificial data generator classes: Based on streaming features. First implemented instances are CMeanShiftDataGenerator and CGaussianBlobsDataGenerator. Use above new concepts to get non-streaming data if desired.
  • Accelerated projected gradient multiclass logistic regression classifier by Sergey Lisitsyn.
  • New CCSOSVM based structured output solver by Viktor Gal
  • A collection of kernel selection methods for MMD-based kernel two- sample tests, including optimal kernel choice for single and combined kernels for the linear time MMD. This finishes the kernel MMD framework and also comes with new, more illustrative examples and tests. By Heiko Strathmann.
  • Alpha version of Perl modular interface developed by Christian Montanari.
  • New framework for unit-tests based on googletest and googlemock by Viktor Gal. A (growing) number of unit-tests from now on ensures basic funcionality of our framework. Since the examples do not have to take this role anymore, they should become more ilustrative in the future.
  • Changed the core of dimension reduction algorithms to the Tapkee library.

Bugfixes

  • Fix for shallow copy of gaussian kernel by Matt Aasted.
  • Fixed a bug when using StringFeatures along with kernel machines in cross-validation which cause an assertion error. Thanks to Eric (yoo)!
  • Fix for 3-class case training of MulticlassLibSVM reported by Arya Iranmehr that was suggested by Oksana Bayda.
  • Fix for wrong Spectrum mismatch RBF construction in static interfaces reported by Nona Kermani.
  • Fix for wrong include in SGMatrix causing build fail on Mac OS X (thanks to @bianjiang).
  • Fixed a bug that caused kernel machines to return non-sense when using custom kernel matrices with subsets attached to them.
  • Fix for parameter dictionary creationg causing dereferencing null pointers with gaussian processes parameter selection.
  • Fixed a bug in exact GP regression that caused wrong results.
  • Fixed a bug in exact GP regression that produced memory errors/crashes.
  • Fix for a bug with static interfaces causing all outputs to be -1/+1 instead of real scores (reported by Kamikawa Masahisa).

Cleanup and API Changes

  • SGStringList is now based on SGReferencedData.
  • "confidences" in context of CLabel and subclasses are now "values".
  • CLinearTimeMMD constructor changes, only streaming features allowed.
  • CDataGenerator will soon be removed and replaced by new streaming- based classes.
  • SGVector, SGMatrix, SGSparseVector, SGSparseVector, SGSparseMatrix refactoring: Now contains load/save routines, relevant functions from CMath, and implementations went to .cpp file.

Logo FABIA 2.4.0

by hochreit - December 20, 2012, 14:20:58 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 5361 views, 1062 downloads, 1 subscription

Rating Whole StarWhole StarWhole StarWhole Star1/2 Star
(based on 1 vote)

About: FABIA is a biclustering algorithm that clusters rows and columns of a matrix simultaneously. Consequently, members of a row cluster are similar to each other on a subset of columns and, analogously, members of a column cluster are similar to each other on a subset of rows. Biclusters are found by factor analysis where both the factors and the loading matrix are sparse. FABIA is a multiplicative model that extracts linear dependencies between samples and feature patterns. Applications include detection of transcriptional modules in gene expression data and identification of haplotypes/>identity by descent< consisting of rare variants obtained by next generation sequencing.

Changes:

CHANGES IN VERSION 2.4.0

o spfabia bugfixes

CHANGES IN VERSION 2.3.1

NEW FEATURES

o Getters and setters for class Factorization

2.0.0:

  • spfabia: fabia for a sparse data matrix (in sparse matrix format) and sparse vector/matrix computations in the code to speed up computations. spfabia applications: (a) detecting >identity by descent< in next generation sequencing data with rare variants, (b) detecting >shared haplotypes< in disease studies based on next generation sequencing data with rare variants;
  • fabia for non-negative factorization (parameter: non_negative);
  • changed to C and removed dependencies to Rcpp;
  • improved update for lambda (alpha should be smaller, e.g. 0.03);
  • introduced maximal number of row elements (lL);
  • introduced cycle bL when upper bounds nL or lL are effective;
  • reduced computational complexity;
  • bug fixes: (a) update formula for lambda: tighter approximation, (b) corrected inverse of the conditional covariance matrix of z;

1.4.0:

  • New option nL: maximal number of biclusters per row element;
  • Sort biclusters according to information content;
  • Improved and extended preprocessing;
  • Update to R2.13

Logo JMLR Jstacs 2.0

by keili - July 30, 2012, 13:31:02 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 9522 views, 2094 downloads, 2 subscriptions

About: A Java framework for statistical analysis and classification of biological sequences

Changes:

February 2, 2012: Jstacs 2.0 released

Jstacs 2.0 changes many names and the structure of several packages. It is not code-compatible with Jstacs 1.5 and earlier

RESTRUCTURING and RENAMING:

former ScoringFunction, NormalizableScoringFunction, Model

  • new base-interface SequenceScore
  • new sub-interface StatisticalModel of SequenceScore for all statistical models with sub-iterfaces DifferentiableStatisticalModel and TrainableStatisticalModel
  • new interface DifferentiableSequenceScore replaces ScoringFunction
  • new interface DifferentiableStatisticalModel replaces NormalizableScoringFunction
  • new interface TrainableStatisticalModel replaces Model
  • new abstract class AbstractDifferentiableSequenceScore
  • new abstract class AbstractDifferentiableStatisticalModel replaces AbstractNormalizableScoringFunction
  • new abstract class AbstractTrainableStatisticalModel replaces AbstractModel
  • former Models renamed to TrainSM
  • former ScoringFunction renamed to DiffSS or DiffSM
  • getProbFor removed from TrainableStatisticalModel (former Model) and conceptually replaced by getLogProbFor
  • getLogScore(Sequence,int,int) with changed meaning of arguments: getLogScore(Sequence,start,end) instead of getLogScore(Sequence,start,length)
  • isTrained() replaced by common method isInitialized()

Parameters and Results

  • new super-class of Parameters and Results: AnnotatedEntity
  • common list-type for Parameters and Results: AnnotatedEntityList
  • Renaming: CollectionParameter -> SelectionParameter, MultiSelectionCollectionParameter -> MultiSelectionParameter, new super-class AbstractSelectionParameter
  • major refactoring due to common hierarchy and code-cleanup
  • lazy evaluation of Parameter/ParameterSet hierarchies moved from ParameterSet (loadParameters()) to ParameterSetContainer (constructor on class)
  • SubclassFinder adapted to lazy evaluation

performance measures

  • new abstract super-class AbstractPerformanceMeasure of all performance measures
  • new interface NumericalPerformanceMeasure for all performance measures that return a single number (as opposed, e.g., to curves)
  • new class PerformanceMeasureParameterSet for a collection of general performance measures
  • new class NumericalPerformanceMeasureParameterSet for a collection of NumericalPerformanceMeasures
  • used in evaluate-method of AbstractClassifier and in ClassifierAssessments

further changes

  • Sample renamed to DataSet
  • evaluate and evaluateAll in AbstractClassifier joined
  • new class IndependentProductDiffSS as super-class of IndepedentProductDiffSM (former IndependentProductScoringFunction)
  • new class UniformDiffSS as super-class of UniformDiffSM (former UniformScoringFunction)

NEW FUNCTIONALITY:

  • multi-threaded implementation of Baum-Welch and Viterbi training of hidden Markov models
  • new Interface Singleton that can be used for singleton instances to save memory, current examples: DNAAlphabet, DNAAlphabetContainer, ProteinAlphabet
  • added ProteinAlphabet
  • added possibility to use NaN-values with ContinuousAlphabets
  • added ArbitraryFloatSequence including static methods for DataSet creation for cases where double-precision is not needed
  • new performance measure MaximumFMeasure
  • access to Parameters in ParameterSets and Results in ResultSets by name
  • emitDataSet in BayesianNetworkDiffSM
  • new static method Time.getTimeInstance that returns UserTime or RealTime depending on availability of shared lib
  • SubclassFinder allows for adding own base packages
  • new method overlaps() in LocatedSequenceAnnotationWithLength
  • AbstractTerminationCondition used in ScoreClassifier and sub-classes
  • public method propagateESS in HMMFactory
  • new method generateLog in DirichletMRG for drawing log-values
  • added DifferentiableStatisticalModelFactory

BUGFIXES/IMPROVEMENTS:

  • bugfix in propagation of equivalent sample size in HMMFactory
  • bugfix in random initialization of BasicHigherOrderTransition
  • improved Alignment implementation
  • SafeOutputStream with new static factory method getSafeOutputStream, write methods now work on Objects

DOCUMENTATION:

  • improved Javadocs in many classes and packages
  • new Cookbook with extensive documentation and explanation

MISC:

  • output of NonParsableException more verbose
  • Exceptions in multi-threaded code now lead to exit of program instead of only stopping the thread
  • update of RServe/RClient

Logo BCILAB 1.0-beta

by chkothe - January 6, 2012, 23:47:55 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 1740 views, 333 downloads, 1 subscription

About: MATLAB toolbox for advanced Brain-Computer Interface (BCI) research.

Changes:

Initial Announcement on mloss.org.


Logo NetPro 1.1.17

by lml - January 25, 2011, 19:02:53 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 2298 views, 532 downloads, 1 subscription

About: Tools for functional network analysis.

Changes:

Initial Announcement on mloss.org.


Logo Epistatic MAP Imputation 1.1

by colm - November 25, 2010, 21:01:10 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 1765 views, 445 downloads, 1 subscription

About: Epistatic miniarray profiles (E-MAPs) are a high-throughput approach capable of quantifying aggravating or alleviating genetic interactions between gene pairs. The datasets resulting from E-MAP experiments typically take the form of a symmetric pairwise matrix of interaction scores. These datasets have a significant number of missing values - up to 35% - that can reduce the effectiveness of some data analysis techniques and prevent the use of others. This project contains nearest neighbor based tools for the imputation and prediction of these missing values. The code is implemented in Python and uses a nearest neighbor based approach. Two variants are used - a simple weighted nearest neighbors, and a local least squares based regression.

Changes:

Initial Announcement on mloss.org.


Logo LSTM for biological sequence analysis 1.0

by mhex - July 28, 2010, 16:32:29 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 3736 views, 818 downloads, 1 subscription

Rating Whole StarWhole StarWhole StarWhole StarWhole Star
(based on 1 vote)

About: Implementation of LSTM for biological sequence analysis (classification, regression, motif discovery, remote homology detection). Additionally a LSTM as logistic regression with spectrum kernel is included.

Changes:

Spectrum LSTM package included


Logo asp 0.3

by sonne - May 7, 2010, 10:25:39 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 5978 views, 1113 downloads, 1 subscription

About: Accurate splice site predictor for a variety of genomes.

Changes:

Asp now supports three formats:

-g fname for gff format

-s fname for spf format

-b dir for a binary format compatible with mGene.

And a new switch

-t which switches on a sigmoid-based transformation of the svm scores to get scores between 0 and 1.


Logo Dependency modeling toolbox 0.2

by lml - April 30, 2010, 14:38:45 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 5527 views, 781 downloads, 1 subscription

About: Investigation of dependencies between multiple data sources allows the discovery of regularities and interactions that are not seen in individual data sets. The demand for such methods is increasing with the availability and size of co-occurring observations in computational biology, open data initiatives, and in other domains. We provide practical, open access implementations of general-purpose algorithms that help to realize the full potential of these information sources.

Changes:

Three independent modules (drCCA, pint, MultiWayCCA) have been added.


Logo svmPRAT 1.0

by rangwala - December 28, 2009, 00:27:03 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 2922 views, 668 downloads, 1 subscription

About: BACKGROUND:Over the last decade several prediction methods have been developed for determining the structural and functional properties of individual protein residues using sequence and sequence-derived information. Most of these methods are based on support vector machines as they provide accurate and generalizable prediction models. RESULTS:We present a general purpose protein residue annotation toolkit (svmPRAT) to allow biologists to formulate residue-wise prediction problems. svmPRAT formulates the annotation problem as a classification or regression problem using support vector machines. One of the key features of svmPRAT is its ease of use in incorporating any user-provided information in the form of feature matrices. For every residue svmPRAT captures local information around the reside to create fixed length feature vectors. svmPRAT implements accurate and fast kernel functions, and also introduces a flexible window-based encoding scheme that accurately captures signals and pattern for training eective predictive models. CONCLUSIONS:In this work we evaluate svmPRAT on several classification and regression problems including disorder prediction, residue-wise contact order estimation, DNA-binding site prediction, and local structure alphabet prediction. svmPRAT has also been used for the development of state-of-the-art transmembrane helix prediction method called TOPTMH, and secondary structure prediction method called YASSPP. This toolkit developed provides practitioners an efficient and easy-to-use tool for a wide variety of annotation problems. Availability: http://www.cs.gmu.edu/~mlbio/svmprat/

Changes:

Initial Announcement on mloss.org.


Logo seqan 1.2

by sonne - November 2, 2009, 14:54:08 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 4558 views, 881 downloads, 1 subscription

About: SeqAn is an open source C++ library of efficient algorithms and data structures for the analysis of sequences with the focus on biological data.

Changes:
  • 5 more applications, i.e. DFI, MicroRazerS, PairAlign, SeqCons, TreeRecon
  • stable release of RazerS supporting paired-end read mapping and configurable sensitivity
  • new alignment algorithms, e.g. banded, configurable alignments (overlap, semi-global, ...)
  • realignment algorithm
  • NGS data structures and formats, e.g. SAM, Amos, ...
  • new alphabets, e.g. Dna with base call qualities, profile characters
  • auxiliary data structures and algorithms, e.g. double ended queue, command line parser
  • positional scores
  • CMake support

Logo Easysvm 0.3

by gxr - June 25, 2009, 18:33:04 CET [ Project Homepage BibTeX Download ] 7044 views, 1274 downloads, 1 subscription

About: The Easysvm package provides a set of tools based on the Shogun toolbox allowing to train and test SVMs in a simple way.

Changes:

Fixes for shogun 0.7.3.


Logo arts 0.2

by sonne - May 25, 2009, 09:56:31 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 3419 views, 702 downloads, 1 subscription

About: ARTS is an accurate predictor for Transcription Start Sites (TSS).

Changes:

Initial Announcement on mloss.org.


Logo Spike train feature extraction by Bayesian binning 0.1

by dendres - September 24, 2008, 16:19:14 CET [ BibTeX BibTeX for corresponding Paper Download ] 4810 views, 1099 downloads, 0 comments, 1 subscription

About: *binsdfc* is a command line implementation of the algorithm described in [Endres,Oram,Schindelin,Foldiak:*Bayesian binning beats approximate alternatives: estimating peri-stimulus time histograms*, [...]

Changes:

Changed build system from automake to cmake. Moved download page to www.compsens.uni-tuebingen.de


Logo Nested Effects Models 2.4.0

by florian - July 8, 2008, 00:05:59 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 4270 views, 1105 downloads, 1 subscription

About: Nested Effects Models (NEMs) are a class of directed graphical models originally introduced to analyze the effects of gene perturbation screens with high-dimensional phenotypes. In contrast to other [...]

Changes:

Initial Announcement on mloss.org.


Logo Sleipnir 1.0

by chuttenh - June 30, 2008, 03:22:19 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 4842 views, 979 downloads, 1 subscription

About: The Sleipnir C++ library implements a variety of machine learning and data manipulation algorithms focusing on heterogeneous data integration and efficiency for large biological data collections.

Changes:

Initial Announcement on mloss.org.


Logo mSplicer 0.3

by sonne - May 18, 2008, 13:07:40 CET [ Project Homepage BibTeX BibTeX for corresponding Paper Download ] 4738 views, 946 downloads, 3 subscriptions

Rating Whole StarWhole StarWhole StarWhole StarEmpty Star
(based on 2 votes)

About: For modern biology, precise genome annotations are of prime importance as they allow the accurate definition of genic regions. We employ state of the art machine learning methods to assay and [...]

Changes:

Initial Announcement on mloss.org.


Logo Local Alignment Kernels 0.3.2

by hiroto - December 1, 2007, 00:10:23 CET [ Project Homepage BibTeX Download ] 4440 views, 1011 downloads, 0 subscriptions

About: Local alignment kernels measure the similarity between two sequences by summing up scores obtained from local alignments with gaps of the sequences.

Changes:

Initial Announcement on mloss.org.


About: PALMA computes the optimal spliced alignment of a mRNA sequence to a genomic sequence. The main python script takes two FASTA files containing the target (e.g. a DNA sequence, part of the genome) [...]

Changes:

Initial Announcement on mloss.org.