Project details for SHOGUN

Screenshot JMLR SHOGUN 4.0.0

by sonne - February 5, 2015, 09:09:37 CET


Overall: 3/5 stars
Features: 3.5/5 stars
Usability: 3/5 stars
Documentation: 3/5 stars
(based on 6 votes)
Description:

Overview

The SHOGUN machine learning toolbox focuses on large-scale kernel methods, especially Support Vector Machines (SVMs). It comes with a generic interface for kernel machines and features 15 different SVM implementations that all access features in a unified way, either via a general kernel framework or, in the case of linear SVMs, via so-called "DotFeatures", i.e., features providing a minimalistic set of operations (like the dot product).
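To illustrate the "DotFeatures" idea, here is a minimal Python sketch. It is not SHOGUN's actual API: the classes DenseDotFeatures and SparseDotFeatures and the methods dense_dot and add_to_dense_vec are names assumed for this example. The point is that a linear learner (here a plain perceptron without bias) only needs a dot product and an add-to-vector operation, so the same training code runs unchanged on dense and sparse storage.

```python
class DenseDotFeatures:
    """Dense storage: one list of floats per example (illustrative only)."""

    def __init__(self, rows):
        self.rows = rows
        self.dim = len(rows[0])

    def __len__(self):
        return len(self.rows)

    def dense_dot(self, i, w):
        # Dot product of example i with a dense weight vector.
        return sum(x * wj for x, wj in zip(self.rows[i], w))

    def add_to_dense_vec(self, alpha, i, w):
        # w += alpha * example i
        for j, x in enumerate(self.rows[i]):
            w[j] += alpha * x


class SparseDotFeatures:
    """Sparse storage: one {index: value} dict per example (illustrative only)."""

    def __init__(self, rows, dim):
        self.rows = rows
        self.dim = dim

    def __len__(self):
        return len(self.rows)

    def dense_dot(self, i, w):
        return sum(v * w[j] for j, v in self.rows[i].items())

    def add_to_dense_vec(self, alpha, i, w):
        for j, v in self.rows[i].items():
            w[j] += alpha * v


def train_perceptron(feats, labels, epochs=10):
    """Perceptron that touches the data only through the two operations above."""
    w = [0.0] * feats.dim
    for _ in range(epochs):
        for i in range(len(feats)):
            if labels[i] * feats.dense_dot(i, w) <= 0:
                feats.add_to_dense_vec(labels[i], i, w)
    return w


dense = DenseDotFeatures([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 3.0]])
sparse = SparseDotFeatures([{0: 1.0}, {1: 1.0}, {0: 2.0}, {1: 3.0}], dim=2)
labels = [1, -1, 1, -1]

w_dense = train_perceptron(dense, labels)
w_sparse = train_perceptron(sparse, labels)
```

Both calls yield the same weight vector, because the learner never looks at how the features are stored.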

Features

SHOGUN includes the LinAdd accelerations for string kernels and the COFFIN framework for on-demand computing of features for the contained linear SVMs. In addition it contains more advanced Multiple Kernel Learning, Multi Task Learning and Structured Output learning algorithms and other linear methods. SHOGUN digests input feature-objects of basically any known type, e.g., dense, sparse or variable length features (strings) of any type char/byte/word/int/long int/float/double/long double.

The toolbox provides efficient implementations of 35 different kernels, among them the

  • Linear,
  • Polynomial,
  • Gaussian and
  • Sigmoid Kernel

and also provides a number of recent string kernels like the

  • Locality Improved,
  • Fisher,
  • TOP,
  • Spectrum,
  • Weighted Degree Kernel (with shifts) .

For the latter, the efficient LINADD optimizations are implemented. SHOGUN also offers the freedom of working with custom pre-computed kernels. One of its key features is the combined kernel, which can be constructed as a weighted linear combination of a number of sub-kernels, each of which may work on a different domain. An optimal sub-kernel weighting can be learned using Multiple Kernel Learning. Currently, SVM one-class, 2-class and multi-class classification as well as regression problems are supported. SHOGUN also implements a number of linear methods like

  • Linear Discriminant Analysis (LDA)
  • Linear Programming Machine (LPM),
  • Perceptrons

and features algorithms to train Hidden Markov Models.
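The combined kernel mentioned above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not SHOGUN code: the function names, the Gaussian width convention exp(-||x - y||^2 / width) and the hand-picked weights are all chosen for this example (Multiple Kernel Learning would learn the weights instead). Each example is a pair (vector, string), and each sub-kernel only sees the part that belongs to its domain.

```python
import math
from collections import Counter


def gaussian_kernel(x, y, width=2.0):
    """exp(-||x - y||^2 / width) on real-valued vectors (convention assumed here)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / width)


def spectrum_kernel(s, t, k=2):
    """Inner product of the k-mer count vectors of two strings."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return float(sum(cs[m] * ct[m] for m in cs))


def combined_kernel(a, b, weights):
    """Weighted sum of sub-kernels, each working on its own domain."""
    vec_a, str_a = a
    vec_b, str_b = b
    return (weights[0] * gaussian_kernel(vec_a, vec_b)
            + weights[1] * spectrum_kernel(str_a, str_b))


a = ([0.0, 1.0], "ACGT")
b = ([0.0, 1.0], "CGTA")
value = combined_kernel(a, b, weights=[0.5, 0.25])
```

Here the two vectors coincide (Gaussian part 1.0) and the strings share the 2-mers CG and GT (spectrum part 2.0), so the weighted sum is 0.5 * 1.0 + 0.25 * 2.0.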

The input feature-objects can be read from plain ASCII files (tab-separated values for dense matrices; libsvm/svmlight format for sparse matrices), from an efficient native binary format, and from HDF5-based files, supporting

  • dense
  • sparse or
  • strings of various types

that can often be converted between each other. Chains of preprocessors (e.g. subtracting the mean) can be attached to each feature object allowing for on-the-fly pre-processing.
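As a rough sketch of how a chain of preprocessors can be applied on the fly, consider the following plain Python example. The class and method names (Features, add_preprocessor, get_feature_vector, SubtractMean, NormalizeLength) are assumptions made for illustration and do not mirror SHOGUN's real interface. The raw data is never modified; each access passes the vector through the attached preprocessors in order.

```python
class SubtractMean:
    """Centers each dimension using the mean computed when attached."""

    def fit(self, rows):
        n = len(rows)
        self.mean = [sum(col) / n for col in zip(*rows)]
        return self

    def apply(self, vec):
        return [x - m for x, m in zip(vec, self.mean)]


class NormalizeLength:
    """Scales a vector to unit Euclidean norm (zero vectors are left alone)."""

    def fit(self, rows):
        return self

    def apply(self, vec):
        norm = sum(x * x for x in vec) ** 0.5
        return [x / norm for x in vec] if norm > 0 else list(vec)


class Features:
    """Stores raw data and applies attached preprocessors on access."""

    def __init__(self, rows):
        self.rows = rows
        self.preprocessors = []

    def add_preprocessor(self, preproc):
        # Fit on the data as seen through the preprocessors already attached.
        current = [self.get_feature_vector(i) for i in range(len(self.rows))]
        self.preprocessors.append(preproc.fit(current))

    def get_feature_vector(self, i):
        vec = self.rows[i]
        for p in self.preprocessors:
            vec = p.apply(vec)
        return vec


feats = Features([[1.0, 2.0], [3.0, 6.0]])
feats.add_preprocessor(SubtractMean())
feats.add_preprocessor(NormalizeLength())
first = feats.get_feature_vector(0)
```

The design choice shown here is lazy evaluation: nothing is precomputed or stored, which trades some repeated work for not having to keep a second, preprocessed copy of the data in memory.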

Structure and Interfaces

SHOGUN's core is implemented in C++ and is provided as a library, libshogun, readily usable by C++ application developers. Its common interface functions are encapsulated in libshogunui, such that only minimal code (like setting or getting a double matrix to/from the target language) is necessary. This allowed us to easily create interfaces to Matlab(tm), R, Octave and Python. (Note that both modular object-oriented and static interfaces are provided: r, octave, matlab, python, python_modular, r_modular, octave_modular, cmdline, libshogun.)

Application

We have successfully applied SHOGUN to several problems from computational biology, such as Super Family classification, Splice Site Prediction, Interpreting the SVM Classifier, Splice Form Prediction, Alternative Splicing and Promoter Prediction. Some of them come with no less than 10 million training examples, others with 7 billion test examples.

Documentation

We use Doxygen for both user and developer documentation, which may be read online. More than 600 documented examples for the interfaces python_modular, octave_modular, r_modular, static python, static matlab and octave, static r, static command line and C++ libshogun developer interface can be found in the documentation.

Changes to previous version:

This release features the work of our 8 GSoC 2014 students [student; mentors]:

  • OpenCV Integration and Computer Vision Applications [Abhijeet Kislay; Kevin Hughes]
  • Large-Scale Multi-Label Classification [Abinash Panda; Thoralf Klein]
  • Large-scale structured prediction with approximate inference [Jiaolong Xu; Shell Hu]
  • Essential Deep Learning Modules [Khaled Nasr; Sergey Lisitsyn, Theofanis Karaletsos]
  • Fundamental Machine Learning: decision trees, kernel density estimation [Parijat Mazumdar ; Fernando Iglesias]
  • Shogun Missionary & Shogun in Education [Saurabh Mahindre; Heiko Strathmann]
  • Testing and Measuring Variable Interactions With Kernels [Soumyajit De; Dino Sejdinovic, Heiko Strathmann]
  • Variational Learning for Gaussian Processes [Wu Lin; Heiko Strathmann, Emtiyaz Khan]

It also contains several cleanups and bugfixes:

Features

  • New Shogun project description [Heiko Strathmann]
  • ID3 algorithm for decision tree learning [Parijat Mazumdar]
  • New modes for PCA matrix factorizations: SVD & EVD, in-place or reallocating [Parijat Mazumdar]
  • Add Neural Networks with linear, logistic and softmax neurons [Khaled Nasr]
  • Add kernel multiclass strategy examples in multiclass notebook [Saurabh Mahindre]
  • Add decision trees notebook containing examples for ID3 algorithm [Parijat Mazumdar]
  • Add sudoku recognizer ipython notebook [Alejandro Hernandez]
  • Add in-place subsets on features, labels, and custom kernels [Heiko Strathmann]
  • Add Principal Component Analysis notebook [Abhijeet Kislay]
  • Add Multiple Kernel Learning notebook [Saurabh Mahindre]
  • Add Multi-Label classes to enable Multi-Label classification [Thoralf Klein]
  • Add rectified linear neurons, dropout and max-norm regularization to neural networks [Khaled Nasr]
  • Add C4.5 algorithm for multiclass classification using decision trees [Parijat Mazumdar]
  • Add support for arbitrary acyclic graph-structured neural networks [Khaled Nasr]
  • Add CART algorithm for classification and regression using decision trees [Parijat Mazumdar]
  • Add CHAID algorithm for multiclass classification and regression using decision trees [Parijat Mazumdar]
  • Add Convolutional Neural Networks [Khaled Nasr]
  • Add Random Forests algorithm for ensemble learning using CART [Parijat Mazumdar]
  • Add Restricted Boltzmann Machines [Khaled Nasr]
  • Add Stochastic Gradient Boosting algorithm for ensemble learning [Parijat Mazumdar]
  • Add Deep contractive and denoising autoencoders [Khaled Nasr]
  • Add Deep belief networks [Khaled Nasr]

Bugfixes

  • Fix reference counting bugs in CList when reference counting is on [Heiko Strathmann, Thoralf Klein, lambday]
  • Fix memory problem in PCA::apply_to_feature_matrix [Parijat Mazumdar]
  • Fix crash in LeastAngleRegression for the case D greater than N [Parijat Mazumdar]
  • Fix memory violations in bundle method solvers [Thoralf Klein]
  • Fix failure in library_mldatahdf5.cpp example when http://mldata.org is not working properly [Parijat Mazumdar]
  • Fix memory leaks in Vowpal Wabbit, LibSVMFile and KernelPCA [Thoralf Klein]
  • Fix memory and control flow issues discovered by Coverity [Thoralf Klein]
  • Fix R modular interface SWIG typemap (Requires SWIG >= 2.0.5) [Matt Huska]

Cleanup and API Changes

  • PCA now depends on Eigen3 instead of LAPACK [Parijat Mazumdar]
  • Removing redundant and fixing implicit imports [Thoralf Klein]
  • Hide many methods from SWIG, reducing compile memory by 500MiB [Heiko Strathmann, Fernando Iglesias, Thoralf Klein]
Supported Operating Systems: Cygwin, Linux, Macosx, Bsd
Data Formats: Plain Ascii, Svmlight, Binary, Fasta, Fastq, Hdf
Tags: Bioinformatics, Large Scale, String Kernel, Kernel, Kernelmachine, Lda, Lpm, Matlab, Mkl, Octave, Python, R, Svm, Sgd, Icml2010, Liblinear, Libsvm, Multiple Kernel Learning, Ocas, Gaussian Processes, Reg

Other available revisions

Version Changelog Date
4.0.0

Same changelog as listed under "Changes to previous version" above.
February 5, 2015, 09:09:37
3.2.0

This is mostly a bugfix release:

Features

  • Fully support python3 now
  • Add mini-batch k-means [Parijat Mazumdar]
  • Add k-means++ [Parijat Mazumdar]
  • Add sub-sequence string kernel [lambday]

Bugfixes

  • Compile fixes for upcoming swig3.0
  • Speedup for gaussian process' apply()
  • Improve unit / integration test checks
  • libbmrm uninitialized memory reads
  • libocas uninitialized memory reads
  • Octave 3.8 compile fixes [Orion Poplawski]
  • Fix java modular compile error [Bjoern Esser]
February 17, 2014, 20:31:36
3.1.1

This is a bugfix release:

Bugfixes

  • Fix compile error occurring with CXX0X
  • Bump data version to required version
January 9, 2014, 08:34:07
3.1.0

This release also contains several cleanups and bugfixes:

Features

  • Add option to set k-means cluster centers [Parijat Mazumdar]
  • Add leave one out crossvalidation scheme [Saurabh Mahindre]
  • Add multiclass ipython notebook tutorials [Chiyuan Zhang]
  • Add learning of StreamingSparseFeatures in OnlineLibLinear [Thoralf Klein]

Bugfixes

  • Decrease memory footprint of SGObject
  • Fix protobuf detection
  • Fix doxygen files and various doxygen errors
  • Fix compile error with directors
  • Fix memory leak in modular interfaces and apply*()
  • Fix leak in KNN::store_model_features
  • Notebook fixes
  • Allow custom kernel matrices of size 2^31-1 x 2^31-1 [Koen van de Sande]
  • Fix Protobuf cmake detection
  • Fix LabelsFactory methods' object ownership in SWIG interfaces with the %newobject directive.

Cleanup and API Changes

  • Introduce slim SGRefObject for refcounted objects as base class of SGObject [Thoralf Klein]
January 5, 2014, 13:26:57
3.0.0

This release features 8 successful Google Summer of Code projects and it is the result of an incredible effort by our students. All projects come with very cool ipython-notebooks that contain background, code examples and visualizations. These can be found on our webpage!

Features

In addition, the following features have been added:

  • Added method to importance sample the (true) marginal likelihood of a Gaussian Process using a posterior approximation.
  • Added a new class for classical probability distribution that can be sampled and whose log-pdf can be evaluated. Added the multivariate Gaussian with various numerical flavours.
  • Cross-validation framework works now with Gaussian Processes
  • Added nu-SVR for LibSVR class
  • Modelselection is now supported for parameters of sub-kernels of combined kernels in the MKL context. Thanks to Evangelos Anagnostopoulos
  • Probability output for multi-class SVMs is now supported using various heuristics. Thanks to Shell Xu Hu.
  • Added an "equals" method to all Shogun objects that recursively compares all registered parameters with those of another instance -- up to a specified accuracy.
  • Added a "clone" method to all Shogun objects that creates a deep copy
  • Multiclass LDA. Thanks to Kevin Hughes.
  • Added a new datatype, complex128_t, for complex numbers. Math functions, support for SGVector/Matrix, SGSparseVector/Matrix, and serialization with Ascii and Xml files added. [Soumyajit De].
  • Added mini-framework for numerical integration in one variable. Implemented Gauss-Kronrod and Gauss-Hermite quadrature formulas.
  • Changed from configure script to CMake by Viktor Gal.
  • Add C++0x and C++11 cmake detection scripts
  • ND-Array typemap support for python and octave modular.
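The "equals" method described in the list above (recursive comparison of all registered parameters up to a given accuracy) can be sketched as follows. This is a toy model in plain Python, not SHOGUN's implementation: the base class SGObjectSketch, the `registered` tuple and the example classes Kernel and Machine are hypothetical names introduced only for this illustration.

```python
import math


class SGObjectSketch:
    """Toy base class: subclasses list their parameter names in `registered`."""

    registered = ()

    def equals(self, other, accuracy=0.0):
        if type(self) is not type(other):
            return False
        return all(
            _param_equal(getattr(self, name), getattr(other, name), accuracy)
            for name in self.registered
        )


def _param_equal(a, b, accuracy):
    if isinstance(a, SGObjectSketch):
        return a.equals(b, accuracy)  # recurse into nested objects
    if isinstance(a, (list, tuple)):
        return (isinstance(b, (list, tuple)) and len(a) == len(b)
                and all(_param_equal(x, y, accuracy) for x, y in zip(a, b)))
    if isinstance(a, float) and isinstance(b, float):
        return math.fabs(a - b) <= accuracy
    return a == b


class Kernel(SGObjectSketch):
    registered = ("width",)

    def __init__(self, width):
        self.width = width


class Machine(SGObjectSketch):
    registered = ("C", "kernel", "alphas")

    def __init__(self, C, kernel, alphas):
        self.C = C
        self.kernel = kernel
        self.alphas = alphas


m1 = Machine(1.0, Kernel(2.0), [0.5, 0.25])
m2 = Machine(1.0, Kernel(2.0 + 1e-9), [0.5, 0.25])
same_exact = m1.equals(m2)                   # widths differ by about 1e-9
same_approx = m1.equals(m2, accuracy=1e-6)   # difference is within tolerance
```

With zero tolerance the two machines differ because of the nested kernel width; with a tolerance of 1e-6 they compare equal.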

Bugfixes

  • Fix json serialization.
  • Fixed bugs in FITC inference method that caused wrong posterior results.
  • Fixed bugs in GP Regression that caused negative values for the variances.
  • Fixed two memory errors in the streaming-features framework.
  • Fixed bug in the Kernel Mean Matching implementation (thanks to Meghana Kshirsagar).

Cleanup and API Changes

  • Switch compile system to cmake
  • SGSparseVector/Matrix are now derived from SGReferenceData and thus refcounted.
  • Move README and INSTALL files to top level directory.
  • Use common RefCount class for ReferencedData and CSGObjects.
  • Rename HMSVMLabels to SequenceLabels
  • Refactored method to fit a sigmoid to SVM scores, now in CStatistics, still called from CBinaryLabels.
  • Use Dynamic arrays to hold preprocessors in features instead of raw pointers.
  • Use Dynamic arrays to hold Features in CombinedFeatures.
  • Use Dynamic arrays to hold Kernels in CombinedKernels/ProductKernels.
  • Use Eigen3 for GPs, LDA
October 29, 2013, 18:58:51
2.1.0

This release also contains several enhancements, cleanups and bugfixes:

Features

  • Linear Time MMD two-sample test now works on streaming-features, which makes it possible to perform tests on infinite amounts of data. A block size may be specified for fast processing. The features below were also added. By Heiko Strathmann.
  • It is now possible to ask streaming features to produce an instance of streamed features that are stored in memory and returned as a CFeatures* object of corresponding type. See CStreamingFeatures::get_streamed_features().
  • New concept of artificial data generator classes: Based on streaming features. First implemented instances are CMeanShiftDataGenerator and CGaussianBlobsDataGenerator. Use above new concepts to get non-streaming data if desired.
  • Accelerated projected gradient multiclass logistic regression classifier by Sergey Lisitsyn.
  • New CCSOSVM based structured output solver by Viktor Gal
  • A collection of kernel selection methods for MMD-based kernel two-sample tests, including optimal kernel choice for single and combined kernels for the linear time MMD. This finishes the kernel MMD framework and also comes with new, more illustrative examples and tests. By Heiko Strathmann.
  • Alpha version of Perl modular interface developed by Christian Montanari.
  • New framework for unit-tests based on googletest and googlemock by Viktor Gal. A (growing) number of unit-tests from now on ensures basic functionality of our framework. Since the examples do not have to take this role anymore, they should become more illustrative in the future.
  • Changed the core of dimension reduction algorithms to the Tapkee library.
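The streaming linear-time MMD test mentioned in the list above can be illustrated with a small sketch. It uses the standard linear-time MMD statistic (an average of h = k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1) over disjoint consecutive pairs) and is written in plain Python; it is not SHOGUN's CLinearTimeMMD code, and the function names, the scalar Gaussian kernel and the default block size are assumptions made for this example. Only one block per stream is held in memory at a time.

```python
import math
from itertools import islice


def gaussian_kernel(x, y, width=2.0):
    """Gaussian kernel on scalars; the width convention is assumed here."""
    return math.exp(-((x - y) ** 2) / width)


def linear_time_mmd(stream_x, stream_y, block_size=4, kernel=gaussian_kernel):
    """Running average of h over disjoint pairs; block_size is assumed even."""
    stream_x, stream_y = iter(stream_x), iter(stream_y)
    total, pairs = 0.0, 0
    while True:
        # Pull one block from each stream; stop when either runs dry.
        bx = list(islice(stream_x, block_size))
        by = list(islice(stream_y, block_size))
        n = min(len(bx), len(by)) // 2 * 2  # use complete pairs only
        if n == 0:
            break
        for i in range(0, n, 2):
            x1, x2, y1, y2 = bx[i], bx[i + 1], by[i], by[i + 1]
            total += (kernel(x1, x2) + kernel(y1, y2)
                      - kernel(x1, y2) - kernel(x2, y1))
            pairs += 1
    return total / pairs if pairs else 0.0


# Identical streams give a statistic of zero; well-separated ones approach 2.
same = linear_time_mmd(iter([0.0] * 8), iter([0.0] * 8))
shifted = linear_time_mmd(iter([0.0] * 8), iter([10.0] * 8))
```

Because each sample is used exactly once and then discarded, the cost is linear in the number of samples, which is what makes the statistic suitable for streams of unbounded length.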

Bugfixes

  • Fix for shallow copy of gaussian kernel by Matt Aasted.
  • Fixed a bug when using StringFeatures along with kernel machines in cross-validation which caused an assertion error. Thanks to Eric (yoo)!
  • Fix for 3-class case training of MulticlassLibSVM reported by Arya Iranmehr that was suggested by Oksana Bayda.
  • Fix for wrong Spectrum mismatch RBF construction in static interfaces reported by Nona Kermani.
  • Fix for wrong include in SGMatrix causing build fail on Mac OS X (thanks to @bianjiang).
  • Fixed a bug that caused kernel machines to return nonsense when using custom kernel matrices with subsets attached to them.
  • Fix for parameter dictionary creation causing dereferencing of null pointers with gaussian processes parameter selection.
  • Fixed a bug in exact GP regression that caused wrong results.
  • Fixed a bug in exact GP regression that produced memory errors/crashes.
  • Fix for a bug with static interfaces causing all outputs to be -1/+1 instead of real scores (reported by Kamikawa Masahisa).

Cleanup and API Changes

  • SGStringList is now based on SGReferencedData.
  • "confidences" in context of CLabel and subclasses are now "values".
  • CLinearTimeMMD constructor changes, only streaming features allowed.
  • CDataGenerator will soon be removed and replaced by new streaming-based classes.
  • SGVector, SGMatrix, SGSparseVector, SGSparseMatrix refactoring: these classes now contain load/save routines and relevant functions from CMath, and their implementations moved to .cpp files.
March 17, 2013, 13:59:34
2.0.0

This release contains several enhancements, cleanups and bugfixes:

Features

  • This release contains first release of Efficient Dimensionality Reduction Toolkit (EDRT).
  • Support for new SWIG -builtin python interface feature (SWIG 2.0.4 is required now).
  • EDRT algorithms are now available using static interfaces such as matlab and octave.
  • Jensen-Shannon kernel and Homogeneous kernel map preprocessor (thanks to Viktor Gal).
  • New 'multiclass' module for multiclass classification algorithms, generic linear and kernel multiclass machines, multiclass LibLinear and OCAS wrappers, new rejection schemes concept by Sergey Lisitsyn.
  • Various multitask learning algorithms including L1/Lq multitask group lasso logistic regression and least squares regression, L1/L2 multitask tree guided group lasso logistic regression and least squares regression, trace norm regularized multitask logistic regression, clustered multitask logistic regression and L1/L2 multitask group logistic regression by Sergey Lisitsyn.
  • Group and tree-guided logistic regression for binary and multiclass problems by Sergey Lisitsyn.
  • Mahalanobis distance, QDA, Stochastic Proximity Embedding, generic OvO multiclass machine and CoverTree & KNN integration (thanks to Fernando J. Iglesias Garcia).
  • Structured output learning framework by Fernando J. Iglesias Garcia.
  • Hidden markov support vector machine structured output model by Fernando J. Iglesias Garcia.
  • Implementations of three Bundle method for risk minimization (BMRM) variants by Michal Uricar.
  • Latent SVM framework and latent detector example by Viktor Gal.
  • Gaussian processes framework for parameters selection and gaussian processes regression estimation framework by Jacob Walker.
  • New graphical python modular examples.
  • Standard Cross-Validation splitting for regression problems by Heiko Strathmann
  • New data-locking concept by Heiko Strathmann, which lets users tell machines that the data will not change during training/testing until it is unlocked. KernelMachines make use of this by not recomputing the kernel matrix in cross-validation.
  • Cross-validation for KernelMachines is now parallelized.
  • Cross-validation is now possible with custom kernels.
  • Features may now have arbitrarily many index subsets (of subsets (of subsets (...))).
  • Various clustering measures, Least Angle Regression and new multiclass strategies concept (thanks to Chiyuan Zhang).
  • A bunch of multiclass learning algorithms including the ShareBoost algorithm, ECOC framework, conditional probability tree, balanced conditional probability tree, random conditional probability tree and relaxed tree by Chiyuan Zhang.
  • Python Sparse matrix typemap for octave modular interface (thanks to Evgeniy Andreev).
  • Newton SVM port (thanks to Harshit Syal).
  • Some progress on native windows compilation using cmake and mingw-w64 (thanks to Josh aka jklontz).
  • CMake compilation improvements (thanks to Eric aka yoo).
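The nested index subsets mentioned in the list above (subsets of subsets of subsets) can be pictured as a stack of index lists. The following plain Python sketch shows the idea; the class FeaturesWithSubsets and its methods add_subset and remove_subset are names assumed for illustration and are not taken from SHOGUN's source. Each new subset indexes into the view defined by the subsets already on the stack, and the underlying data is never copied.

```python
class FeaturesWithSubsets:
    """Each pushed subset indexes into the view defined by the previous ones."""

    def __init__(self, data):
        self.data = data
        self.subset_stack = []

    def add_subset(self, indices):
        self.subset_stack.append(list(indices))

    def remove_subset(self):
        self.subset_stack.pop()

    def _to_raw_index(self, i):
        # Walk from the innermost subset back to the underlying data.
        for subset in reversed(self.subset_stack):
            i = subset[i]
        return i

    def __len__(self):
        if self.subset_stack:
            return len(self.subset_stack[-1])
        return len(self.data)

    def __getitem__(self, i):
        return self.data[self._to_raw_index(i)]


feats = FeaturesWithSubsets(["a", "b", "c", "d", "e", "f"])
feats.add_subset([1, 3, 5])  # view: b, d, f
feats.add_subset([2, 0])     # subset of the subset: f, b
view = [feats[i] for i in range(len(feats))]
feats.remove_subset()        # back to the outer subset
outer = [feats[i] for i in range(len(feats))]
```

A stack like this is convenient for nested procedures such as model selection inside cross-validation, where an inner loop needs to restrict data that an outer loop has already restricted.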

Bugfixes

  • Fix for bug in the Gaussian Naive Bayes classifier, its domain was changed to log-space.
  • Fix for R_static interface installation (thanks Steve Lianoglou).
  • SVMOcas memsetting and max_train_time bugfix.
  • Various fixes for compile errors with clang.
  • Stratified-cross-validation now uses different indices for each run.

Cleanup and API Changes

  • Various code cleanups by Evan Shelhamer
  • Parameter migration framework by Heiko Strathmann. From now on, changes in the shogun objects will not break loading old serialized files anymore
September 5, 2012, 21:57:35
1.1.0

This release contains major enhancements, cleanups and bugfixes:

Features

  • New dimensionality reduction algorithms: Diffusion Maps, Kernel Locally Linear Embedding, Kernel Local Tangent Space Alignment, Linear Local Tangent Space Alignment, Neighborhood Preserving embedding, Locality Preserving Projections.
  • Various performance improvements for dimensionality reduction methods (BLAS, alignment formulation of the LLE, ..)
  • Automatic k determination mode for Locally Linear Embedding dimension reduction method based on reconstruction error.
  • ARPACK and SUPERLU integration.
  • Introduce the concept of Converters that can embed (arbitrary) feature types into different feature types.
  • LibSVM is now pthread-parallelized.
  • Create modshogun.dll for csharp.
  • Various new c# examples (thanks Daniel Korn).
  • Dimensionality reduction examples application is introduced

Bugfixes

  • Octave_static and octave_modular examples fix.
  • Memory leak in custom kernel is now eliminated (thanks Madeleine Seeland for reporting).
  • Fix for linear machine set_w method (thanks Brian Cheung for reporting).
  • DotFeatures fix for assert bug.
  • FibonacciHeap memory leak fix.
  • Fix for Java modular interface typemapping bug.
  • Fix errors uncovered by LLVM / clang++.
  • Fix for configure on Darwin-x86_64 (thanks Peter Romov for patch).
  • Improve lua / ruby detection.
  • Fix configure / compilation under osx and cygwin for various interfaces.

Cleanup and API Changes

  • Most of the inline functions have been (re)moved to the corresponding .cpp file
  • Libshogun is now being compiled with sse support for math (if available) but interfaces are now being compiled with -O0 key which drastically reduces compilation time
December 13, 2011, 05:11:29
1.0.0

This release contains major enhancements, cleanups and bugfixes:

Features

  • Support for new languages: java, c#, ruby, lua in modular interfaces (GSoC project of Baozeng Ding)
  • Port all examples to the new languages: Ruby examples with example transition tool (thanks to Justin Patera aka serialhex)
  • Dimensionality reduction (manifold learning) algorithms are now available. In particular: Locally Linear Embedding (LLE), Hessian Locally Linear Embedding (HLLE), Local Tangent Space Alignment (LTSA), Kernel PCA (kPCA), Multidimensional Scaling (MDS, with possible landmark approximation), Isomap (using Fibonacci Heap Dijkstra for shortest paths), Laplacian Eigenmaps (GSoC project of Sergey Lisitsyn)
  • Various new kernels: TStudentKernel, CircularKernel, WaveKernel, SplineKernel, LogKernel, RationalQuadraticKernel, WaveletKernel, BesselKernel, PowerKernel, ExponentialKernel, CauchyKernel, ANOVAKernel, MultiquadricKernel, SphericalKernel, DistantSegmentsKernel (thanks GSoC students for the contributions!)
  • Streaming / Online Feature Framework for SimpleFeatures, SparseFeatures, StringFeatures (GSoC project of Shashwat Lal Das)
  • SGD-QN, Online SGD, Online Liblinear, Online Vowpal Wabbit (GSoC project of Shashwat Lal Das)
  • Model selection framework for arbitrary Machines (GSoC project of Heiko Strathmann)
  • Gaussian Mixture Models (GSoC project of Alesis Novik)
  • FibonacciHeap for efficient shortest-path problem solving (thanks to Evgeniy Andreev)
  • Efficient HashSet (thanks to Evgeniy Andreev)
  • ARPACK wrapper (dseupd) for symmetric eigenproblems (both generalized and non-generalized), some new LAPACK wrappers (Sergey Lisitsyn)
  • New Statistics module for various statistics measures (Heiko Strathmann)
  • Subset support to features (Heiko Strathmann)
  • Java externalization support (Sergey Lisitsyn)
  • Support matlab 2011a.

Bugfixes

  • Fix build failure with ld --as-needed (thanks Matthias Klose for the patch).
  • Fix initialization error in KRR static interfaces (thanks Maxwell Collins for the patch).

Cleanup and API Changes

  • Introduce Machine, KernelMachine, LinearMachine, LinearOnlineMachine, DistanceMachine with train() and apply() functions and drop Classifier.
  • Restructure source code layout: Merge libshogunui and libshogun into src/shogun and move all interfaces into src/shogun. Split up lib into lib, io and mathematics.
  • Create a single 'modshogun' module resembling the functionality found in libshogun. Now octave_modular and other modular interfaces work reliably.
  • Introduce SGVector, SGMatrix, SGNDArray, SGStringList for transferring object-pointers and meta-data from/to shogun.
  • Classes no longer store copies of e.g. matrices, and just pass pointers on set/get operations.
  • Stop using new[] / delete[] and switch to SG_MALLOC, SG_CALLOC, SG_REALLOC, SG_FREE macros.
  • Preproc renamed to preprocessor, PCACut renamed to PCA
September 1, 2011, 02:09:45
0.10.0

This release contains several enhancements, cleanups and bugfixes:

Features

  • Serialization of objects deriving from CSGObject, i.e. all shogun objects (SVM, Kernel, Features, Preprocessors, ...) as ASCII, JSON, XML and HDF5
  • Create SVMLightOneClass
  • Add CustomDistance in analogy to custom kernel
  • Add HistogramIntersectionKernel (thanks Koen van de Sande for the patch)
  • Matlab 2010a support
  • SpectrumMismatchRBFKernel modular support (thanks Rob Patro for the patch)
  • Add ZeroMeanCenterKernelNormalizer (thanks Gorden Jemwa for the patch)
  • Swig 2.0 support

Bugfixes

  • Custom Kernels can now be > 4G (thanks Koen van de Sande for the patch)
  • Set C locale on startup in init_shogun to prevent incompatibilities with ascii floats and fprintf
  • Compile fix when reference counting is disabled
  • Fix set_position_weights for wd kernel (reported by Dave duVerle)
  • Fix set_wd_weights for wd kernel.
  • Fix crasher in SVMOcas (reported by Yaroslav)

Cleanup and API Changes

  • Renamed SVM_light/SVR_light to SVMLight etc.
  • Remove C prefix in front of non-serializable class names
  • Drop CSimpleKernel and introduce CDotKernel as its base class. This way all dot-product based kernels can be applied on top of DotFeatures and only a single implementation for such kernels is needed.
December 7, 2010, 15:35:26
0.9.3

This release contains several enhancements, cleanups and bugfixes:

Features

  • Experimental lp-norm MCMKL
  • New Kernels: SpectrumRBFKernelRBF, SpectrumMismatchRBFKernel, WeightedDegreeRBFKernel
  • WDK kernel supports amino acids
  • String Features now support append operations
  • python-dbg support
  • Allow floats as input for custom kernel (and matrices > 4GB in size)

Bugfixes

  • Static linking fix.
  • Fix sparse linear kernel's add_to_normal

Cleanup and API Changes

  • Remove init() function in Performance Measures
  • Adjust .so suffix for python and use python distutils to figure out install paths
May 31, 2010, 15:31:49
0.9.2

This release contains several enhancements, cleanups and bugfixes:

Features

  • Direct reading and writing of ASCII/Binary files/HDF5 based files.
  • Implemented multi task kernel normalizer.
  • Implement SNP kernel.
  • Implement time limit for libsvm/libsvr.
  • Integrate Elastic Net MKL (thanks Ryota Tomioka for the patch).
  • Implement Hashed WD Features.
  • Implement Hashed Sparse Poly Features.
  • Integrate liblinear 1.51
  • LibSVM can now be trained with bias disabled.
  • Add functions to set/get global and local io/parallel/... objects.

Bugfixes

  • Fix set_w() for linear classifiers.
  • Static Octave, Python, Cmdline and Modular Python interfaces compile cleanly under Windows/Cygwin again.
  • In static interfaces testing could fail when not directly done after training.
March 31, 2010, 00:50:12
0.9.1

This release contains several enhancements, cleanups and bugfixes:

Features

  • Integrate LaRank.
  • Memory Mapped Features (for data sets that don't fit into memory).
  • Compressor module with compression and decompression support for lzo, gzip, bzip2 and lzma.
  • Compressed String Features with on-the-fly decompression (CDecompressString preproc).
  • Parallel computation of get_kernel_matrix().
  • One may now prefix all shogun print/outputs with file name and line number (obj.io.enable_file_and_line())
  • Chinese Documentation thanks Elpmis Lee.

Bugfixes

  • Fix One class MKL testing in static interfaces.
  • Configure fixes: Let octave not write history on configure; fail when cplex is forcefully enabled but not found; add cplex 12 support.
  • Fix a problem with regression and CombinedKernels employing only Custom kernels.

Cleanup and API Changes

  • String Features now (like SimpleFeatures) upon get_feature_vector require an additional do_free argument and need to be freed using free_feature_vector.
November 16, 2009, 11:02:41
0.9.0

This release contains several cleanups and enhancements:

Features

  • Implement set_linear_classifier for static interfaces.
  • Implement Polynomial DotFeatures.
  • Implement domain adaptation SVM.
  • Speed up ScatterSVM.
  • Initial implementation for saving and loading of shogun objects.
  • Examples have been polished/split up into separate files.
  • Documentation and webpage improvements.

Bugfixes

  • Fix one class MKL for static interfaces.
  • Fix performance measures integer overflow.
  • Configure fixes to run under OS X Snow Leopard.
  • Compiles and runs under Solaris using both suncc and gcc.

Cleanup and API Changes

  • It is no longer necessary to call init_kernel TRAIN/TEST.
  • Removed kernel {load,save}_init.
  • Removed preproc {load,save}_init.
  • Move the mkl code from classifier/svm to classifier/mkl.
  • Removed obsolete mindy support.
  • Rename MCSVM to ScatterSVM.
  • Move distributions to distributions/ directory.
  • CClassifier::classify() no longer has a label as argument.
  • Introduce CClassifier::train(CFeatures * ) and classify(CFeatures *) for more effective training/testing.
  • Remove unnecessary global symbols.
October 23, 2009, 14:23:21
0.8.0

This release contains several cleanups, features and bugfixes:

Features

  • Implements a new multiclass SVM formulation.
  • 1-norm, 2-norm and general q-norm MKL for classification, regression and one-class problems, using the wrapper and chunking algorithms for arbitrary (dual) SVM solvers.
  • Dynamic Programming code is now accessible from python.
  • Implements Regulatory Modules kernel.
  • Documentation updates (Tutorial, improved installation instructions, overview about the implemented algorithms).

Bugfixes

  • Correct q-norm MKL for Newton.
  • Upon make install of elwms, don't install files into R/octave/python if these interfaces were not configured.
  • Svm-nu parameter was not set correctly.
  • Fix custom kernel initialization.
  • Correct get_subkernel_weights.
  • Proper Intel core2 compile flags detection.
  • Fix number of outputs for KNN.
  • Run tests with proper LD_LIBRARY_PATH set.
  • Fix several memory leaks.

Cleanup and API Changes

  • Rename svm_one_class_nu to svm_nu.
  • Clean up dynamic programming code.
  • Remove commands from_position_list and slide_window and move functionality into set/add_features.
  • Remove now obsolete legacy examples.
August 16, 2009, 19:53:50
0.7.3

This release contains several cleanups and bugfixes:

Features

  • Improve libshogun/developer tutorial.
  • Implement convenience function for parallel quicksort.
  • Fasta/fastq file loading for StringFeatures.

Bugfixes

  • get_name function was undefined in Evaluation causing the PerformanceMeasures class to be defunct.
  • Workaround bugs in the std template library for math functions.
  • Compiles cleanly under OSX now, thanks to James Kyle.

Cleanup and API Changes

  • Make sure that all destructors are declared virtual.
May 2, 2009, 22:45:13
0.7.2

This release contains several cleanups and enhancements:

Features:

  • Support all data types from python_modular: dense, scipy-sparse csc_sparse matrices and strings of type bool, char, (u)int{8,16,32,64}, float{32,64,96}. In addition, individual vectors/strings can now be obtained and even changed. See examples/python_modular/features_*.py for examples.
  • AUC maximization now works with arbitrary kernel SVMs.
  • Documentation updates, many examples have been polished.
  • Slightly speed up the Oligo kernel.

Bugfixes:

  • Fix reading strings from directory (f.load_from_directory()).
  • Update copyright to 2009.

Cleanup and API Changes:

  • Remove {Char,Short,Word,Int,Real}Features and only ever use the templated SimpleFeatures.
  • Split up examples in examples/python_modular to separate files.
  • Now use s.set_features(strs) instead of s.set_string_features(strs) to set string features.
  • The meaning of the width parameter for the Oligo Kernel changed, and the OligoKernel has been renamed to OligoStringKernel.
March 23, 2009, 10:23:04
0.7.1

This release contains several cleanups, feature enhancements and bugfixes:

Features:

  • configure now detects libshogun/ui installed in /usr/(local/)lib if libshogun/ui dirs are removed.
  • Improved documentation (and path and doxygen fixes).
  • Tutorial on how to develop with libshogun and to extend shogun.
  • Added the elwms (eierlegendewollmilchsau) interface, a chimera that interfaces to python, octave, r and matlab in one file. It provides the run_{octave,python,r} command to run code in {octave,python,r} from within octave, r, matlab or python, transparently making variables available to the target interface and avoiding file i/o.
  • Implement AttributeFeatures for (attr,value) pairs, trees etc.

Bugfixes:

  • Fix a crash occurring with combined kernel and multiple threads.
  • configure now allows building of modular interfaces only.
  • N-dimensional arrays now work in octave.

Cleanup and API Changes:

  • Custom Kernel no longer requires features or initialization, not even when used in CombinedKernel (the combined kernel will skip over custom kernels on init).
March 8, 2009, 20:30:32
0.7.0

This release contains major feature enhancements and bugfixes:

  • Implement DotFeatures and CombinedDotFeatures. DotFeatures need to provide dot-product and similar operations (hence the name). This enables training of linear methods with mixed datatypes (sparse, dense and others, even the newly implemented string-based SpecFeatures and WDFeatures).
  • MKL no longer requires CPLEX.
  • Add q-norm MKL support based on internal Newton implementation.
  • Add 1-norm MKL support based on GLPK.
  • Add multiclass MKL support based on the GLPK and the GMNP svm solver.
  • Implement Tensor Product Pair Kernel (TPPK).
  • Support compilation on the iPhone :)
  • Add an option to set wds kernel position weights.
  • Build static libshogun.a for libshogun target.
  • Testsuite can also test the modular R interface, added test for OligoKernel.
  • Ocas and WDOcas can be used with a bias feature now.
  • Update to LibSVM 2.88.
  • Enable parallelized HMM code by default.
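
The DotFeatures idea from the 0.7.0 notes above can be illustrated with a minimal sketch in plain Python. This is not SHOGUN's actual API: the class names, the method names (dense_dot, add_to_dense_vec) and the perceptron trainer are hypothetical and chosen only for illustration. The point is that a linear learner needs just two operations per example, a dot product with the weight vector and a scaled addition into it, so dense and sparse representations can be mixed freely.

```python
from abc import ABC, abstractmethod


class DotFeatures(ABC):
    """Hypothetical interface: the only operations a linear method needs."""

    @abstractmethod
    def dim(self):
        """Dimensionality of the feature space."""

    @abstractmethod
    def dense_dot(self, i, w):
        """Return <x_i, w> for a dense weight vector w."""

    @abstractmethod
    def add_to_dense_vec(self, alpha, i, w):
        """Update w in place: w += alpha * x_i."""


class DenseFeatures(DotFeatures):
    def __init__(self, rows):
        self.rows = rows  # list of equal-length lists of floats

    def dim(self):
        return len(self.rows[0])

    def dense_dot(self, i, w):
        return sum(x * wj for x, wj in zip(self.rows[i], w))

    def add_to_dense_vec(self, alpha, i, w):
        for j, x in enumerate(self.rows[i]):
            w[j] += alpha * x


class SparseFeatures(DotFeatures):
    def __init__(self, rows, dim):
        self.rows = rows  # list of {index: value} dicts
        self._dim = dim

    def dim(self):
        return self._dim

    def dense_dot(self, i, w):
        return sum(v * w[j] for j, v in self.rows[i].items())

    def add_to_dense_vec(self, alpha, i, w):
        for j, v in self.rows[i].items():
            w[j] += alpha * v


class CombinedFeatures(DotFeatures):
    """Concatenates several DotFeatures objects along the feature axis."""

    def __init__(self, parts):
        self.parts = parts

    def dim(self):
        return sum(p.dim() for p in self.parts)

    def dense_dot(self, i, w):
        total, offset = 0.0, 0
        for p in self.parts:
            total += p.dense_dot(i, w[offset:offset + p.dim()])
            offset += p.dim()
        return total

    def add_to_dense_vec(self, alpha, i, w):
        offset = 0
        for p in self.parts:
            block = w[offset:offset + p.dim()]  # slicing copies the block
            p.add_to_dense_vec(alpha, i, block)
            w[offset:offset + p.dim()] = block  # write the block back
            offset += p.dim()


def train_perceptron(features, labels, epochs=10):
    """Plain perceptron that touches the data only through DotFeatures."""
    w = [0.0] * features.dim()
    for _ in range(epochs):
        for i, y in enumerate(labels):
            if y * features.dense_dot(i, w) <= 0:
                features.add_to_dense_vec(y, i, w)
    return w


# Toy data: two dense dimensions plus three sparse dimensions per example.
dense = DenseFeatures([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
sparse = SparseFeatures([{0: 2.0}, {}, {2: 1.0}, {1: 3.0}], dim=3)
combined = CombinedFeatures([dense, sparse])
labels = [1, 1, -1, -1]
w = train_perceptron(combined, labels)
```

The trainer never inspects how an example is stored, which is what allows one solver to serve dense, sparse and combined inputs alike.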
February 20, 2009, 10:41:46
0.6.7

Initial Announcement on mloss.org.

October 11, 2007, 21:45:32

Comments

Soeren Sonnenburg (on September 12, 2008, 16:14:36)
In case you find bugs, feel free to report them at [http://trac.tuebingen.mpg.de/shogun](http://trac.tuebingen.mpg.de/shogun).
Tom Fawcett (on January 3, 2011, 03:20:48)
You say, "Some of them come with no less than 10 million training examples, others with 7 billion test examples." I'm not sure what this means. I have problems with mixed symbolic/numeric attributes and the training example sets don't fit in memory. Does SHOGUN require that training examples fit in memory?
Soeren Sonnenburg (on January 14, 2011, 18:12:01)
Shogun does not necessarily require examples to be in memory (if you use any of the FileFeatures). However, most algorithms within shogun are batch type, so using the non in-memory FileFeatures would probably be very slow. This does not matter for doing predictions of course, even though the 7 billion test examples above referred to predicting gene starts on the whole human genome (in memory ~3.5GB and a context window of 1200nt was shifted around in that string). In addition one can compute features (or feature space) on-the-fly, potentially saving lots of memory. Not sure how big your problem is but I guess this is better discussed on the shogun mailing list.
Yuri Hoffmann (on September 14, 2013, 17:12:16)
cannot use the java interface in cygwin (already reported on github) nor in debian.
