
 Description:
Overview
The SHOGUN machine learning toolbox focuses on large-scale kernel methods, especially Support Vector Machines (SVMs). It provides a generic interface for kernel machines and features 15 different SVM implementations, all of which access data in a unified way: through a general kernel framework or, in the case of linear SVMs, through so-called "DotFeatures", i.e., features providing a minimal set of operations (such as the dot product).
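The DotFeatures idea can be illustrated with a small sketch. This is not SHOGUN's API: the class names, the `dot`/`add_to` methods and the use of a perceptron as the linear learner are all assumptions chosen for illustration. The point is only that a linear method written against two operations works unchanged on dense and sparse storage.

```python
class DenseVectors:
    """Hypothetical dense storage: each example is a list of floats."""
    def __init__(self, rows):
        self.rows = rows
        self.dim = len(rows[0])

    def __len__(self):
        return len(self.rows)

    def dot(self, i, w):
        # Dot product of example i with a dense weight vector w.
        return sum(x * wj for x, wj in zip(self.rows[i], w))

    def add_to(self, i, alpha, w):
        # In-place update w += alpha * example_i.
        for j, x in enumerate(self.rows[i]):
            w[j] += alpha * x


class SparseVectors:
    """Hypothetical sparse storage: each example is a {index: value} dict."""
    def __init__(self, rows, dim):
        self.rows = rows
        self.dim = dim

    def __len__(self):
        return len(self.rows)

    def dot(self, i, w):
        return sum(v * w[j] for j, v in self.rows[i].items())

    def add_to(self, i, alpha, w):
        for j, v in self.rows[i].items():
            w[j] += alpha * v


def train_perceptron(features, labels, epochs=10):
    """Linear learner that only ever calls dot() and add_to()."""
    w = [0.0] * features.dim
    for _ in range(epochs):
        for i in range(len(features)):
            if labels[i] * features.dot(i, w) <= 0:
                features.add_to(i, labels[i], w)
    return w


labels = [1, 1, -1, -1]
dense = DenseVectors([[2.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -2.0]])
sparse = SparseVectors([{0: 2.0}, {1: 1.0}, {0: -1.0}, {1: -2.0}], dim=2)
w_dense = train_perceptron(dense, labels)
w_sparse = train_perceptron(sparse, labels)
```

Both calls run the same training code; only the storage class differs.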
Features
SHOGUN includes the LinAdd accelerations for string kernels and the COFFIN framework for on-demand computation of features for the contained linear SVMs. In addition, it contains more advanced Multiple Kernel Learning, Multi-Task Learning and Structured Output learning algorithms, as well as other linear methods. SHOGUN accepts input feature objects of almost any type, e.g., dense, sparse or variable-length features (strings) of type char/byte/word/int/long int/float/double/long double.
The toolbox provides efficient implementations of 35 different kernels, among them the
 Linear,
 Polynomial,
 Gaussian and
 Sigmoid Kernel
and also provides a number of recent string kernels like the
 Locality Improved,
 Fisher,
 TOP,
 Spectrum,
 Weighted Degree Kernel (with shifts).
For the latter, the efficient LINADD optimizations are implemented. SHOGUN also offers the freedom of working with custom precomputed kernels. One of its key features is the combined kernel, which is constructed as a weighted linear combination of a number of subkernels that need not all work on the same domain. An optimal subkernel weighting can be learned using Multiple Kernel Learning. Currently, SVM one-class, 2-class and multiclass classification and regression problems are supported. SHOGUN also implements a number of linear methods such as
 Linear Discriminant Analysis (LDA),
 Linear Programming Machine (LPM),
 Perceptrons
and features algorithms to train Hidden Markov Models.
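The combined kernel described above, a weighted linear combination of subkernels that each work on their own domain, can be sketched as follows. The function names, the simple k-mer counting kernel and the fixed weights are assumptions for illustration; in SHOGUN the weights could instead be learned by Multiple Kernel Learning.

```python
import math
from collections import Counter


def gaussian_kernel(x, y, width=1.0):
    # Operates on real-valued vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / width)


def kmer_count_kernel(s, t, k=2):
    # Operates on strings: counts co-occurring substrings of length k.
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return float(sum(cs[m] * ct[m] for m in cs))


def combined_kernel(subkernels, weights):
    """Return k(a, b) = sum_m weights[m] * subkernels[m](a[m], b[m]).

    Each example is a tuple with one component per subkernel, so every
    subkernel sees only the part of the example from its own domain.
    """
    def kernel(a, b):
        return sum(w * k(pa, pb)
                   for w, k, pa, pb in zip(weights, subkernels, a, b))
    return kernel


k = combined_kernel([gaussian_kernel, kmer_count_kernel], [0.5, 0.5])
a = ([1.0, 0.0], "ACGT")
b = ([0.0, 1.0], "ACGA")
value = k(a, b)
```

Because a non-negative weighted sum of valid kernels is again a valid kernel, the result can be handed to any kernel machine.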
The input feature objects can be read from plain ASCII files (tab-separated values for dense matrices; libsvm/svmlight format for sparse matrices), from an efficient native binary format, and, with general support, from HDF5-based files, supporting
 dense
 sparse or
 strings of various types
that can often be converted into each other. Chains of preprocessors (e.g. subtracting the mean) can be attached to each feature object, allowing for on-the-fly preprocessing.
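A chain of preprocessors applied on the fly can be pictured roughly like this. The classes below are hypothetical stand-ins, not SHOGUN classes: the stored data is never modified, and each preprocessor is fitted on the output of the chain so far and then applied whenever a vector is requested.

```python
import math


class SubtractMean:
    """Hypothetical preprocessor: centre each dimension."""
    def fit(self, rows):
        n = len(rows)
        self.mean = [sum(col) / n for col in zip(*rows)]

    def apply(self, row):
        return [x - m for x, m in zip(row, self.mean)]


class NormalizeLength:
    """Hypothetical preprocessor: scale each vector to unit norm."""
    def fit(self, rows):
        pass  # nothing to estimate

    def apply(self, row):
        norm = math.sqrt(sum(x * x for x in row))
        return [x / norm for x in row] if norm > 0 else list(row)


class Features:
    """Holds raw rows and a chain of preprocessors applied on access."""
    def __init__(self, rows):
        self.rows = rows
        self.preprocessors = []

    def add_preprocessor(self, preprocessor):
        # Fit on the data as seen through the existing chain.
        current = [self.get_vector(i) for i in range(len(self.rows))]
        preprocessor.fit(current)
        self.preprocessors.append(preprocessor)

    def get_vector(self, i):
        row = self.rows[i]
        for p in self.preprocessors:
            row = p.apply(row)
        return row


feats = Features([[1.0, 2.0], [3.0, 6.0]])
feats.add_preprocessor(SubtractMean())
feats.add_preprocessor(NormalizeLength())
first = feats.get_vector(0)
```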
Structure and Interfaces
SHOGUN's core is implemented in C++ and is provided as a library, libshogun, readily usable by C++ application developers. Its common interface functions are encapsulated in libshogunui, so that only minimal code (like setting or getting a double matrix to/from the target language) is necessary for each target language. This allowed us to easily create interfaces to Matlab(tm), R, Octave and Python. (Note that both modular object-oriented and static interfaces are provided: r, octave, matlab, python, python_modular, r_modular, octave_modular, cmdline, libshogun.)
Application
We have successfully applied SHOGUN to several problems from computational biology, such as Super Family classification, Splice Site Prediction, Interpreting the SVM Classifier, Splice Form Prediction, Alternative Splicing and Promoter Prediction. Some of them come with no less than 10 million training examples, others with 7 billion test examples.
Documentation
We use Doxygen for both user and developer documentation, which may be read online. More than 600 documented examples for the interfaces python_modular, octave_modular, r_modular, static python, static matlab and octave, static r, static command line and the C++ libshogun developer interface can be found in the documentation.
 Changes to previous version:
This release also contains several enhancements, cleanups and bugfixes:
Features
 Linear Time MMD two-sample test now works on streaming features, which allows tests to be performed on infinite amounts of data. A block size may be specified for fast processing. The features below were also added. By Heiko Strathmann.
 It is now possible to ask streaming features to produce an instance of streamed features that are stored in memory and returned as a CFeatures* object of corresponding type. See CStreamingFeatures::get_streamed_features().
 New concept of artificial data generator classes, based on streaming features. The first implemented instances are CMeanShiftDataGenerator and CGaussianBlobsDataGenerator. Use the above new concepts to get non-streaming data if desired.
 Accelerated projected gradient multiclass logistic regression classifier by Sergey Lisitsyn.
 New CCSOSVM based structured output solver by Viktor Gal
 A collection of kernel selection methods for MMD-based kernel two-sample tests, including optimal kernel choice for single and combined kernels for the linear time MMD. This finishes the kernel MMD framework and also comes with new, more illustrative examples and tests. By Heiko Strathmann.
 Alpha version of Perl modular interface developed by Christian Montanari.
 New framework for unit tests based on googletest and googlemock by Viktor Gal. A (growing) number of unit tests now ensures basic functionality of our framework. Since the examples no longer have to take this role, they should become more illustrative in the future.
 Changed the core of dimension reduction algorithms to the Tapkee library.
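The linear time MMD on streamed data mentioned in the feature list can be sketched in a few lines. This is an illustration of the standard linear-time estimator, not SHOGUN's CLinearTimeMMD code; the generator, the kernel width and the function names are assumptions. Samples are pulled from two streams in blocks, and only a running sum is kept, so memory use does not grow with the number of samples.

```python
import math
import random


def gaussian_kernel(x, y, width=1.0):
    return math.exp(-(x - y) ** 2 / width)


def linear_time_mmd(stream_p, stream_q, num_pairs, block_size=100,
                    kernel=gaussian_kernel):
    """Average of h = k(x1,x2) + k(y1,y2) - k(x1,y2) - k(x2,y1) over
    disjoint pairs of samples, processed one block at a time."""
    total = 0.0
    done = 0
    while done < num_pairs:
        b = min(block_size, num_pairs - done)
        xs = [next(stream_p) for _ in range(2 * b)]
        ys = [next(stream_q) for _ in range(2 * b)]
        for i in range(b):
            x1, x2 = xs[2 * i], xs[2 * i + 1]
            y1, y2 = ys[2 * i], ys[2 * i + 1]
            total += (kernel(x1, x2) + kernel(y1, y2)
                      - kernel(x1, y2) - kernel(x2, y1))
        done += b
    return total / num_pairs


def gaussian_stream(mean, seed):
    # Hypothetical endless data source, standing in for streaming features.
    rng = random.Random(seed)
    while True:
        yield rng.gauss(mean, 1.0)


same = linear_time_mmd(gaussian_stream(0.0, 1), gaussian_stream(0.0, 2), 5000)
shifted = linear_time_mmd(gaussian_stream(0.0, 3), gaussian_stream(2.0, 4), 5000)
```

The estimate stays close to zero when both streams follow the same distribution and is clearly positive when the means differ.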
Bugfixes
 Fix for shallow copy of gaussian kernel by Matt Aasted.
 Fixed a bug when using StringFeatures along with kernel machines in cross-validation which caused an assertion error. Thanks to Eric (yoo)!
 Fix for 3-class case training of MulticlassLibSVM reported by Arya Iranmehr that was suggested by Oksana Bayda.
 Fix for wrong Spectrum mismatch RBF construction in static interfaces reported by Nona Kermani.
 Fix for wrong include in SGMatrix causing build fail on Mac OS X (thanks to @bianjiang).
 Fixed a bug that caused kernel machines to return nonsense when using custom kernel matrices with subsets attached to them.
 Fix for parameter dictionary creation causing dereferencing of null pointers with Gaussian processes parameter selection.
 Fixed a bug in exact GP regression that caused wrong results.
 Fixed a bug in exact GP regression that produced memory errors/crashes.
 Fix for a bug with static interfaces causing all outputs to be -1/+1 instead of real scores (reported by Kamikawa Masahisa).
Cleanup and API Changes
 SGStringList is now based on SGReferencedData.
 "confidences" in context of CLabel and subclasses are now "values".
 CLinearTimeMMD constructor changes, only streaming features allowed.
 CDataGenerator will soon be removed and replaced by new streaming based classes.
 SGVector, SGMatrix, SGSparseVector, SGSparseMatrix refactoring: these now contain load/save routines and relevant functions from CMath, and implementations went to the .cpp file.
 Supported Operating Systems: Cygwin, Linux, Macosx, Bsd
 Data Formats: Plain Ascii, Svmlight, Binary, Fasta, Fastq, Hdf
 Tags: Bioinformatics, Large Scale, String Kernel, Kernel, Kernelmachine, Lda, Lpm, Matlab, Mkl, Octave, Python, R, Svm, Sgd, Icml2010, Liblinear, Libsvm, Multiple Kernel Learning, Ocas, Gaussian Processes, Reg
Other available revisions

Version 4.0.0
This release features the work of our 8 GSoC 2014 students [student; mentors]:
 OpenCV Integration and Computer Vision Applications [Abhijeet Kislay; Kevin Hughes]
 Large-Scale Multi-Label Classification [Abinash Panda; Thoralf Klein]
 Large-scale structured prediction with approximate inference [Jiaolong Xu; Shell Hu]
 Essential Deep Learning Modules [Khaled Nasr; Sergey Lisitsyn, Theofanis Karaletsos]
 Fundamental Machine Learning: decision trees, kernel density estimation [Parijat Mazumdar ; Fernando Iglesias]
 Shogun Missionary & Shogun in Education [Saurabh Mahindre; Heiko Strathmann]
 Testing and Measuring Variable Interactions With Kernels [Soumyajit De; Dino Sejdinovic, Heiko Strathmann]
 Variational Learning for Gaussian Processes [Wu Lin; Heiko Strathmann, Emtiyaz Khan]
It also contains several cleanups and bugfixes:
Features
 New Shogun project description [Heiko Strathmann]
 ID3 algorithm for decision tree learning [Parijat Mazumdar]
 New modes for PCA matrix factorizations: SVD & EVD, in-place or reallocating [Parijat Mazumdar]
 Add Neural Networks with linear, logistic and softmax neurons [Khaled Nasr]
 Add kernel multiclass strategy examples in multiclass notebook [Saurabh Mahindre]
 Add decision trees notebook containing examples for ID3 algorithm [Parijat Mazumdar]
 Add sudoku recognizer ipython notebook [Alejandro Hernandez]
 Add in-place subsets on features, labels, and custom kernels [Heiko Strathmann]
 Add Principal Component Analysis notebook [Abhijeet Kislay]
 Add Multiple Kernel Learning notebook [Saurabh Mahindre]
 Add Multi-Label classes to enable Multi-Label classification [Thoralf Klein]
 Add rectified linear neurons, dropout and max-norm regularization to neural networks [Khaled Nasr]
 Add C4.5 algorithm for multiclass classification using decision trees [Parijat Mazumdar]
 Add support for arbitrary acyclic graph-structured neural networks [Khaled Nasr]
 Add CART algorithm for classification and regression using decision trees [Parijat Mazumdar]
 Add CHAID algorithm for multiclass classification and regression using decision trees [Parijat Mazumdar]
 Add Convolutional Neural Networks [Khaled Nasr]
 Add Random Forests algorithm for ensemble learning using CART [Parijat Mazumdar]
 Add Restricted Boltzmann Machines [Khaled Nasr]
 Add Stochastic Gradient Boosting algorithm for ensemble learning [Parijat Mazumdar]
 Add Deep contractive and denoising autoencoders [Khaled Nasr]
 Add Deep belief networks [Khaled Nasr]
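Several of the tree learners listed above build on the same core step. As a minimal sketch, assuming categorical attributes stored in dicts (the data and function names are made up for illustration and are not SHOGUN code), the ID3 split criterion picks the attribute with the largest information gain:

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the weighted entropy after splitting
    on the given attribute."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    # ID3 greedily splits on the attribute with the highest gain.
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rain", "windy": False},
    {"outlook": "rain", "windy": True},
]
labels = ["no", "no", "yes", "yes"]
chosen = best_attribute(rows, labels, ["outlook", "windy"])
```

A full ID3 learner would apply this step recursively to each resulting group until the labels in a group are pure or no attributes remain.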
Bugfixes
 Fix reference counting bugs in CList when reference counting is on [Heiko Strathmann, Thoralf Klein, lambday]
 Fix memory problem in PCA::apply_to_feature_matrix [Parijat Mazumdar]
 Fix crash in LeastAngleRegression for the case D greater than N [Parijat Mazumdar]
 Fix memory violations in bundle method solvers [Thoralf Klein]
 Fix fail in library_mldatahdf5.cpp example when http://mldata.org is not working properly [Parijat Mazumdar]
 Fix memory leaks in Vowpal Wabbit, LibSVMFile and KernelPCA [Thoralf Klein]
 Fix memory and control flow issues discovered by Coverity [Thoralf Klein]
 Fix R modular interface SWIG typemap (Requires SWIG >= 2.0.5) [Matt Huska]
Cleanup and API Changes
 PCA now depends on Eigen3 instead of LAPACK [Parijat Mazumdar]
 Removing redundant and fixing implicit imports [Thoralf Klein]
 Hide many methods from SWIG, reducing compile memory by 500MiB [Heiko Strathmann, Fernando Iglesias, Thoralf Klein]
Date: February 5, 2015, 09:09:37

Version 3.2.0
This is mostly a bugfix release:
Features
 Fully support python3 now
 Add mini-batch k-means [Parijat Mazumdar]
 Add k-means++ [Parijat Mazumdar]
 Add subsequence string kernel [lambday]
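For readers unfamiliar with k-means++ from the list above, the seeding step can be sketched as follows. This is a generic illustration, not SHOGUN's implementation; the function names are assumptions, and the sketch assumes there are at least k distinct points (otherwise all sampling weights become zero and `random.choices` raises an error).

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """Pick k initial centers: the first uniformly at random, each further
    one with probability proportional to its squared distance from the
    nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(squared_distance(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0),
          (10.0, 11.0), (20.0, 0.0), (21.0, 0.0)]
centers = kmeanspp_init(points, 3, random.Random(0))
```

Points that are already centers have weight zero and are not drawn again, which is what spreads the initial centers across the data.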
Bugfixes
 Compile fixes for upcoming swig3.0
 Speedup for gaussian process' apply()
 Improve unit / integration test checks
 libbmrm uninitialized memory reads
 libocas uninitialized memory reads
 Octave 3.8 compile fixes [Orion Poplawski]
 Fix java modular compile error [Bjoern Esser]
Date: February 17, 2014, 20:31:36

Version 3.1.1
This is a bugfix release:
Bugfixes
 Fix compile error occurring with CXX0X
 Bump data version to required version
Date: January 9, 2014, 08:34:07

Version 3.1.0
This release also contains several cleanups and bugfixes:
Features
 Add option to set k-means cluster centers [Parijat Mazumdar]
 Add leave-one-out cross-validation scheme [Saurabh Mahindre]
 Add multiclass ipython notebook tutorials [Chiyuan Zhang]
 Add learning of StreamingSparseFeatures in OnlineLibLinear [Thoralf Klein]
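The leave-one-out cross-validation scheme mentioned above can be illustrated with a short sketch. The splitting function and the 1-nearest-neighbour predictor are hypothetical helpers written for this example, not SHOGUN classes: each example is held out once and predicted from all the others.

```python
def leave_one_out_splits(n):
    """Yield (train_indices, test_indices) with exactly one test example."""
    for held_out in range(n):
        train = [i for i in range(n) if i != held_out]
        yield train, [held_out]


def nearest_neighbour(train_x, train_y, x):
    # Toy 1-NN predictor on scalar inputs.
    j = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[j]


def loo_accuracy(xs, ys, fit_predict):
    correct = 0
    for train, test in leave_one_out_splits(len(xs)):
        prediction = fit_predict([xs[i] for i in train],
                                 [ys[i] for i in train],
                                 xs[test[0]])
        correct += prediction == ys[test[0]]
    return correct / len(xs)


xs = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
ys = [0, 0, 0, 1, 1, 1]
accuracy = loo_accuracy(xs, ys, nearest_neighbour)
```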
Bugfixes
 Decrease memory footprint of SGObject
 Fix protobuf detection
 Fix doxygen files and various doxygen errors
 Fix compile error with directors
 Fix memory leak in modular interfaces and apply*()
 Fix leak in KNN::store_model_features
 Notebook fixes
 Allow custom kernel matrices of size 2^31-1 x 2^31-1 [Koen van de Sande]
 Fix Protobuf cmake detection
 Fix LabelsFactory methods' object ownership in SWIG interfaces with the %newobject directive.
Cleanup and API Changes
 Introduce slim SGRefObject for refcounted objects as base class of SGObject [Thoralf Klein]
Date: January 5, 2014, 13:26:57

Version 3.0.0
This release features 8 successful Google Summer of Code projects and is the result of an incredible effort by our students. All projects come with IPython notebooks that contain background, code examples and visualizations. These can be found on our webpage!
Features
 In addition, the following features have been added:
 Added method to importance sample the (true) marginal likelihood of a Gaussian Process using a posterior approximation.
 Added a new class for classical probability distribution that can be sampled and whose logpdf can be evaluated. Added the multivariate Gaussian with various numerical flavours.
 Cross-validation framework now works with Gaussian Processes
 Added nuSVR for LibSVR class
 Model selection is now supported for parameters of subkernels of combined kernels in the MKL context. Thanks to Evangelos Anagnostopoulos.
 Probability output for multiclass SVMs is now supported using various heuristics. Thanks to Shell Xu Hu.
 Added an "equals" method to all Shogun objects that recursively compares all registered parameters with those of another instance, up to a specified accuracy.
 Added a "clone" method to all Shogun objects that creates a deep copy
 Multiclass LDA. Thanks to Kevin Hughes.
 Added a new datatype, complex128_t, for complex numbers. Math functions, support for SGVector/Matrix, SGSparseVector/Matrix, and serialization with Ascii and Xml files added. [Soumyajit De].
 Added mini-framework for numerical integration in one variable. Implemented Gauss-Kronrod and Gauss-Hermite quadrature formulas.
 Changed from configure script to CMake by Viktor Gal.
 Add C++0x and C++11 cmake detection scripts
 NDArray typemap support for python and octave modular.
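As an illustration of the Gauss-Hermite quadrature named in the list above, here is a three-point rule written from the textbook nodes and weights. It is a sketch, not SHOGUN's integration code, and the helper names are assumptions. The rule approximates integrals of the form exp(-x^2) f(x) over the real line and is exact when f is a polynomial of degree at most 5.

```python
import math

# Three-point Gauss-Hermite rule (physicists' convention).
NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
WEIGHTS = (math.sqrt(math.pi) / 6.0,
           2.0 * math.sqrt(math.pi) / 3.0,
           math.sqrt(math.pi) / 6.0)


def gauss_hermite(f):
    """Approximate the integral of exp(-x^2) * f(x) over the real line."""
    return sum(w * f(x) for x, w in zip(NODES, WEIGHTS))


def gaussian_expectation(g, mean, std):
    """Approximate E[g(Z)] for Z ~ N(mean, std^2) using the substitution
    z = mean + sqrt(2) * std * x."""
    return gauss_hermite(
        lambda x: g(mean + math.sqrt(2.0) * std * x)) / math.sqrt(math.pi)


second_moment = gaussian_expectation(lambda z: z * z, mean=1.0, std=2.0)
fourth_moment = gaussian_expectation(lambda z: z ** 4, mean=0.0, std=1.0)
```

Expectations under a Gaussian are a typical use in Gaussian process inference, where such one-dimensional integrals often have no closed form.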
Bugfixes
 Fix json serialization.
 Fixed bugs in FITC inference method that caused wrong posterior results.
 Fixed bugs in GP Regression that caused negative values for the variances.
 Fixed two memory errors in the streaming features framework.
 Fixed bug in the Kernel Mean Matching implementation (thanks to Meghana Kshirsagar).
Cleanup and API Changes
 Switch compile system to cmake
 SGSparseVector/Matrix are now derived from SGReferencedData and thus refcounted.
 Move README and INSTALL files to top level directory.
 Use common RefCount class for ReferencedData and CSGObjects.
 Rename HMSVMLabels to SequenceLabels
 Refactored method to fit a sigmoid to SVM scores, now in CStatistics, still called from CBinaryLabels.
 Use Dynamic arrays to hold preprocessors in features instead of raw pointers.
 Use Dynamic arrays to hold Features in CombinedFeatures.
 Use Dynamic arrays to hold Kernels in CombinedKernels/ProductKernels.
 Use Eigen3 for GPs, LDA
Date: October 29, 2013, 18:58:51

Version 2.1.0
This release also contains several enhancements, cleanups and bugfixes:
Features
 Linear Time MMD two-sample test now works on streaming features, which allows tests to be performed on infinite amounts of data. A block size may be specified for fast processing. The features below were also added. By Heiko Strathmann.
 It is now possible to ask streaming features to produce an instance of streamed features that are stored in memory and returned as a CFeatures* object of corresponding type. See CStreamingFeatures::get_streamed_features().
 New concept of artificial data generator classes, based on streaming features. The first implemented instances are CMeanShiftDataGenerator and CGaussianBlobsDataGenerator. Use the above new concepts to get non-streaming data if desired.
 Accelerated projected gradient multiclass logistic regression classifier by Sergey Lisitsyn.
 New CCSOSVM based structured output solver by Viktor Gal
 A collection of kernel selection methods for MMD-based kernel two-sample tests, including optimal kernel choice for single and combined kernels for the linear time MMD. This finishes the kernel MMD framework and also comes with new, more illustrative examples and tests. By Heiko Strathmann.
 Alpha version of Perl modular interface developed by Christian Montanari.
 New framework for unit tests based on googletest and googlemock by Viktor Gal. A (growing) number of unit tests now ensures basic functionality of our framework. Since the examples no longer have to take this role, they should become more illustrative in the future.
 Changed the core of dimension reduction algorithms to the Tapkee library.
Bugfixes
 Fix for shallow copy of gaussian kernel by Matt Aasted.
 Fixed a bug when using StringFeatures along with kernel machines in cross-validation which caused an assertion error. Thanks to Eric (yoo)!
 Fix for 3-class case training of MulticlassLibSVM reported by Arya Iranmehr that was suggested by Oksana Bayda.
 Fix for wrong Spectrum mismatch RBF construction in static interfaces reported by Nona Kermani.
 Fix for wrong include in SGMatrix causing build fail on Mac OS X (thanks to @bianjiang).
 Fixed a bug that caused kernel machines to return nonsense when using custom kernel matrices with subsets attached to them.
 Fix for parameter dictionary creation causing dereferencing of null pointers with Gaussian processes parameter selection.
 Fixed a bug in exact GP regression that caused wrong results.
 Fixed a bug in exact GP regression that produced memory errors/crashes.
 Fix for a bug with static interfaces causing all outputs to be -1/+1 instead of real scores (reported by Kamikawa Masahisa).
Cleanup and API Changes
 SGStringList is now based on SGReferencedData.
 "confidences" in context of CLabel and subclasses are now "values".
 CLinearTimeMMD constructor changes, only streaming features allowed.
 CDataGenerator will soon be removed and replaced by new streaming based classes.
 SGVector, SGMatrix, SGSparseVector, SGSparseMatrix refactoring: these now contain load/save routines and relevant functions from CMath, and implementations went to the .cpp file.
Date: March 17, 2013, 13:59:34

Version 2.0.0
This release contains several enhancements, cleanups and bugfixes:
Features
 This release contains the first release of the Efficient Dimensionality Reduction Toolkit (EDRT).
 Support for new SWIG builtin python interface feature (SWIG 2.0.4 is required now).
 EDRT algorithms are now available using static interfaces such as matlab and octave.
 Jensen-Shannon kernel and Homogeneous kernel map preprocessor (thanks to Viktor Gal).
 New 'multiclass' module for multiclass classification algorithms, generic linear and kernel multiclass machines, multiclass LibLinear and OCAS wrappers, new rejection schemes concept by Sergey Lisitsyn.
 Various multitask learning algorithms including L1/Lq multitask group lasso logistic regression and least squares regression, L1/L2 multitask tree guided group lasso logistic regression and least squares regression, trace norm regularized multitask logistic regression, clustered multitask logistic regression and L1/L2 multitask group logistic regression by Sergey Lisitsyn.
 Group and tree-guided logistic regression for binary and multiclass problems by Sergey Lisitsyn.
 Mahalanobis distance, QDA, Stochastic Proximity Embedding, generic OvO multiclass machine and CoverTree & KNN integration (thanks to Fernando J. Iglesias Garcia).
 Structured output learning framework by Fernando J. Iglesias Garcia.
 Hidden markov support vector machine structured output model by Fernando J. Iglesias Garcia.
 Implementations of three Bundle method for risk minimization (BMRM) variants by Michal Uricar.
 Latent SVM framework and latent detector example by Viktor Gal.
 Gaussian processes framework for parameters selection and gaussian processes regression estimation framework by Jacob Walker.
 New graphical python modular examples.
 Standard CrossValidation splitting for regression problems by Heiko Strathmann
 New data-locking concept by Heiko Strathmann, which allows telling machines that data will not change during training/testing until unlocked. KernelMachines now make use of this by not recomputing the kernel matrix in cross-validation.
 Cross-validation for KernelMachines is now parallelized.
 Cross-validation is now possible with custom kernels.
 Features may now have arbitrarily many index subsets (of subsets (of subsets (...))).
 Various clustering measures, Least Angle Regression and new multiclass strategies concept (thanks to Chiyuan Zhang).
 A bunch of multiclass learning algorithms including the ShareBoost algorithm, ECOC framework, conditional probability tree, balanced conditional probability tree, random conditional probability tree and relaxed tree by Chiyuan Zhang.
 Python Sparse matrix typemap for octave modular interface (thanks to Evgeniy Andreev).
 Newton SVM port (thanks to Harshit Syal).
 Some progress on native windows compilation using cmake and mingw-w64 (thanks to Josh aka jklontz).
 CMake compilation improvements (thanks to Eric aka yoo).
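The benefit of the data-locking concept listed above can be made concrete with a small sketch. The class below is a hypothetical illustration, not SHOGUN's interface: while the data is locked, the full kernel matrix is computed once and every cross-validation fold only indexes into it; without the lock, each fold evaluates the kernel again.

```python
def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))


class LockableKernel:
    """Hypothetical wrapper that caches the kernel matrix while locked."""
    def __init__(self, kernel, data):
        self.kernel = kernel
        self.data = data
        self.evaluations = 0  # counts actual kernel computations
        self._cache = None

    def _eval(self, i, j):
        self.evaluations += 1
        return self.kernel(self.data[i], self.data[j])

    def data_lock(self):
        # Promise: data will not change, so the matrix can be precomputed.
        n = len(self.data)
        self._cache = [[self._eval(i, j) for j in range(n)] for i in range(n)]

    def data_unlock(self):
        self._cache = None

    def submatrix(self, rows, cols):
        if self._cache is not None:
            return [[self._cache[i][j] for j in cols] for i in rows]
        return [[self._eval(i, j) for j in cols] for i in rows]


def fold_train_matrices(kernel, n, num_folds):
    """Training kernel matrix of every fold (fold f holds out i % num_folds == f)."""
    matrices = []
    for f in range(num_folds):
        train = [i for i in range(n) if i % num_folds != f]
        matrices.append(kernel.submatrix(train, train))
    return matrices


data = [[float(i), 1.0] for i in range(6)]

unlocked = LockableKernel(linear_kernel, data)
plain = fold_train_matrices(unlocked, len(data), 3)

locked = LockableKernel(linear_kernel, data)
locked.data_lock()
cached = fold_train_matrices(locked, len(data), 3)
locked.data_unlock()
```

The saving grows with the number of folds and repetitions, since the locked variant never computes more than one full matrix.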
Bugfixes
 Fix for bug in the Gaussian Naive Bayes classifier, its domain was changed to logspace.
 Fix for R_static interface installation (thanks Steve Lianoglou).
 SVMOcas memsetting and max_train_time bugfix.
 Various fixes for compile errors with clang.
 Stratified cross-validation now uses different indices for each run.
Cleanup and API Changes
 Various code cleanups by Evan Shelhamer
 Parameter migration framework by Heiko Strathmann. From now on, changes in the shogun objects will not break loading old serialized files anymore.
Date: September 5, 2012, 21:57:35

Version 1.1.0
This release contains major enhancements, cleanups and bugfixes:
Features
 New dimensionality reduction algorithms: Diffusion Maps, Kernel Locally Linear Embedding, Kernel Local Tangent Space Alignment, Linear Local Tangent Space Alignment, Neighborhood Preserving embedding, Locality Preserving Projections.
 Various performance improvements for dimensionality reduction methods (BLAS, alignment formulation of the LLE, ..)
 Automatic k determination mode for the Locally Linear Embedding dimension reduction method, based on reconstruction error.
 ARPACK and SUPERLU integration.
 Introduce the concept of Converters that can embed (arbitrary) feature types into different feature types.
 LibSVM is now pthread-parallelized.
 Create modshogun.dll for csharp.
 Various new c# examples (thanks Daniel Korn).
 Dimensionality reduction examples application is introduced
Bugfixes
 Octave_static and octave_modular examples fix.
 Memory leak in custom kernel is now eliminated (thanks Madeleine Seeland for reporting).
 Fix for linear machine set_w method (thanks Brian Cheung for reporting).
 DotFeatures fix for assert bug.
 FibonacciHeap memory leak fix.
 Fix for Java modular interface typemapping bug.
 Fix errors uncovered by LLVM / clang++.
 Fix for configure on Darwin-x86_64 (thanks Peter Romov for patch).
 Improve lua / ruby detection.
 Fix configure / compilation under osx and cygwin for various interfaces.
Cleanup and API Changes
 Most of the inline functions have been (re)moved to the corresponding .cpp file.
 Libshogun is now compiled with SSE support for math (if available), but interfaces are now compiled with the -O0 flag, which drastically reduces compilation time.
Date: December 13, 2011, 05:11:29

Version 1.0.0
This release contains major enhancements, cleanups and bugfixes:
Features
 Support for new languages: java, c#, ruby, lua in modular interfaces (GSoC project of Baozeng Ding)
 Port all examples to the new languages: Ruby examples with example transition tool (thanks to Justin Patera aka serialhex)
 Dimensionality reduction (manifold learning) algorithms are now available. In particular: Locally Linear Embedding (LLE), Hessian Locally Linear Embedding (HLLE), Local Tangent Space Alignment (LTSA), Kernel PCA (kPCA), Multidimensional Scaling (MDS, with possible landmark approximation), Isomap (using Fibonacci Heap Dijkstra for shortest paths), Laplacian Eigenmaps (GSoC project of Sergey Lisitsyn)
 Various new kernels: TStudentKernel, CircularKernel, WaveKernel, SplineKernel, LogKernel, RationalQuadraticKernel, WaveletKernel, BesselKernel, PowerKernel, ExponentialKernel, CauchyKernel, ANOVAKernel, MultiquadricKernel, SphericalKernel, DistantSegmentsKernel (thanks GSoC students for the contributions!)
 Streaming / Online Feature Framework for SimpleFeatures, SparseFeatures, StringFeatures (GSoC project of Shashwat Lal Das)
 SGDQN, Online SGD, Online Liblinear, Online Vowpal Wabbit (GSoC project of Shashwat Lal Das)
 Model selection framework for arbitrary Machines (GSoC project of Heiko Strathmann)
 Gaussian Mixture Models (GSoC project of Alesis Novik)
 FibonacciHeap for efficient shortest-path problem solving (thanks to Evgeniy Andreev)
 Efficient HashSet (thanks to Evgeniy Andreev)
 ARPACK wrapper (dseupd) for symmetric eigenproblems (both generalized and non-generalized), some new LAPACK wrappers (Sergey Lisitsyn)
 New Statistics module for various statistics measures (Heiko Strathmann)
 Subset support to features (Heiko Strathmann)
 Java externalization support (Sergey Lisitsyn)
 Support matlab 2011a.
Bugfixes
 Fix build failure with ld --as-needed (thanks Matthias Klose for the patch).
 Fix initialization error in KRR static interfaces (thanks Maxwell Collins for the patch).
Cleanup and API Changes
 Introduce Machine, KernelMachine, LinearMachine, LinearOnlineMachine, DistanceMachine with train() and apply() functions and drop Classifier.
 Restructure source code layout: Merge libshogunui and libshogun into src/shogun and move all interfaces into src/shogun. Split up lib into lib, io and mathematics.
 Create a single 'modshogun' module resembling the functionality found in libshogun. Now octave_modular and other modular interfaces work reliably.
 Introduce SGVector, SGMatrix, SGNDArray, SGStringList for transferring object pointers and metadata from/to shogun.
 Classes no longer store copies of e.g. matrices, and just pass pointers on set/get operations.
 Stop using new[] / delete[] and switch to SG_MALLOC, SG_CALLOC, SG_REALLOC, SG_FREE macros.
 Preproc renamed to preprocessor, PCACut renamed to PCA
Date: September 1, 2011, 02:09:45

Version 0.10.0
This release contains several enhancements, cleanups and bugfixes:
Features
 Serialization of objects deriving from CSGObject, i.e. all shogun objects (SVM, Kernel, Features, Preprocessors, ...) as ASCII, JSON, XML and HDF5
 Create SVMLightOneClass
 Add CustomDistance in analogy to custom kernel
 Add HistogramIntersectionKernel (thanks Koen van de Sande for the patch)
 Matlab 2010a support
 SpectrumMismatchRBFKernel modular support (thanks Rob Patro for the patch)
 Add ZeroMeanCenterKernelNormalizer (thanks Gorden Jemwa for the patch)
 Swig 2.0 support
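To give an idea of the HistogramIntersectionKernel mentioned in the list above, the basic form of a histogram intersection kernel sums the bin-wise minima of two histograms. The sketch below shows only that basic form with made-up function names; any extra parameters SHOGUN's class may expose are not covered here.

```python
def histogram_intersection(h1, h2):
    """Basic histogram intersection: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def kernel_matrix(histograms):
    return [[histogram_intersection(a, b) for b in histograms]
            for a in histograms]


histograms = [
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.0, 0.0, 1.0],
]
K = kernel_matrix(histograms)
```

For histograms normalized to sum to one, the value lies between 0 (no overlap) and 1 (identical histograms).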
Bugfixes
 Custom Kernels can now be > 4G (thanks Koen van de Sande for the patch)
 Set C locale on startup in init_shogun to prevent incompatibilities with ascii floats and fprintf
 Compile fix when reference counting is disabled
 Fix set_position_weights for wd kernel (reported by Dave duVerle)
 Fix set_wd_weights for wd kernel.
 Fix crasher in SVMOcas (reported by Yaroslav)
Cleanup and API Changes
 Renamed SVM_light/SVR_light to SVMLight etc.
 Remove C prefix in front of non-serializable class names
 Drop CSimpleKernel and introduce CDotKernel as its base class. This way all dot-product-based kernels can be applied on top of DotFeatures and only a single implementation for such kernels is needed.
Date: December 7, 2010, 15:35:26

Version 0.9.3
This release contains several enhancements, cleanups and bugfixes:
Features
 Experimental lp-norm MCMKL
 New Kernels: SpectrumRBFKernelRBF, SpectrumMismatchRBFKernel, WeightedDegreeRBFKernel
 WDK kernel supports amino acids
 String Features now support append operations
 python-dbg support
 Allow floats as input for custom kernel (and matrices > 4GB in size)
Bugfixes
 Static linking fix.
 Fix sparse linear kernel's add_to_normal
Cleanup and API Changes
 Remove init() function in Performance Measures
 Adjust .so suffix for python and use python distutils to figure out install paths
Date: May 31, 2010, 15:31:49

Version 0.9.2
This release contains several enhancements, cleanups and bugfixes:
Features
 Direct reading and writing of ASCII/Binary files/HDF5 based files.
 Implemented multi task kernel normalizer.
 Implement SNP kernel.
 Implement time limit for libsvm/libsvr.
 Integrate Elastic Net MKL (thanks Ryoata Tomioka for the patch).
 Implement Hashed WD Features.
 Implement Hashed Sparse Poly Features.
 Integrate liblinear 1.51
 LibSVM can now be trained with bias disabled.
 Add functions to set/get global and local io/parallel/... objects.
Bugfixes
 Fix set_w() for linear classifiers.
 Static Octave, Python, Cmdline and Modular Python interfaces compile cleanly under Windows/Cygwin again.
 In static interfaces testing could fail when not directly done after training.
Date: March 31, 2010, 00:50:12

Version 0.9.1
This release contains several enhancements, cleanups and bugfixes:
Features
 Integrate LaRank.
 Memory Mapped Features (for data sets that don't fit into memory).
 Compressor module with compression and decompression support for lzo, gzip, bzip2 and lzma.
 Compressed String Features with on-the-fly decompression (CDecompressString preproc).
 Parallel computation of get_kernel_matrix().
 One may now prefix all shogun print/outputs with file name and line number (obj.io.enable_file_and_line())
 Chinese Documentation thanks Elpmis Lee.
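The idea behind compressed string features with on-the-fly decompression can be sketched with Python's standard library. The container class is hypothetical and only mirrors the concept: strings are stored compressed and decompressed each time one is requested. Of the codecs named above, lzo is left out because the Python standard library has no module for it.

```python
import bz2
import gzip
import lzma

# Codec name -> (compress, decompress) from the standard library.
CODECS = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


class CompressedStrings:
    """Hypothetical string container that keeps only compressed data."""
    def __init__(self, strings, codec="gzip"):
        compress, self._decompress = CODECS[codec]
        self._blobs = [compress(s.encode("ascii")) for s in strings]

    def __len__(self):
        return len(self._blobs)

    def get_string(self, i):
        # Decompress on access; nothing uncompressed is kept around.
        return self._decompress(self._blobs[i]).decode("ascii")

    def compressed_size(self):
        return sum(len(b) for b in self._blobs)


sequences = ["ACGT" * 500, "TTTTGGGG" * 250]
stores = {name: CompressedStrings(sequences, codec=name) for name in CODECS}
```

Repetitive sequence data like the toy strings above compresses well, which is the situation where trading decompression time for memory pays off.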
Bugfixes
 Fix One class MKL testing in static interfaces.
 Configure fixes: Let octave not write history on configure; fail when cplex is forcefully enabled but not found; add cplex 12 support.
 Fix a problem with regression and CombinedKernels employing only Custom kernels.
Cleanup and API Changes
 String Features now (like SimpleFeatures) upon get_feature_vector require an additional do_free argument and need to be freed using free_feature_vector.
Date: November 16, 2009, 11:02:41

Version 0.9.0
This release contains several cleanups and enhancements:
Features
 Implement set_linear_classifier for static interfaces.
 Implement Polynomial DotFeatures.
 Implement domain adaptation SVM.
 Speed up ScatterSVM.
 Initial implementation for saving and Loading of shogun objects.
 Examples have been polished/split up into separate files.
 Documentation and webpage improvements.
Bugfixes
 Fix one class MKL for static interfaces.
 Fix performance measures integer overflow.
 Configure fixes to run under OSX's snow leopard.
 Compiles and runs under solaris both using suncc and gcc.
Cleanup and API Changes
 It is no longer necessary to call init_kernel TRAIN/TEST.
 Removed kernel {load,save}_init.
 Removed preproc {load,save}_init.
 Move the mkl code from classifier/svm to classifier/mkl.
 Removed obsolete mindy support.
 Rename MCSVM to ScatterSVM
 Move distributions to distributions/ directory.
 CClassifier::classify() no longer has a label as argument.
 Introduce CClassifier::train(CFeatures * ) and classify(CFeatures *) for more effective training/testing.
 Remove unnecessary global symbols.
Date: October 23, 2009, 14:23:21

Version 0.8.0
This release contains several cleanups, features and bugfixes:
Features
 Implements new multiclass svm formulation.
 1-, 2- and general q-norm MKL for classification, regression and one-class for wrapper and chunking algorithm for arbitrary (dual) SVM solvers.
 Dynamic Programming code is now accessible from python.
 Implements Regulatory Modules kernel.
 Documentation updates (Tutorial, improved installation instructions, overview about the implemented algorithms).
Bugfixes
 Correct q-norm MKL for Newton.
 Upon make install of elwms don't install files into R/octave/python if these interfaces were not configured.
 Svm-nu parameter was not set correctly.
 Fix custom kernel initialization.
 Correct get_subkernel_weights.
 Proper Intel core2 compile flags detection
 Fix number of outputs for KNN.
 Run tests with proper LD_LIBRARY_PATH set.
 Fix several memory leaks.
Cleanup and API Changes
 Rename svm_one_class_nu to svm_nu.
 Clean up dynamic programming code.
 Remove commands from_position_list and slide_window and move functionality into set/add_features.
 Remove now obsolete legacy examples.
August 16, 2009, 19:53:50 0.7.3 This release contains several cleanups and bugfixes:
Features
 Improve libshogun/developer tutorial.
 Implement convenience function for parallel quicksort.
 Fasta/fastq file loading for StringFeatures.
Bugfixes
 get_name function was undefined in Evaluation causing the PerformanceMeasures class to be defunct.
 Workaround bugs in the std template library for math functions.
 Compiles cleanly under OSX now, thanks to James Kyle.
Cleanup and API Changes
 Make sure that all destructors are declared virtual.
May 2, 2009, 22:45:13 0.7.2 This release contains several cleanups and enhancements:
Features:
 Support all data types from python_modular: dense, SciPy sparse csc_sparse matrices and strings of type bool, char, (u)int{8,16,32,64}, float{32,64,96}. In addition, individual vectors/strings can now be obtained and even changed. See examples/python_modular/features_*.py for examples.
 AUC maximization now works with arbitrary kernel SVMs.
 Documentation updates, many examples have been polished.
 Slightly speed up the Oligo kernel.
Bugfixes:
 Fix reading strings from directory (f.load_from_directory()).
 Update copyright to 2009.
Cleanup and API Changes:
 Remove {Char,Short,Word,Int,Real}Features and only ever use the templated SimpleFeatures.
 Split up examples in examples/python_modular to separate files.
 Now use s.set_features(strs) instead of s.set_string_features(strs) to set string features.
 The meaning of the width parameter for the Oligo Kernel changed; the OligoKernel has been renamed to OligoStringKernel.
March 23, 2009, 10:23:04 0.7.1 This release contains several cleanups, feature enhancements and bugfixes:
Features:
 configure now detects libshogun/ui installed in /usr/(local/)lib if libshogun/ui dirs are removed.
 Improved documentation (and path and doxygen fixes).
 Tutorial on how to develop with libshogun and to extend shogun.
 Added the elwms (eierlegendewollmilchsau) interface, a chimera that interfaces to python, octave, r and matlab from a single file. It provides the run_{octave,python,r} commands to run code in {octave,python,r} from within octave, r, matlab or python, transparently making variables available to the target interface and avoiding file i/o.
 Implement AttributeFeatures for (attr,value) pairs, trees etc.
Bugfixes:
 Fix a crash occurring with the combined kernel and multiple threads.
 configure now allows building of modular interfaces only.
 N-dimensional arrays now work in octave.
Cleanup and API Changes:
 Custom Kernel no longer requires features or initialization, not even when used in CombinedKernel (the combined kernel will skip over custom kernels on init).
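The idea behind this change can be illustrated with a minimal sketch. It assumes a combined kernel is a weighted sum of subkernels, as described in the overview, and that a custom kernel is simply a lookup into a precomputed matrix. All function names below are hypothetical and are not SHOGUN's API.

```python
# Illustrative sketch only: these names are hypothetical, not SHOGUN's API.

def linear_kernel(features):
    """Return k(i, j) computed on demand from stored feature vectors."""
    def k(i, j):
        return sum(a * b for a, b in zip(features[i], features[j]))
    return k


def custom_kernel(matrix):
    """Return k(i, j) that looks values up in a precomputed matrix.

    No feature object and no initialization step are involved.
    """
    def k(i, j):
        return matrix[i][j]
    return k


def combined_kernel(subkernels, weights):
    """Weighted linear combination of subkernels: sum_m w_m * k_m(i, j)."""
    def k(i, j):
        return sum(w * sub(i, j) for w, sub in zip(weights, subkernels))
    return k


features = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
precomputed = [[1.0, 0.5, 0.2],
               [0.5, 1.0, 0.1],
               [0.2, 0.1, 1.0]]

k = combined_kernel(
    [linear_kernel(features), custom_kernel(precomputed)],
    weights=[0.5, 2.0],
)
print(k(0, 2))
```

Because the custom subkernel only indexes into its matrix, the combined kernel can evaluate it alongside feature-based subkernels without attaching any features to it.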
March 8, 2009, 20:30:32 0.7.0 This release contains major feature enhancements and bugfixes:
 Implement DotFeatures and CombinedDotFeatures. DotFeatures need to provide dot product and similar operations (hence the name). This enables training of linear methods with mixed datatypes (sparse, dense and others, including the newly implemented string-based SpecFeatures and WDFeatures).
 MKL now does not require CPLEX any longer.
 Add q-norm MKL support based on internal Newton implementation.
 Add 1-norm MKL support based on GLPK.
 Add multiclass MKL support based on the GLPK and the GMNP svm solver.
 Implement Tensor Product Pair Kernel (TPPK).
 Support compilation on the iPhone :)
 Add an option to set wds kernel position weights.
 Build static libshogun.a for libshogun target.
 Testsuite can also test the modular R interface, added test for OligoKernel.
 Ocas and WDOcas can be used with a bias feature now.
 Update to LibSVM 2.88.
 Enable parallelized HMM code by default.
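The DotFeatures idea from this release can be sketched as follows. This is a minimal illustration, assuming that a linear learner needs only two operations from its data: a dot product with a dense weight vector, and adding a scaled example to that vector. The class and method names are hypothetical and are not claimed to match SHOGUN's interface; a perceptron stands in for the linear method.

```python
# Illustrative sketch only: class and method names are hypothetical
# and are not claimed to match SHOGUN's DotFeatures interface.

class DenseDot:
    """Dense vectors stored as lists of floats."""
    def __init__(self, vectors):
        self.vectors = vectors
        self.dim = len(vectors[0])

    def dense_dot(self, i, w):
        return sum(x * wj for x, wj in zip(self.vectors[i], w))

    def add_to_dense(self, alpha, i, w):
        for j, x in enumerate(self.vectors[i]):
            w[j] += alpha * x


class SparseDot:
    """Sparse vectors stored as {index: value} dictionaries."""
    def __init__(self, vectors, dim):
        self.vectors = vectors
        self.dim = dim

    def dense_dot(self, i, w):
        return sum(v * w[j] for j, v in self.vectors[i].items())

    def add_to_dense(self, alpha, i, w):
        for j, v in self.vectors[i].items():
            w[j] += alpha * v


class CombinedDot:
    """Concatenates several feature objects into one feature space."""
    def __init__(self, parts):
        self.parts = parts
        self.dim = sum(p.dim for p in parts)

    def dense_dot(self, i, w):
        total, offset = 0.0, 0
        for p in self.parts:
            total += p.dense_dot(i, w[offset:offset + p.dim])
            offset += p.dim
        return total

    def add_to_dense(self, alpha, i, w):
        offset = 0
        for p in self.parts:
            # Update this part's slice of w, then write it back.
            block = w[offset:offset + p.dim]
            p.add_to_dense(alpha, i, block)
            w[offset:offset + p.dim] = block
            offset += p.dim


def train_perceptron(feats, labels, epochs=10):
    """Perceptron that touches the data only through the two operations."""
    w = [0.0] * feats.dim
    for _ in range(epochs):
        for i, y in enumerate(labels):
            if y * feats.dense_dot(i, w) <= 0:
                feats.add_to_dense(y, i, w)
    return w


dense = DenseDot([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
sparse = SparseDot([{0: 1.0}, {2: 1.0}, {0: 1.0}, {2: 1.0}], dim=3)
feats = CombinedDot([dense, sparse])
labels = [1, -1, 1, -1]
w = train_perceptron(feats, labels)
```

The training loop never inspects how an example is stored, which is why dense and sparse parts can be mixed in one combined feature object.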
February 20, 2009, 10:41:46 0.6.7 Initial Announcement on mloss.org.
October 11, 2007, 21:45:32
Comments

 Tom Fawcett (on January 3, 2011, 03:20:48)
You say, "Some of them come with no less than 10 million training examples, others with 7 billion test examples." I'm not sure what this means. I have problems with mixed symbolic/numeric attributes and the training example sets don't fit in memory. Does SHOGUN require that training examples fit in memory?

 Soeren Sonnenburg (on January 14, 2011, 18:12:01)
Shogun does not necessarily require examples to be in memory (if you use any of the FileFeatures). However, most algorithms within shogun are batch type, so using the non-in-memory FileFeatures would probably be very slow.
This does not matter for doing predictions of course, even though the 7 billion test examples above referred to predicting gene starts on the whole human genome (in memory ~3.5GB and a context window of 1200nt was shifted around in that string).
In addition one can compute features (or feature space) on the fly, potentially saving lots of memory.
Not sure how big your problem is but I guess this is better discussed on the shogun mailing list.

 Yuri Hoffmann (on September 14, 2013, 17:12:16)
cannot use the java interface in cygwin (already reported on github) nor in debian.
In case you find bugs, feel free to report them at http://trac.tuebingen.mpg.de/shogun.