Project details for SGD

SGD 2.0

by leonbottou - October 11, 2011, 20:59:41 CET


Ratings (based on 2 votes): Overall 4/5, Features 3/5, Usability 4.5/5, Documentation 4.5/5

Learning algorithms based on stochastic gradient approximations are known for their poor performance on optimization tasks and their extremely good performance on machine learning tasks (Bottou and Bousquet, 2008). Despite these proven capabilities, there were lingering concerns about the difficulty of setting the adaptation gains and achieving robust performance. Stochastic gradient algorithms have been historically associated with back-propagation in multilayer neural networks, which can be a very challenging non-convex problem. Stochastic gradient algorithms are also notoriously hard to debug because they often appear to somehow work despite the bugs; experimenters may then blame disappointing results on the algorithm itself when a bug is the real culprit.

It is therefore useful to see how stochastic gradient descent performs on simple linear and convex problems such as linear Support Vector Machines (SVMs) or Conditional Random Fields (CRFs). This page provides simple code examples illustrating the good properties of stochastic gradient descent algorithms. The provided source code favors clarity over speed.

The second major release of this code includes a robust implementation of the averaged stochastic gradient descent algorithm (Ruppert, 1988), which consists of performing stochastic gradient descent iterations while simultaneously averaging the parameter vectors over time. When the stochastic gradient gains decrease with an appropriately slow schedule, Polyak and Juditsky (1992) have shown that the algorithm converges like a second-order stochastic gradient descent but at a much smaller computational cost. One can therefore hope to match the batch optimization performance after a single pass over a randomly shuffled training set (Fabian, 1978; Bottou and LeCun, 2004). Achieving one-pass learning in practice remains difficult because one often needs more than one pass simply to reach this favorable asymptotic regime. The gain schedule has a deep impact on this convergence. Finer analyses (Xu, 2010; Bach and Moulines, 2011) reveal useful guidelines for setting these learning rates. Xu (2010) also describes a wonderful way to perform the averaging operation efficiently when the training data is sparse. The resulting algorithm reaches near-optimal test set performance after only a couple of passes.

Changes to previous version:

Version 2.0 features ASGD.

BibTeX Entry: Download
Corresponding Paper BibTeX Entry: Download
Supported Operating Systems: Cygwin, Linux, Mac OS X, Windows
Data Formats: Various
Tags: Large Scale, SVM, CRF, Stochastic Gradient Descent
Archive: download here

Other available revisions

Version Changelog Date

Version 2.0 features ASGD. (October 11, 2011, 20:59:41)

Initial Announcement on mloss.org. (November 14, 2007, 14:17:51)


Leon Bottou (on September 23, 2008, 21:42:34)
Version 1.2 fixes a bug in the preprocessing program.
Olivier Grisel (on January 26, 2009, 14:33:55)
Apparently the download link of this mloss entry points to the 1.1 version while the description mentions version 1.2.
Olivier Grisel (on January 26, 2009, 15:07:28)
Also, to build it with gcc 4.3.1 I had to explicitly add #include <memory.h> in file lib/pstream.cpp to find the memcpy function definition, and I also get the following (harmless) deprecation warning on the hash_map:

g++ -g -O2 -Wall -I../lib -c -o preprocess.o preprocess.cpp
In file included from /usr/include/c++/4.3/ext/hash_map:64,
                 from preprocess.cpp:38:
/usr/include/c++/4.3/backward/backward_warning.h:33:2: warning: #warning This file includes at least one deprecated or antiquated header which may be removed without further notice at a future date. Please use a non-deprecated interface with equivalent functionality instead. For a listing of replacement headers and interfaces, consult the file backward_warning.h. To disable this warning use -Wno-deprecated.
Leon Bottou (on January 30, 2009, 13:40:40)
According to POSIX and ANSI C, the function memcpy is defined in header <string.h>. If your compilation needs memory.h, then something must be wrong with your compiler. Regarding the warning, it suggests replacing the gcc-specific hash_map by an unordered_map. This is a C++0x extension. If you do the change, you'll get the following error:

/usr/include/c++/4.3/c++0x_warning.h:36:2: error: #error This file requires compiler and library support for the upcoming ISO C++ standard, C++0x. This support is currently experimental, and must be enabled with the -std=c++0x or -std=gnu++0x compiler options.
Olivier Grisel (on January 30, 2009, 23:27:24)
Indeed, but neither pstream.cpp nor pstream.h include string.h either. Including string.h instead of memory.h as mentioned previously makes it build as expected.
Leon Bottou (on January 31, 2009, 14:01:59)
Ooops. I thought that was already included in lib/pstream.cpp. Here is the patch (to be included in a next release):

Index: pstream.cpp
===================================================================
RCS file: /home/cvs/cvsroot/sgd/lib/pstream.cpp,v
retrieving revision 1.4
retrieving revision 1.5
diff -u -4 -p -r1.4 -r1.5
--- pstream.cpp	2 Oct 2007 20:40:05 -0000	1.4
+++ pstream.cpp	31 Jan 2009 12:28:44 -0000	1.5
@@ -15,13 +15,14 @@
 // You should have received a copy of the GNU General Public License
 // along with this program; if not, write to the Free Software
 // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111, USA
-// $Id: pstream.cpp,v 1.4 2007/10/02 20:40:05 cvs Exp $
+// $Id: pstream.cpp,v 1.5 2009/01/31 12:28:44 cvs Exp $

 #include "pstream.h"
-#include
+#include
+#include

 pstreambuf* pstreambuf::open( const char *cmd, int open_mode)
Leon Bottou (on January 31, 2009, 14:18:42)
Updated in version 1.3
Leon Bottou (on October 11, 2011, 21:01:16)
Released sgd-2.0 featuring ASGD.
