Project details for Theano

Theano 0.8

by jaberg - March 21, 2016, 20:31:59 CET

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Theano features:

* tight integration with numpy – Use numpy.ndarray in Theano-compiled functions.
* transparent use of a GPU – Perform data-intensive calculations up to 140x faster than with CPU.
* symbolic differentiation – Let Theano do your derivatives.
* speed and stability optimizations – Get the right answer for log(1+x) even when x is really tiny.
* dynamic C code generation – Evaluate expressions faster.
* extensive unit-testing and self-verification – Detect and diagnose many types of mistake.
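The log(1+x) item refers to a precision problem that is easy to reproduce in plain Python: for tiny x, `1 + x` rounds to exactly 1.0 before the logarithm is taken. The sketch below uses the standard-library `math.log1p` to show the difference. It only illustrates the numerical issue; Theano applies its own rewrite on the symbolic graph, which is not shown here.

```python
import math

x = 1e-20  # far below double-precision machine epsilon (about 2.2e-16)

naive = math.log(1 + x)  # 1 + 1e-20 rounds to 1.0, so the result is 0.0
stable = math.log1p(x)   # computes log(1 + x) without forming 1 + x first

print(naive)   # 0.0
print(stable)  # 1e-20
```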

Theano has been powering large-scale computationally intensive scientific investigations since 2007. But it is also approachable enough to be used in the classroom (IFT6266 at the University of Montreal).

Theano has been used primarily to implement large-scale deep learning algorithms. For examples, see the Deep Learning Tutorials.

Changes to previous version:

Theano 0.8 (21st of March, 2016)

We recommend that everyone upgrade to this version.


* Python 2 and 3 support with the same code base
* Faster optimization
* Integration of CuDNN for better GPU performance
* Many Scan improvements (execution speed up, ...)
* optimizer=fast_compile moves computation to the GPU.
* Better convolution on CPU and GPU (CorrMM, cuDNN, 3D conv, more parameters)
* Interactive visualization of graphs with d3viz
* cnmem (better memory management on GPU)
* BreakpointOp
* Multi-GPU for data parallelism via Platoon
* More pooling parameters supported
* Bilinear interpolation of images
* New GPU back-end:

    * Float16 support (needs CUDA 7.5)
    * Multiple dtypes
    * Multi-GPU support in the same process

Supported Operating Systems: Linux, Macosx, Windows
Data Formats: Agnostic
Tags: Python, Cuda, Gpu, Symbolic Differentiation, Numpy

Other available revisions

Version Changelog Date

Theano 1.0.2 (23rd of May, 2018)

This is a maintenance release of Theano, version 1.0.2, with no new features, but some important bug fixes.

We recommend that everybody update to this version.

Highlights (since 1.0.1):

  • Theano should work under PyPy now (this is experimental).
  • Update for cuDNN 7.1 RNN API changes.
  • Fix for a crash related to mixed dtypes with cuDNN convolutions.
  • MAGMA should work in more cases without manual config.
  • Handle reductions with non-default accumulator dtype better on the GPU.
  • Improvements to the test suite so that it fails less often due to random chance.

A total of 6 people contributed to this release since 1.0.1:

  • Frederic Bastien
  • Steven Bocco
  • Jon Haygood
  • Arnaud Bergeron
  • Jordan Melendez
  • Desiree Vogt-Lee
  • Garming Sam
  • Pascal Lamblin
  • Vincent Dumoulin
  • Glexin
  • Simon Lefrancois
May 23, 2018, 16:34:31

Theano 1.0.1 (6th of December, 2017)

This is a maintenance release of Theano, version 1.0.1, with no new features, but some important bug fixes.

Highlights (since 1.0.0):

  • Fixed compilation and improved float16 support for topK on GPU

    • NB: topK support on GPU is experimental and may not work for large input sizes on certain GPUs

  • Fixed cuDNN reductions when axes to reduce have size 1

  • Attempted to prevent re-initialization of the GPU in a child process

  • Fixed support for temporary paths with spaces in Theano initialization

  • Spell check pass on the documentation

December 7, 2017, 14:14:38

Theano 1.0.0 (15th of November, 2017)

Highlights (since 0.9.0):

  • Announcing that MILA will stop developing Theano

  • conda packages are now available and updated in our own conda channel mila-udem. To install it: conda install -c mila-udem theano pygpu

  • Support NumPy 1.13

  • Support pygpu 0.7

  • Moved Python 3.* minimum supported version from 3.3 to 3.4

  • Added conda recipe

  • Replaced deprecated package nose-parameterized with up-to-date package parameterized for Theano requirements

  • Theano now internally uses sha256 instead of md5 to work on systems that forbid md5 for security reasons

  • Removed old GPU backend theano.sandbox.cuda. New backend theano.gpuarray is now the official GPU backend

  • Make sure MKL uses GNU OpenMP

    • NB: Matrix dot product (gemm) with mkl from conda could return wrong results in some cases. We have reported the problem upstream and we have a workaround that raises an error with information about how to fix it.

  • Improved elemwise operations

    • Speed-up elemwise ops based on SciPy

    • Fixed memory leaks related to elemwise ops on GPU

  • Scan improvements

    • Speed up Theano scan compilation and gradient computation

    • Added a meaningful message when inputs to scan are missing

  • Speed up graph toposort algorithm

  • Faster C compilation by massively using a new interface for op params

  • Faster optimization step, with new optional destroy handler

  • Documentation updated and more complete

    • Added documentation for RNNBlock

    • Updated conv documentation

  • Support more debuggers for PdbBreakpoint

  • Many bug fixes, crash fixes and warning improvements

November 16, 2017, 17:42:27
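The md5-to-sha256 change matters because md5 may be missing or blocked on some Python builds, while sha256 is always present in Python's `hashlib`. Below is a minimal sketch of deriving a cache key from generated source code. The `cache_key` helper is hypothetical and is not Theano's internal function.

```python
import hashlib


def cache_key(source: str) -> str:
    """Hypothetical helper: derive a cache key from generated C source."""
    # hashlib.sha256 is guaranteed to exist; the hex digest is
    # 64 characters long (256 bits).
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


key = cache_key("int add(int a, int b) { return a + b; }")
print(len(key))  # 64
```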

Theano 0.9.0 (20th of March, 2017)

Highlights (since 0.8.0):

* Better Python 3.5 support
* Better numpy 1.12 support
* Conda packages for Mac, Linux and Windows
* Support newer Mac and Windows versions
* More Windows integration:

    * Theano scripts (``theano-cache`` and ``theano-nose``) now work on Windows
    * Better support for Windows line endings in C code
    * Support for spaces in paths on Windows

* Scan improvements:

    * More scan optimizations, with faster compilation and gradient computation
    * Support for checkpointing in scan (trades speed for lower memory usage; useful for long sequences)
    * Fixed broadcast checking in scan

* Graphs improvements:

    * More numerical stability by default for some graphs
    * Better handling of corner cases for theano functions and graph optimizations
    * More graph optimizations with faster compilation and execution
    * Smaller and more readable graphs

* New GPU back-end:

    * Removed warp-synchronous programming to get good results with newer CUDA drivers
    * More pooling support on GPU when cuDNN isn't available
    * Full support for the ignore_border option in pooling
    * Inplace storage for shared variables
    * float16 storage
    * Using PCI bus ID of graphics cards for a better mapping between theano device number and nvidia-smi number
    * Fixed offset error in ``GpuIncSubtensor``

* Less C code compilation
* Added support for bool dtype
* Updated and more complete documentation
* Bug fixes related to merge optimizer and shape inference
* Lots of other bug fixes, crash fixes and warning improvements
April 10, 2017, 20:30:17
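The checkpointing trade-off mentioned under the Scan improvements can be illustrated in plain Python: store only every k-th state of a recurrence and recompute the states in between when they are needed. This is a sketch of the general idea only. The function names are hypothetical and do not describe how scan implements checkpointing.

```python
def run_with_checkpoints(step, x0, n_steps, every):
    """Run x_{t+1} = step(x_t), keeping only every `every`-th state in memory."""
    checkpoints = {0: x0}
    x = x0
    for t in range(1, n_steps + 1):
        x = step(x)
        if t % every == 0:
            checkpoints[t] = x
    return x, checkpoints


def recover_state(step, checkpoints, t, every):
    """Recompute x_t (0 <= t <= n_steps) from the nearest earlier checkpoint."""
    start = (t // every) * every
    x = checkpoints[start]
    for _ in range(t - start):
        x = step(x)
    return x


def step(x):
    # Toy recurrence: with x_0 = 0 this gives x_t = 2**t - 1.
    return 2 * x + 1


final, cps = run_with_checkpoints(step, 0, n_steps=10, every=4)
print(final, sorted(cps))                    # 1023 [0, 4, 8]
print(recover_state(step, cps, 6, every=4))  # 63
```

Only 3 of the 11 states are stored. The cost is up to `every - 1` extra calls to `step` each time an intermediate state is recovered.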

Theano 0.8.1 (29th of March, 2016)

* Fix compilation on Mac with CLT 7.3

April 1, 2016, 19:22:01

Theano 0.8 (21st of March, 2016)

We recommend that everyone upgrade to this version.


* Python 2 and 3 support with the same code base
* Faster optimization
* Integration of CuDNN for better GPU performance
* Many Scan improvements (execution speed up, ...)
* optimizer=fast_compile moves computation to the GPU.
* Better convolution on CPU and GPU (CorrMM, cuDNN, 3D conv, more parameters)
* Interactive visualization of graphs with d3viz
* cnmem (better memory management on GPU)
* BreakpointOp
* Multi-GPU for data parallelism via Platoon
* More pooling parameters supported
* Bilinear interpolation of images
* New GPU back-end:

    * Float16 support (needs CUDA 7.5)
    * Multiple dtypes
    * Multi-GPU support in the same process
March 21, 2016, 20:31:59

Theano 0.7 (26th of March, 2015)

We recommend that everyone upgrade to this version.


* Integration of CuDNN for 2D convolutions and pooling on supported GPUs
* Too many optimizations and new features to count
* Various fixes and improvements to scan
* Better support for GPU on Windows
* On Mac OS X, clang is used by default
* Many crash fixes
* Some bug fixes as well
March 27, 2015, 16:40:18

Theano 0.6 (December 3rd, 2013)


* Last release with support for Python 2.4 and 2.5.
* We will try to release more frequently.
* Fix crash/installation problems.
* Use less memory for conv3d2d.

0.6rc4 skipped for a technical reason.

Highlights (since 0.6rc3):

* Python 3.3 compatibility with buildbot test for it.
* Full advanced indexing support.
* Better Windows 64 bit support.
* New profiler.
* Better error messages that help debugging.
* Better support for newer NumPy versions (remove useless warning/crash).
* Faster optimization/compilation for big graphs.
* Moved the Conv3d2d implementation into Theano.
* Better SymPy/Theano bridge: make a Theano op from a SymPy expression and use the SymPy C code generator.
* Bug fixes.

Too many changes in 0.6rc1, 0.6rc2 and 0.6rc3 to list here. See for details.

December 3, 2013, 20:32:02

Theano 0.5 (23 February 2012)


  • Moved to github:
  • Old trac ticket moved to assembla ticket:
  • Theano vision: (Many people)
  • Theano with GPU works in some cases on Windows now. Still experimental. (Sebastian Urban)
  • Faster dot() calls: new/better direct calls to CPU and GPU ger, gemv, gemm and dot(vector, vector). (James, Frederic, Pascal)
  • C implementation of Alloc. (James, Pascal)
  • theano.grad() now also works with sparse variables. (Arnaud)
  • Macro to implement the Jacobian/Hessian with theano.tensor.{jacobian,hessian} (Razvan)
  • See the Interface changes.

Interface Behavior Changes:

  • The default value of the axis parameter of theano.{max,min,argmax,argmin,max_and_argmax} is now None, the same as numpy, i.e. operate on all dimensions of the tensor. (Frederic Bastien, Olivier Delalleau) (The old default was deprecated and generated a warning since Theano 0.3, released Nov. 23rd, 2010.)
  • The output dtype of sum with input dtype [u]int* is now always [u]int64. You can specify the output dtype with a new dtype parameter to sum. The output dtype is the one used for the summation. There was no warning about this in previous Theano versions. As a consequence, the sum is done in a dtype with more precision than before, so it could be slower, but it will be more resistant to overflow. This new behavior is the same as numpy. (Olivier, Pascal)
  • When using a GPU, detect faulty nvidia drivers. This was previously detected only when running the Theano tests; now it is always tested. Faulty drivers result in wrong results for reduce operations. (Frederic B.)
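Both behavior changes above are described as matching numpy, so numpy itself can illustrate them. The arrays below are made-up examples. The explicit `dtype` arguments show a wide versus a narrow accumulator. Note that Theano 0.5 always accumulates [u]int* sums in [u]int64, while numpy's default accumulator for small ints is the platform integer, so the dtype is spelled out here to keep the sketch platform-independent.

```python
import numpy as np

a = np.array([[1, 5], [7, 3]])
# axis=None (the default) reduces over all dimensions.
print(np.max(a))          # 7
print(np.argmax(a))       # 2, an index into the flattened array
# An explicit axis reduces along one dimension only.
print(np.max(a, axis=0))  # [7 5]

# Accumulating an integer sum in a narrow dtype can silently overflow.
x = np.array([100, 100], dtype=np.int8)
print(x.sum(dtype=np.int64))  # 200
print(x.sum(dtype=np.int8))   # -56 (wrapped around)
```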

Interface Features Removed (most were deprecated):

  • The string modes FAST_RUN_NOGC and STABILIZE are not accepted. They were accepted only by theano.function(). Use Mode(linker='c|py_nogc') or Mode(optimizer='stabilize') instead.
  • tensor.grad(cost, wrt) now always returns an object of the "same type" as wrt (list/tuple/TensorVariable). (Ian Goodfellow, Olivier)
  • The few remaining uses of tag.shape and Join.vec_length have been removed. (Frederic)
  • The .value attribute of shared variables is removed, use shared.set_value() or shared.get_value() instead. (Frederic)
  • Theano config option "home" is not used anymore as it was redundant with "base_compiledir". If you use it, Theano will now raise an error. (Olivier D.)
  • scan interface changes: (Razvan Pascanu)
    • The use of return_steps for specifying how many entries of the output to return has been removed. Instead, apply a subtensor to the output returned by scan to select a certain slice.
    • The inner function (that scan receives) should return its outputs and updates following this order: [outputs], [updates], [condition]. One can skip any of the three if not used, but the order has to stay unchanged.

Interface bug fix:

  • In some cases, Rop returned a Theano variable itself when it should have returned a list containing that one variable. (Razvan)

New deprecation (will be removed in Theano 0.6, warning generated if you use them):

  • tensor.shared() renamed to tensor._shared(). You probably want to call theano.shared() instead! (Olivier D.)

Bug fixes (incorrect results):

  • On CPU, if the convolution had received explicit shape information, it was not checked at runtime. This caused wrong results if the input shape was not the expected one. (Frederic, reported by Sander Dieleman)
  • Theoretical bug: in some cases GPUSum could return a bad value. We were not able to reproduce this problem.
    • patterns affected ({0,1}*nb dim, 0 no reduction on this dim, 1 reduction on this dim): 01, 011, 0111, 010, 10, 001, 0011, 0101 (Frederic)
  • div by zero in verify_grad. This hid a bug in the grad of Images2Neibs. (James)
  • theano.sandbox.neighbors.Images2Neibs grad was returning a wrong value. The grad is now disabled and returns an error. (Frederic)
  • An expression of the form "1 / (exp(x) +- constant)" was systematically matched to "1 / (exp(x) + 1)" and turned into a sigmoid regardless of the value of the constant. A warning will be issued if your code was affected by this bug. (Olivier, reported by Sander Dieleman)
  • When indexing into a subtensor of negative stride (for instance, x[a:b:-1][c]), an optimization replacing it with a direct indexing (x[d]) used an incorrect formula, leading to incorrect results. (Pascal, reported by Razvan)
  • The tile() function is now stricter in what it accepts to allow for better error-checking/avoiding nonsensical situations. The gradient has been disabled for the time being as it only implemented (incorrectly) one special case. The reps argument must be a constant (not a tensor variable), and must have the same length as the number of dimensions in the x argument; this is now checked. (David)
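The negative-stride indexing bug concerns rewriting `x[a:b:-1][c]` as a single direct index `x[d]`. For a non-negative start `a` inside the array and a non-negative `c` inside the slice, the slice walks backwards from `x[a]`, so its c-th element sits c positions before `a`. The plain-Python check below illustrates that formula under those assumptions. `direct_index` is a hypothetical helper and does not reproduce Theano's optimization code.

```python
def direct_index(a, c):
    """Index d with x[a:b:-1][c] == x[d].

    Assumes 0 <= a < len(x) and 0 <= c < len(x[a:b:-1]).
    """
    # x[a:b:-1] is x[a], x[a - 1], x[a - 2], ..., so element c is x[a - c].
    return a - c


x = list(range(10, 20))  # [10, 11, ..., 19]
a, b, c = 7, 2, 3
print(x[a:b:-1])                              # [17, 16, 15, 14, 13]
print(x[a:b:-1][c], x[direct_index(a, c)])    # 14 14
```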

Scan fixes:

  • Computing the grad of a function of the grad of scan (reported by Justin Bayer, fix by Razvan). Before: crashed most of the time, but could return a wrong value with a bad number of dimensions (so a visible bug). Now: does the right thing.
  • Gradient with respect to outputs using multiple taps (reported by Timothy, fix by Razvan). Before: returned wrong values. Now: does the right thing. Note: the reported case of this bug happened in conjunction with the save-memory optimization of scan, which gave runtime errors. So if you did not manually disable the save-memory optimization (item 4 in this list), you are fine if you did not manually request multiple taps.
  • Rop of gradient of scan (reported by Timothy and Justin Bayer, fix by Razvan). Before: compilation error when computing the R-op. Now: does the right thing.
  • Save-memory optimization of scan (reported by Timothy and Nicolas BL, fix by Razvan). Before: resulted in a runtime shape error for certain corner cases. Now: does the right thing.
  • Scan grad when the input of scan has sequences of different lengths. (Razvan, reported by Michael Forbes)
  • Scan.infer_shape now works correctly when working with a condition for the number of loops. In the past, it returned n_steps as the length, which is not always true. (Razvan)
  • Scan.infer_shape crash fix. (Razvan)

New features:

  • AdvancedIncSubtensor grad defined and tested (Justin Bayer)
  • Adding 1D advanced indexing support to inc_subtensor and set_subtensor (James Bergstra)
  • tensor.{zeros,ones}_like now support the dtype parameter, as in numpy (Frederic)
  • Added configuration flag "exception_verbosity" to control the verbosity of exceptions (Ian)
  • theano-cache list: list the content of the theano cache (Frederic)
  • theano-cache unlock: remove the Theano lock (Olivier)
  • tensor.ceil_int_div to compute ceil(a / float(b)) (Frederic)
  • MaxAndArgMax.grad now works with any axis (The op supports only 1 axis) (Frederic)
    • used by tensor.{max,min,max_and_argmax}
  • tensor.{all,any} (Razvan)
  • tensor.roll, as in numpy (Matthew Rocklin, David Warde-Farley)
  • Theano with GPU works in some cases on Windows now. Still experimental. (Sebastian Urban)
  • IfElse now allows a list/tuple as the result of the if/else branches.
    • They must have the same length and corresponding type (Razvan)
  • Argmax output dtype is now int64 instead of int32. (Olivier)
  • Added the element-wise operation arccos. (Ian)
  • Added sparse dot with dense grad output. (Yann Dauphin)
    • Optimized to Usmm and UsmmCscDense in some case (Yann)
    • Note: and theano.sparse.structured_dot() always had a gradient with the same sparsity pattern as the inputs. The new has a dense gradient for all inputs.
  • GpuAdvancedSubtensor1 supports broadcasted dimensions. (Frederic)
  • TensorVariable.zeros_like() and SparseVariable.zeros_like()
  • theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.device_properties() (Frederic)
  • theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info() returns free and total GPU memory (Frederic)
  • Theano flags compiledir_format. Keep the same default as before: compiledir_%(platform)s-%(processor)s-%(python_version)s. (Josh Bleecher Snyder)
    • We also support the "theano_version" substitution.
  • IntDiv C code (faster, and allows this elemwise to be fused with other elemwise ops) (Pascal)
  • Internal filter_variable mechanism in Type. (Pascal, Ian)
    • Ifelse works on sparse.
    • It makes the use of GPU shared variables more transparent with theano.function updates and the givens parameter.
  • Added a_tensor.transpose(axes); axes is optional (James)
    • theano.tensor.transpose(a_tensor, kwargs): we were ignoring kwargs, now it is used as the axes.
  • a_CudaNdarray_object[*] = int, now works (Frederic)
  • tensor_variable.size (as numpy) computes the product of the shape elements. (Olivier)
  • sparse_variable.size (as scipy) computes the number of stored values. (Olivier)
  • sparse_variable[N, N] now works (Li Yao, Frederic)
  • sparse_variable[M:N, O:P] now works (Li Yao, Frederic, Pascal) M, N, O, and P can be Python int or scalar tensor variables, None, or omitted (sparse_variable[:, :M] or sparse_variable[:M, N:] work).
  • tensor.tensordot can now be moved to GPU (Sander Dieleman, Pascal, based on code from Tijmen Tieleman's gnumpy)
  • Many infer_shape methods implemented on sparse matrix ops. (David W.F.)
  • Added theano.sparse.verify_grad_sparse to easily allow testing the grad of sparse ops. It supports testing the full and structured gradient.
  • The keys in our cache now store the hash of constants and not the constant values themselves. This is significantly more efficient for big constant arrays. (Frederic B.)
  • 'theano-cache list' lists key files bigger than 1M (Frederic B.)
  • 'theano-cache list' prints a histogram of the number of keys per compiled module (Frederic B.)
  • 'theano-cache list' prints the number of compiled modules per op class (Frederic B.)
  • The Theano flag "nvcc.fastmath" is now also used for the file.
  • Add the header_dirs to the hard part of the compilation key. This is currently used only by cuda, but it can be useful if we use header-only libraries. (Frederic B.)
  • The Theano flag "nvcc.flags" is now included in the hard part of the key. This means that we now recompile all modules for each value of "nvcc.flags". A change in "nvcc.flags" used to be ignored for modules that were already compiled. (Frederic B.)
  • Alloc, GpuAlloc are not always pre-computed (constant_folding optimization) at compile time if all their inputs are constant. (Frederic B., Pascal L., reported by Sander Dieleman)
  • New Op tensor.sort(), wrapping numpy.sort (Hani Almousli)
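The changelog describes tensor.ceil_int_div as computing ceil(a / float(b)). The same value can be obtained with integer arithmetic only, which avoids float rounding for large integers. The sketch below shows that identity in plain Python. The function shares the Theano op's name for readability, but it is a hypothetical stand-in; the changelog does not say how Theano implements the op.

```python
import math


def ceil_int_div(a, b):
    """ceil(a / b) for Python ints, using only integer arithmetic (b != 0)."""
    # // floors toward negative infinity, so -((-a) // b) rounds up instead.
    return -(-a // b)


print(ceil_int_div(7, 2))   # 4
print(ceil_int_div(-7, 2))  # -3

# Going through float, as ceil(a / float(b)) does, loses precision for large ints:
big = 2**53 + 1
print(math.ceil(big / 1.0))  # 9007199254740992 (off by one)
print(ceil_int_div(big, 1))  # 9007199254740993
```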

New optimizations:

  • AdvancedSubtensor1 reuses preallocated memory if available (scan, c|py_nogc linker) (Frederic)
  • dot22, dot22scalar work with complex. (Frederic)
  • Generate Gemv/Gemm more often. (James)
  • Remove scan when all computations can be moved outside the loop. (Razvan)
  • Scan optimization is done earlier. This allows other optimizations to be applied. (Frederic, Guillaume, Razvan)
  • exp(x) * sigmoid(-x) is now correctly optimized to the more stable form sigmoid(x). (Olivier)
  • Added Subtensor(Rebroadcast(x)) => Rebroadcast(Subtensor(x)) optimization. (Guillaume)
  • Made the optimization process faster. (James)
  • Allow fusion of elemwise when the scalar op needs support code. (James)
  • Better opt that lifts transpose around dot. (James)
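The sigmoid rewrite rests on the identity exp(x) * sigmoid(-x) = e^x / (1 + e^x) = sigmoid(x). The unoptimized form overflows for large x, while a stable sigmoid does not. The plain-Python sketch below checks the identity and shows the overflow. The `sigmoid` and `naive` helpers are illustrative and are not Theano's implementation.

```python
import math


def sigmoid(x):
    """Numerically stable logistic sigmoid 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, rewrite to avoid exp() of a large positive number.
    z = math.exp(x)
    return z / (1.0 + z)


def naive(x):
    """The unoptimized expression exp(x) * sigmoid(-x)."""
    return math.exp(x) * sigmoid(-x)


for v in (-5.0, 0.0, 2.0):
    assert math.isclose(naive(v), sigmoid(v))

print(sigmoid(1000.0))  # 1.0
# naive(1000.0) raises OverflowError: math.exp(1000.0) is out of float range.
```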

Crashes fixed:

  • T.mean crash at graph building time. (Ian)
  • "Interactive debugger" crash fix. (Ian, Frederic)
  • Do not call gemm with strides of 0; some BLAS implementations refuse it. (Pascal Lamblin)
  • Optimization crash with gemm and complex. (Frederic)
  • GPU crash with elemwise. (Frederic, some reported by Chris Currivan)
  • Compilation crash with amdlibm and the GPU. (Frederic)
  • IfElse crash. (Frederic)
  • Execution crash fix in AdvancedSubtensor1 on 32 bit computers. (Pascal)
  • GPU compilation crash on MacOS X. (Olivier)
  • Support for OSX Enthought Python Distribution 7.x. (Graham Taylor, Olivier)
  • When the subtensor inputs had 0 dimensions and the outputs 0 dimensions. (Frederic)
  • Crash when the step to subtensor was not 1 in conjunction with some optimization. (Frederic, reported by Olivier Chapelle)
  • Runtime crash related to an optimization with subtensor of alloc (reported by Razvan, fixed by Frederic)
  • Fix dot22scalar cast of integer scalars (Justin Bayer, Frederic, Olivier)
  • Fix runtime crash in gemm, dot22. (Frederic B.)
  • Fix on 32-bit computers: make sure all shapes are int64. (Olivier)
  • Fix to deque on python 2.4 (Olivier)
  • Fix crash when not using c code (or using DebugMode) (not used by default) with numpy 1.6*. Numpy has a bug in the reduction code that made it crash. (Pascal)
  • Crashes of blas functions (Gemv on CPU; Ger, Gemv and Gemm on GPU) when matrices had non-unit stride in both dimensions (CPU and GPU), or when matrices had negative strides (GPU only). In those cases, we are now making copies. (Pascal)
  • More cases supported in AdvancedIncSubtensor1. (Olivier D.)
  • Fix crash when a broadcasted constant was used as input of an elemwise Op and needed to be upcasted to match the op's output. (Reported by John Salvatier, fixed by Pascal L.)
  • Fixed a memory leak with shared variable (we kept a pointer to the original value) (Ian G.)

Known bugs:

  • CAReduce with nan in inputs does not return the correct output.
    • This is used in tensor.{max,mean,prod,sum} and in the grad of PermuteRowElements.


Sandbox:

  • cvm interface more consistent with current linker. (James)
  • Now all tests pass with the linker=cvm flags.
  • vm linker has a callback parameter. (James)
  • review/finish/doc: diag/extract_diag. (Arnaud Bergeron, Frederic, Olivier)
  • review/finish/doc: AllocDiag/diag. (Arnaud, Frederic, Guillaume)
  • review/finish/doc: MatrixInverse, matrix_inverse. (Razvan)
  • review/finish/doc: matrix_dot. (Razvan)
  • review/finish/doc: det (determinant) op. (Philippe Hamel)
  • review/finish/doc: Cholesky determinant op. (David)
  • review/finish/doc: ensure_sorted_indices. (Li Yao)
  • review/finish/doc: spectral_radius_bound. (Xavier Glorot)
  • review/finish/doc: sparse sum. (Valentin Bisson)
  • review/finish/doc: Remove0 (Valentin)
  • review/finish/doc: SquareDiagonal (Eric)

Sandbox New features (not enabled by default):

  • CURAND_RandomStreams for uniform and normal (not picklable, GPU only) (James)
  • New sandbox.linalg.ops.pinv(pseudo-inverse) op (Razvan)


Documentation:

  • Many updates. (Many people)
  • Updates to install doc on MacOS. (Olivier)
  • Updates to install doc on Windows. (David, Olivier)
  • Doc on the Rop function (Ian)
  • Added how to use scan to loop with a condition as the number of iterations. (Razvan)
  • Added how to wrap in Theano an existing python function (in numpy, scipy, ...). (Frederic)
  • Refactored GPU installation of Theano. (Olivier)


Others:

  • Better error messages in many places. (Many people)
  • PEP8 fixes. (Many people)
  • Add a warning about a numpy bug when using advanced indexing on a tensor with more than 2^32 elements (the resulting array is not correctly filled and ends with zeros). (Pascal, reported by David WF)
  • Added Scalar.ndim=0 and ScalarSharedVariable.ndim=0 (simplify code) (Razvan)
  • New min_informative_str() function to print graph. (Ian)
  • Fix catching of exception. (Sometimes we used to catch interrupts) (Frederic, David, Ian, Olivier)
  • Better support for utf string. (David)
  • Fix pydotprint with a function compiled with a ProfileMode (Frederic)
    • Was broken with change to the profiler.
  • Warning when people have old cache entries. (Olivier)
  • More tests for join on the GPU and CPU. (Frederic)
  • Do not request to load the GPU module by default in scan module. (Razvan)
  • Fixed some import problems. (Frederic and others)
  • Filtering update. (James)
  • On Windows, the default compiledir changed to be local to the computer/user and not transferred with roaming profile. (Sebastian Urban)
  • New theano flag "on_shape_error". Defaults to "warn" (same as previous behavior): it prints a warning when an error occurs when inferring the shape of some apply node. The other accepted value is "raise" to raise an error when this happens. (Frederic)
  • The buildbot now raises optimization/shape errors instead of just printing a warning. (Frederic)
  • better pycuda tests (Frederic)
  • now accept the shape and the number of iteration as parameter (Frederic)
  • Fix opt warning when the opt ShapeOpt is disabled (enabled by default) (Frederic)
  • More internal verification on what each op.infer_shape returns. (Frederic, James)
  • Argmax dtype to int64 (Olivier)
  • Improved docstring and basic tests for the Tile Op (David).
February 23, 2012, 23:14:38

Modifications in the 0.4.1 (12 August 2011)

New features:

  • R_op macro like theano.tensor.grad

    • Not all tests are done yet (TODO)

  • Added alias theano.tensor.bitwise_{and,or,xor,not}. They are the numpy names.

  • Updates returned by Scan (you need to pass them to the theano.function) are now a new Updates class. This allows more checks and makes them easier to work with. The Updates class is a subclass of dict.

  • Scan can now work in a "do while" loop style.

    • We scan until a condition is met.

    • There is a minimum of 1 iteration (a "while do" style loop is not possible).

  • The "Interactive Debugger" (compute_test_value theano flags)

    • Should now work with all ops (even the ones with only C code)

    • In the past some errors were caught and re-raised as unrelated errors (ShapeMismatch replaced with NotImplemented). We don't do that anymore.

  • The new Op.make_thunk function (introduced in 0.4.0) is now used by constant_folding and DebugMode

  • Added A_TENSOR_VARIABLE.astype() as a way to cast. NumPy allows this syntax.

  • New BLAS GER implementation.

  • Insert GEMV more frequently.

  • Added new ifelse(scalar condition, rval_if_true, rval_if_false) Op.

    • This is a subset of the elemwise switch (tensor condition, rval_if_true, rval_if_false).

    • With the new feature in the sandbox, only one of rval_if_true or rval_if_false will be evaluated.
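The "do while" style of scan can be sketched as an ordinary Python loop. In this sketch the step function returns its new state together with a stop condition. The body always runs once before the condition is checked, which is where the minimum of one iteration comes from. `scan_until` and `halve` are hypothetical names for illustration; this is not the scan API.

```python
def scan_until(step, state, max_steps):
    """Hypothetical do-while loop in the spirit of scan's condition support.

    `step` returns (new_state, stop). The body runs at least once (max_steps
    must be >= 1) and the loop ends as soon as `stop` is true or max_steps
    is reached. All intermediate states are collected, like scan outputs.
    """
    outputs = []
    for _ in range(max_steps):
        state, stop = step(state)
        outputs.append(state)
        if stop:
            break
    return outputs


def halve(x):
    """Inner step: halve the value and stop once it drops below 1."""
    y = x / 2
    return y, y < 1


print(scan_until(halve, 10.0, max_steps=100))  # [5.0, 2.5, 1.25, 0.625]
print(scan_until(halve, 0.5, max_steps=100))   # [0.25], still one iteration
```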


Optimizations:

  • Subtensor has C code

  • {Inc,Set}Subtensor has C code

  • ScalarFromTensor has C code

  • dot(zeros,x) and dot(x,zeros)

  • IncSubtensor(x, zeros, idx) -> x

  • SetSubtensor(x, x[idx], idx) -> x (when x is a constant)

  • subtensor(alloc,...) -> alloc

  • Many new scan optimizations

    • Lower scan execution overhead with a Cython implementation

    • Removed scan double compilation (by using the new Op.make_thunk mechanism)

    • Certain computations from the inner graph are now pushed out into the outer graph. This means they are not re-computed at every step of scan.

    • Different scan ops now get merged into a single op (if possible), reducing the overhead and sharing computations between the two instances
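Several of the rewrites listed above are safe because of simple array identities. The NumPy check below confirms three of them numerically: incrementing a slice by zeros, writing a slice back onto itself, and a dot product with an all-zeros operand. The array and the index are arbitrary examples. The check only illustrates why these rewrites preserve values; it is not Theano's optimizer.

```python
import numpy as np

x = np.arange(12.0).reshape(3, 4)
idx = (slice(0, 2), 1)  # arbitrary example index: rows 0-1, column 1

# IncSubtensor(x, zeros, idx) -> x: adding zeros to a slice changes nothing.
inc = x.copy()
inc[idx] += np.zeros_like(x[idx])
assert np.array_equal(inc, x)

# SetSubtensor(x, x[idx], idx) -> x: writing a slice back onto itself changes nothing.
st = x.copy()
st[idx] = x[idx]
assert np.array_equal(st, x)

# dot(zeros, x): the result is all zeros, so the product need not be computed.
assert not np.dot(np.zeros((2, 3)), x).any()
```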


GPU:

  • PyCUDA/CUDAMat/Gnumpy/Theano bridge and documentation.

    • New function to easily convert pycuda GPUArray object to and from CudaNdarray object

    • Fixed a bug if you created a view of a manually created CudaNdarray that is a view of a GPUArray.

  • Removed a warning when nvcc is not available and the user did not request it.

  • renamed config option cuda.nvccflags -> nvcc.flags

  • Allow GpuSoftmax and GpuSoftmaxWithBias to work with bigger input.

Bugs fixed:

  • In one case an AdvancedSubtensor1 could be converted to a GpuAdvancedIncSubtensor1 instead of GpuAdvancedSubtensor1. It probably didn't happen due to the order of optimizations, but that order is not guaranteed to be the same on all computers.
  • Derivative of set_subtensor was wrong.
  • Derivative of Alloc was wrong.

Crash fixed:

  • On an unusual Python 2.4.4 on Windows

  • When using a C cache copied from another location

  • On Windows 32 bits when setting a complex64 to 0.

  • Compilation crash with CUDA 4

  • When wanting to copy the compilation cache from a computer to another

    • This can be useful for using Theano on a computer without a compiler.

  • GPU:

    • Compilation crash fixed under Ubuntu 11.04

    • Compilation crash fixed with CUDA 4.0

Known bug:

  • CAReduce with nan in inputs does not return the correct output.

    • This is used in tensor.{max,mean,prod,sum} and in the grad of PermuteRowElements.

    • This is not a new bug, just a bug discovered since the last release that we didn't have time to fix.

Deprecation (will be removed in Theano 0.5, warning generated if you use them):

  • The string mode (accepted only by theano.function()) FAST_RUN_NOGC. Use Mode(linker='c|py_nogc') instead.

  • The string mode (accepted only by theano.function()) STABILIZE. Use Mode(optimizer='stabilize') instead.

  • scan interface change:

    • The use of return_steps for specifying how many entries of the output to return has been deprecated.

      • The same thing can be done by applying a subtensor on the output returned by scan to select a certain slice.

    • The inner function (that scan receives) should return its outputs and updates following this order: [outputs], [updates], [condition]. One can skip any of the three if not used, but the order has to stay unchanged.

  • tensor.grad(cost, wrt) will return an object of the "same type" as wrt (list/tuple/TensorVariable).

    • Currently tensor.grad returns a list when wrt is a list/tuple of more than 1 element.

Deprecated in 0.4.0 (reminder, a warning is generated if you use them):

  • Dividing integers with / is deprecated: use // for integer division, or cast one of the integers to a float type if you want a float result (you may also change this behavior with config.int_division).
  • tag.shape attribute deprecated (#633)
  • CudaNdarray_new_null is deprecated in favour of CudaNdarray_New
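The integer-division advice above follows the same spelling as plain Python 3, where `//` is floor division and `/` always produces a float. The snippet shows Python's own semantics for illustration only; it does not model Theano's config.int_division flag.

```python
a, b = 7, 2

print(a // b)        # 3: integer (floor) division, the recommended spelling
print(a / float(b))  # 3.5: cast one operand to float to get a float result
print(-a // b)       # -4: note that // floors toward negative infinity
```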


Sandbox:

  • MRG random generator now implements the same casting behavior as the regular random generator.

Sandbox New features (not enabled by default):

  • New Linkers (theano flags linker={vm,cvm})

  • The new linker allows lazy evaluation of the new ifelse op, meaning we compute only the true or false branch depending of the condition. This can speed up some types of computation.

  • Uses a new profiling system (that currently tracks less stuff)

  • The cvm is implemented in C, so it lowers Theano's overhead.

  • The vm is implemented in Python, so it can help debugging in some cases.

  • In the future, the default will be the cvm.

  • Some new not yet well tested sparse ops: theano.sparse.sandbox.{SpSum, Diag, SquareDiagonal, ColScaleCSC, RowScaleCSC, Remove0, EnsureSortedIndices, ConvolutionIndices}
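The difference between lazy and eager conditionals, which is what the new linkers enable for the ifelse op, can be sketched in plain Python by wrapping each branch in a zero-argument function. This shows evaluation order only, not how the vm or cvm linker is implemented, and the helper names are hypothetical. In Theano itself the lazy op is `ifelse` from `theano.ifelse`, while the elementwise `tensor.switch` needs both branches computed.

```python
# Framework-free sketch of lazy vs. eager conditional evaluation.
# Branches are thunks (zero-argument callables); `calls` records which ran.
calls = []


def expensive_true():
    calls.append("true")
    return sum(i * i for i in range(1000))


def cheap_false():
    calls.append("false")
    return -1


def eager_switch(cond, then_thunk, else_thunk):
    # Like an elementwise switch: both branches are computed first.
    then_val, else_val = then_thunk(), else_thunk()
    return then_val if cond else else_val


def lazy_ifelse(cond, then_thunk, else_thunk):
    # Like a lazy ifelse: only the selected branch is computed.
    return then_thunk() if cond else else_thunk()


eager_result = eager_switch(False, expensive_true, cheap_false)
eager_calls = list(calls)
calls.clear()

lazy_result = lazy_ifelse(False, expensive_true, cheap_false)
lazy_calls = list(calls)
```

Both calls return the same value; only the lazy version skips the expensive branch, which is where the speed-up comes from.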


  • How to compute the Jacobian, Hessian, Jacobian times a vector, and Hessian times a vector.
  • Slides for a 3-hour class with exercises, given at the HPCS2011 Conference in Montreal.
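The Jacobian-times-vector and Hessian-times-vector products mentioned above rest on one identity: the Hessian is the Jacobian of the gradient, so H v is the directional derivative of the gradient along v. Theano computes these symbolically. The sketch below substitutes central finite differences so that it runs with the standard library alone. It checks the result on a quadratic f(x) = 0.5 * x^T A x, whose Hessian is exactly A. All function names are hypothetical.

```python
# Finite-difference sketch of J v and H v; Theano does this symbolically.

def quadratic_grad(A, x):
    # Gradient of f(x) = 0.5 * x^T A x for a symmetric A is A x.
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def jacobian_vector(func, x, v, eps=1e-6):
    # J(x) v  ~=  (func(x + eps*v) - func(x - eps*v)) / (2*eps)
    plus = func([xi + eps * vi for xi, vi in zip(x, v)])
    minus = func([xi - eps * vi for xi, vi in zip(x, v)])
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]


def hessian_vector(grad, x, v, eps=1e-6):
    # The Hessian is the Jacobian of the gradient, so H v = J_grad(x) v.
    return jacobian_vector(grad, x, v, eps)


A = [[2.0, 1.0], [1.0, 3.0]]
x = [0.5, -1.0]
v = [1.0, 2.0]
hv = hessian_vector(lambda z: quadratic_grad(A, z), x, v)  # close to A v
```

Because the product is formed directly, the full Hessian matrix is never built, which is the point of computing "Hessian times a vector" separately.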


  • Logger names renamed to be consistent.

  • Logger function simplified and made more consistent.

  • Fixed a case where, with the compute_test_value Theano flag, an error was replaced by another, unrelated error.

  • Compilation cache enhancements.

  • Made compatible with NumPy 1.6 and SciPy 0.9

  • Fixed tests when NumPy has a new dtype that is not supported by Theano.

  • Fixed some tests when SciPy is not available.

  • Don't compile anything when Theano is imported. Compile support code when we compile the first C code.

  • Python 2.4 fix:

  • Fix the file theano/misc/

  • For Python 2.4.4 on Windows, replaced float("inf") with numpy.inf.

  • Removes useless inputs to a scan node

  • Mostly beautification, making the graph more readable. Such inputs would appear as a consequence of other optimizations.


  • There is a new mechanism that lets an Op permit one of its inputs to be aliased to another, destroyed input. This will generally result in incorrect calculations, so it should be used with care! The right way to use it is when the caller can guarantee that even if these two inputs look aliased, they will never actually overlap. This mechanism can be used, for example, by a new alternative approach to implementing Scan. If an Op has an attribute called "destroyhandler_tolerate_aliased", then this is what's going on. IncSubtensor is thus far the only Op to use this mechanism.
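The reason overlap matters can be seen in a generic example that has nothing to do with IncSubtensor or Scan internals. An element-by-element in-place update that reads and writes the same buffer gives the expected answer only when the read and write windows are disjoint. The helpers below are hypothetical and operate on Python lists.

```python
# Generic illustration: aliased in-place updates are safe only without overlap.

def ranges_overlap(start_a, start_b, length):
    # Windows [start, start + length) overlap iff each starts before the other ends.
    return start_a < start_b + length and start_b < start_a + length


def inplace_shift_add(buf, dst, src, length):
    # buf[dst:dst+length] += buf[src:src+length], element by element,
    # reading from the same buffer it writes to (aliased inputs).
    for i in range(length):
        buf[dst + i] += buf[src + i]
    return buf


def reference_shift_add(buf, dst, src, length):
    # Same computation, but reading from a private copy of the source window.
    src_copy = buf[src:src + length]
    out = list(buf)
    for i in range(length):
        out[dst + i] += src_copy[i]
    return out


# Disjoint windows: the in-place result matches the reference.
safe = inplace_shift_add([1, 2, 3, 4, 5, 6], 0, 3, 3)
# Overlapping windows: the in-place loop reads values it already overwrote.
unsafe = inplace_shift_add([1, 2, 3, 4, 5, 6], 1, 0, 3)
expected = reference_shift_add([1, 2, 3, 4, 5, 6], 1, 0, 3)
```

Here `unsafe` and `expected` differ, which is exactly the "incorrect calculation" risk the caller must rule out before tolerating aliasing.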
August 12, 2011, 22:03:21

Change in output memory storage for Ops: If you implemented custom Ops, with either C or Python implementation, this will concern you.

The contract for memory storage of Ops has been changed. In particular, it is no longer guaranteed that output memory buffers are either empty, or allocated by a previous execution of the same Op.

Right now, here is the situation:

* For Python implementation (perform), what is inside output_storage
  may have been allocated from outside the perform() function, for
  instance by another node (e.g., Scan) or the Mode. If that was the
  case, the memory can be assumed to be C-contiguous (for the moment).
* For C implementations (c_code), nothing has changed yet.

In a future version, the content of the output storage, both for Python and C versions, will either be NULL, or have the following guarantees:

* It will be a Python object of the appropriate Type (for a Tensor variable,
  a numpy.ndarray, for a GPU variable, a CudaNdarray, for instance)
* It will have the correct number of dimensions, and correct dtype

However, its shape and memory layout (strides) will not be guaranteed.

When that change is made, the config flag DebugMode.check_preallocated_output will help you find implementations that are not up-to-date.
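A Python perform() written against this contract should treat whatever it finds in output_storage as a hint, not a guarantee. The sketch below keeps Theano's perform(node, inputs, output_storage) argument order, with each output held in a one-element list, but is otherwise framework-free. `DoubleOp` is hypothetical, and `array.array` stands in for numpy.ndarray so the example runs with the standard library only. It reuses the existing buffer when its type and length fit, and reallocates otherwise.

```python
from array import array


class DoubleOp:
    """Hypothetical op computing 2 * x; only perform() is sketched."""

    def perform(self, node, inputs, output_storage):
        (x,) = inputs
        out_cell = output_storage[0]   # one-element list holding the output
        out = out_cell[0]              # may be None or a buffer allocated elsewhere
        reusable = (
            isinstance(out, array)
            and out.typecode == x.typecode
            and len(out) == len(x)
        )
        if not reusable:
            # Allocate a fresh buffer rather than trusting what was there.
            out = array(x.typecode, [0]) * len(x)
            out_cell[0] = out
        for i, value in enumerate(x):
            out[i] = 2 * value


op = DoubleOp()
storage = [[None]]                                   # empty output storage
op.perform(None, [array("d", [1.0, 2.0, 3.0])], storage)
first_buffer = storage[0][0]

op.perform(None, [array("d", [5.0, 6.0, 7.0])], storage)   # buffer is reused

stale = [[array("d", [9.0])]]                        # wrong length, allocated elsewhere
op.perform(None, [array("d", [1.0, 2.0, 3.0])], stale)
```

Checking the buffer before writing into it covers both the current Python contract and the future one, where only the type and number of dimensions are promised.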


* tag.shape attribute deprecated (#633)
* CudaNdarray_new_null is deprecated in favour of CudaNdarray_New
* Dividing integers with / is deprecated: use // for integer division, or
  cast one of the integers to a float type if you want a float result (you may
  also change this behavior with config.int_division).
* Removed (already deprecated) sandbox/compile module
* Removed (already deprecated) incsubtensor and setsubtensor functions;
  inc_subtensor and set_subtensor are to be used instead.
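set_subtensor and inc_subtensor are functional. In Theano they are called on a subtensor, as in `set_subtensor(x[1:3], y)`, and return a new variable instead of modifying x. The list-based sketch below mimics only that behavior for a 1-D slice. The `_list` helpers are hypothetical.

```python
def set_subtensor_list(x, start, stop, y):
    # Return a copy of x whose slice [start:stop] is replaced by y.
    if len(y) != stop - start:
        raise ValueError("y must have the same length as the slice")
    out = list(x)
    out[start:stop] = y
    return out


def inc_subtensor_list(x, start, stop, y):
    # Return a copy of x whose slice [start:stop] is incremented by y.
    if len(y) != stop - start:
        raise ValueError("y must have the same length as the slice")
    out = list(x)
    for i, yi in zip(range(start, stop), y):
        out[i] += yi
    return out


x = [0, 1, 2, 3, 4]
replaced = set_subtensor_list(x, 1, 3, [10, 20])      # [0, 10, 20, 3, 4]
incremented = inc_subtensor_list(x, 1, 3, [10, 20])   # [0, 11, 22, 3, 4]
# x itself is left unchanged: [0, 1, 2, 3, 4]
```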

Bugs fixed:

* In CudaNdarray.__{iadd,idiv}__, when it is not implemented, return the error.
* THEANO_FLAGS='optimizer=None' now works as expected
* Fixed memory leak in error handling on GPU-to-host copy
* Fix relating specifically to Python 2.7 on Mac OS X
* infer_shape can now handle Python longs
* Trying to compute x % y with one or more arguments being complex now
  raises an error.
* The output of random samples computed with uniform(..., dtype=...) is
  guaranteed to be of the specified dtype instead of potentially being of a
  higher-precision dtype.
* The perform() method of DownsampleFactorMax did not give the right result
  when reusing output storage. This happened only if you used the Theano flag
  'linker=c|py_nogc' or manually specified the mode to be 'c|py_nogc'.

Crash fixed:

* Work around a bug in gcc 4.3.0 that made the compilation of 2d convolution crash.
* Some optimizations crashed when the "ShapeOpt" optimization was disabled.


* Optimize all subtensors followed by another subtensor.


* Fused elemwise nodes that contain dtypes other than float32 (except float64)
  are now moved to the GPU if their inputs and outputs are float32.
* This allows elemwise comparisons to be moved to the GPU if their result is
  cast to float32 afterwards.
* Implemented CudaNdarray.ndim to have the same interface as ndarray.
* Fixed slowdown caused by multiple chained views on CudaNdarray objects
* CudaNdarray_alloc_contiguous changed so as to never try to free
  memory on a view: new "base" property
* Safer decref behaviour in CudaNdarray in case of failed allocations
* New GPU implementation of tensor.basic.outer
* Multinomial random variates now available on GPU
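The dtype rule for moving fused elemwise nodes to the GPU can be restated as a small predicate. This is a hypothetical restatement of the two bullets above, not Theano's optimizer code. Using int8 as the dtype of an intermediate comparison result is an assumption chosen for illustration.

```python
# Hypothetical restatement of the GPU-transfer rule for fused elemwise nodes.

def can_move_fused_elemwise_to_gpu(input_dtypes, output_dtypes, inner_dtypes):
    # All inputs and outputs of the fused node must be float32 ...
    boundary = list(input_dtypes) + list(output_dtypes)
    boundary_ok = all(d == "float32" for d in boundary)
    # ... while intermediate values may use other dtypes, except float64.
    inner_ok = all(d != "float64" for d in inner_dtypes)
    return boundary_ok and inner_ok


# A comparison (assumed int8 result) cast back to float32 inside the fusion.
comparison_then_cast = can_move_fused_elemwise_to_gpu(
    ["float32", "float32"], ["float32"], ["int8"])
# A float64 intermediate keeps the node on the CPU.
has_float64 = can_move_fused_elemwise_to_gpu(["float32"], ["float32"], ["float64"])
# A non-float32 output also keeps it on the CPU.
int_output = can_move_fused_elemwise_to_gpu(["float32"], ["int8"], [])
```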

New features:

* ProfileMode
    * profile the scan overhead
    * simple hook system to add profilers
    * reordered the output to go from more general to more specific
* DebugMode now checks Ops with different patterns of preallocated memory,
  configured by config.DebugMode.check_preallocated_output.
* var[vector of indices] now works (the gradient works recursively, the direct
  gradient works in place, and the GPU version works)
    * limitation: works only on the outermost dimension.
* New way to test the graph as we build it. This makes it easy to find the
  source of shape mismatch errors:
* cuda.root inferred if nvcc is on the path, otherwise defaults to
* Better graph printing for graphs involving a scan subgraph
* Casting behavior can be controlled through config.cast_policy,
  new (experimental) mode.
* Smarter C module cache, avoiding erroneous use of the wrong C
  implementation when some options change, and avoiding recompiling the
  same module multiple times in some situations.
* The "theano-cache clear" command now clears the cache more thoroughly.
* More extensive linear algebra ops (CPU only) that wrap scipy.linalg
  now available in the sandbox.
* CUDA devices 4 - 16 should now be available if present.
* infer_shape support for the View op, better infer_shape support in Scan
* infer_shape supported in all cases of subtensor
* tensor.grad now gives an error by default when computing the gradient
  wrt a node that is disconnected from the cost (not in the graph, or
  no continuous path from that op to the cost).
* New tensor.isnan and isinf functions.
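Indexing the outermost dimension with a vector of indices is a gather. Its gradient is the matching scatter-add: each selected row sends its upstream gradient back to the row it came from, and repeated indices accumulate. The nested-list sketch below shows that standard construction. `take_rows` and `take_rows_grad` are hypothetical and say nothing about how Theano implements it on the CPU or GPU.

```python
# Gather on the outermost dimension and its scatter-add gradient.

def take_rows(matrix, indices):
    # Forward pass: select whole rows, possibly repeating some.
    return [list(matrix[i]) for i in indices]


def take_rows_grad(matrix_shape, indices, grad_out):
    # Backward pass: route each output row's gradient back to its source row,
    # summing contributions when the same index was selected more than once.
    n_rows, n_cols = matrix_shape
    grad = [[0.0] * n_cols for _ in range(n_rows)]
    for i, g_row in zip(indices, grad_out):
        for j, g in enumerate(g_row):
            grad[i][j] += g
    return grad


m = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
idx = [2, 0, 2]
selected = take_rows(m, idx)
g = take_rows_grad((3, 2), idx, [[1.0, 1.0], [0.5, 0.5], [2.0, 3.0]])
```

Row 1 was never selected, so its gradient stays zero, while row 2 collects the sum of both upstream gradients.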


* Better commenting of
* Fixes in the scan documentation: add missing declarations/print statements
* Better error message on failed __getitem__
* Updated documentation on profile mode
* Better documentation of testing on Windows
* Better documentation of the 'run_individual_tests' script

Unit tests:

* Stricter float comparison by default
* Reuse the subtensor tests of tensor for GPU tensors (more GPU tests)
* Tests that check for aliased function inputs and ensure appropriate copying
* Better test of copies in CudaNdarray
* New tests relating to the new base pointer requirements
* Better scripts to run tests individually or in batches
* Some tests are now run whenever cuda is available and not just when it has
  been enabled before
* Tests display less pointless warnings.


* Correctly put the broadcast flag to True in the output var of
  a Reshape op when we receive an int 1 in the new shape.
* pydotprint: high contrast mode is now the default, option to print
  more compact node names.
* pydotprint: truncate labels that are too long.
* More compact printing (ignore leading "Composite" in op names)
July 14, 2011, 16:47:03


* The Theano shared variable attribute `value` is deprecated; use `get_value()` or `set_value()` instead.

Bugs fixed:

* The random number generator in theano/sandbox/ did not always return the same sequence of numbers on the CPU and GPU.
* In python mode (not the default mode) when input of elemwise operation was an empty ndarray, we were not returning an empty ndarray.
* Scan cached the number of steps.
* In GpuConv, errors in conv_patch_stack_reduce when the entire kernel doesn't fit into shared memory.
* Implemented some cases that previously triggered exceptions.


* Minor optimizations.
* cuda_shared.value = X now works inplace to save memory.
* Allow to create a CudaNdarraySharedVariable from a CudaNdarray.
* New init_gpu_device theano flags.
* Fuse GpuElemwise more often.
* CPU join of only 1 element that was not moved to the GPU.

New features:

* tensor.reshape now makes dimensions of length 1 broadcastable.
* now implements the gradient.
* Sparse.structured_dot now works when both matrices are sparse
* Sparse type is now supported by the shape op, and the ShapeFeature optimizer works correctly with them.
* New 3D convolution ops, with CPU and GPU implementations.
* New colors in pydotprint.
March 7, 2011, 06:55:14

This is the first major release of Theano since 0.1. Version 0.2 development started internally but it was never advertised as a release.

There have been so many changes since 0.1 that we have lost track of many of them. Below is a partial list of changes since 0.1.

  • GPU code using NVIDIA's CUDA framework is now generated for many Ops.
  • Some interface changes since 0.1:
    • A new "shared variable" system to allow reusing memory space between Theano functions.
      • A new memory contract has been formally written for Theano, for people who want to minimize memory copies.
    • The old module system has been deprecated.
    • By default, inputs to a Theano function will not be silently downcast (e.g. from float64 to float32).
    • An error is now raised when using the result of logical operation on Theano variable in an 'if' (i.e. an implicit call to nonzeros).
    • An error is now raised when we receive a non-aligned ndarray as input to a function (this is not supported).
    • An error is raised when the list of dimensions passed to dimshuffle() contains duplicates or is otherwise not sensible.
    • Call NumPy BLAS bindings for gemv operations in addition to the already supported gemm.
    • If gcc is unavailable at import time, Theano now falls back to a Python-based emulation mode after raising a warning.
    • An error is now raised when tensor.grad is called on a non-scalar Theano variable (in the past we would implicitly do a sum on the tensor to make it a scalar).
    • Added support for "erf" and "erfc" functions.
  • The current default value of the parameter axis of theano.{max,min,argmax,argmin,max_and_argmax} is deprecated. We now use the default NumPy behavior of operating on the entire tensor.
  • Theano is now available from PyPI and installable through "easy_install" or "pip".
November 23, 2010, 21:42:14

Initial Announcement on

March 20, 2010, 15:58:17

