Project details for Somoclu


by peterwittek - March 1, 2018, 23:30:34 CET


Description:

Somoclu is a C++ tool for training self-organizing maps on large data sets using massively parallel resources. It relies on OpenMP for multicore execution and builds on MPI to distribute the workload across the nodes of a cluster. It can also accelerate training with CUDA if graphics processing units are available. A sparse kernel is included, which is useful for high-dimensional but sparse data, such as the vector spaces common in text mining workflows. Python, Julia, R, and MATLAB interfaces facilitate use in data analysis. The code is released under the GNU GPLv3 licence.
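Somoclu's kernels are C++, CUDA, and MPI code; the sketch below only illustrates, in plain Python, what one training epoch of a self-organizing map involves, assuming the batch formulation with a Gaussian neighborhood on a planar rectangular grid. The function names and the `coeff` default of 0.5 are illustrative assumptions, not Somoclu's API.

```python
import math


def sq_dist(x, w):
    """Squared Euclidean distance between a data vector and a codebook vector."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))


def grid_dist2(a, b):
    """Squared distance between two (column, row) node positions on a planar grid."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def batch_epoch(data, codebook, coords, radius, coeff=0.5):
    """One batch-SOM epoch (hypothetical helper): find the best matching unit
    (BMU) of every instance, then replace each node by a neighborhood-weighted
    mean of the data."""
    # Step 1: BMU search, independent for every data instance.
    bmus = [min(range(len(codebook)), key=lambda j: sq_dist(x, codebook[j]))
            for x in data]
    sigma2 = (coeff * radius) ** 2
    dim = len(codebook[0])
    new_codebook = []
    # Step 2: batch update of every node.
    for j, cj in enumerate(coords):
        num = [0.0] * dim
        den = 0.0
        for x, b in zip(data, bmus):
            # Gaussian weight between node j and the BMU of x.
            h = math.exp(-grid_dist2(cj, coords[b]) / (2.0 * sigma2))
            den += h
            for k in range(dim):
                num[k] += h * x[k]
        # Keep the old vector if all weights underflowed to zero.
        new_codebook.append([v / den for v in num] if den > 0 else list(codebook[j]))
    return new_codebook, bmus
```

The BMU search in step 1 does not depend on other instances, which is the part of the computation that can be split across threads, MPI ranks, or GPU threads.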

Key features:

  • Fast execution by parallelization: OpenMP, MPI, and CUDA are supported.

  • Python, Julia, R, and MATLAB interfaces for the dense multicore CPU kernel.

  • Planar and toroid maps.

  • Rectangular and hexagonal grids.

  • Gaussian and bubble neighborhood functions.

  • Both dense and sparse input data are supported.

  • Large emergent maps of several hundred thousand neurons are feasible.

  • Integration with Databionic ESOM Tools.
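To make the planar/toroid distinction concrete, here is a minimal sketch of the grid distance on a rectangular map, assuming nodes are addressed as (column, row) and that a toroid wraps both axes. `grid_distance` is a hypothetical helper, not part of Somoclu.

```python
import math


def grid_distance(a, b, n_columns, n_rows, toroid=False):
    """Euclidean distance between nodes a=(column, row) and b on a rectangular grid.

    On a toroid the map wraps around, so along each axis the shorter way
    round is used."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    if toroid:
        dx = min(dx, n_columns - dx)
        dy = min(dy, n_rows - dy)
    return math.hypot(dx, dy)
```

On a 10 x 10 map, nodes in the first and last column are 9 apart on a planar map but direct neighbors on a toroid.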

Changes to previous version:
  • New: A Makefile for mingw to build on Windows.
  • Changed: PR #94 added a much more efficient sparse kernel.
  • Changed: boilerplate code for Julia greatly improved.
  • Changed: Code cleanup, pre-processor macros simplified.
  • Changed: Adapted to Seaborn API changes in plotting heatmaps.
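The release notes do not describe how the new sparse kernel works. One standard way to make distance computations cheap for sparse input, shown here as an assumption rather than as what PR #94 implements, is the expansion ‖x − w‖² = ‖x‖² − 2⟨x, w⟩ + ‖w‖², which only touches the nonzero entries of x once ‖w‖² is precomputed for each node. All names below are hypothetical.

```python
def node_norms(codebook):
    """||w||^2 for every node; computed once per epoch, not once per instance."""
    return [sum(c * c for c in w) for w in codebook]


def sparse_sq_dist(x_sparse, w, w_norm2):
    """||x - w||^2 for sparse x given as {index: value} and dense w."""
    x_norm2 = sum(v * v for v in x_sparse.values())
    dot = sum(v * w[i] for i, v in x_sparse.items())
    return x_norm2 - 2.0 * dot + w_norm2


def sparse_bmu(x_sparse, codebook, norms):
    """Index of the closest node. ||x||^2 is the same for every node,
    so only -2<x, w> + ||w||^2 has to be compared."""
    def score(j):
        w = codebook[j]
        return norms[j] - 2.0 * sum(v * w[i] for i, v in x_sparse.items())
    return min(range(len(codebook)), key=score)
```

The cost per node is proportional to the number of nonzeros in x rather than to the full dimension, which is where the savings on text-mining data come from.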
Supported Operating Systems: Linux, Windows, macOS
Data Formats: ASCII, libsvm, ESOM
Tags: CUDA, Self-Organizing Maps, MPI, ESOM, OpenMP

Other available revisions

Version Changelog Date
1.7.5
  • New: A Makefile for mingw to build on Windows.
  • Changed: PR #94 added a much more efficient sparse kernel.
  • Changed: boilerplate code for Julia greatly improved.
  • Changed: Code cleanup, pre-processor macros simplified.
  • Changed: Adapted to Seaborn API changes in plotting heatmaps.
March 1, 2018, 23:30:34
1.7.4
  • New: Verbosity parameter in the command-line, Python, MATLAB, and Julia interfaces.
  • Changed: Calculation of U-matrix parallelized.
  • Changed: Moved feeding data to train method in the Python interface.
  • Fixed: The random seed was set to 0 for testing purposes. This is now changed to a wall-time based initialization.
  • Fixed: Sparse matrix reader made more robust.
  • Fixed: Compatibility with kohonen 3 resolved.
  • Fixed: Compatibility with Matplotlib 2 resolved.
June 6, 2017, 15:48:11
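The notes mention a more robust sparse matrix reader here and correct handling of class labels in 1.7.2. Below is a minimal sketch of a parser for one line of libsvm-style data, assuming the usual `label index:value ...` layout with an optional leading label and 1-based indices; `parse_libsvm_line` is hypothetical and does not reproduce Somoclu's reader.

```python
def parse_libsvm_line(line, one_based=True):
    """Parse one line of libsvm-style sparse data into (label, {index: value}).

    The label is optional and kept as a string, since class labels need not
    be numeric. Blank lines and '#' comments yield None."""
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    tokens = line.split()
    label = None
    # A leading token without ':' is treated as the class label.
    if ":" not in tokens[0]:
        label = tokens[0]
        tokens = tokens[1:]
    features = {}
    for tok in tokens:
        idx, val = tok.split(":", 1)
        i = int(idx) - (1 if one_based else 0)
        features[i] = float(val)
    return label, features
```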
1.7.2
  • New: The coefficient coeff of the Gaussian neighborhood function $\exp\left(-\frac{\|x-y\|^2}{2(\mathrm{coeff}\cdot\mathrm{radius})^2}\right)$ is now exposed in all interfaces as a parameter.
  • New: get_bmu function in the Python interface to get the best matching units given an activation map.
  • Changed: Updated PCA initialization in the Python interface to work with scikit-learn 0.18 onwards.
  • Changed: Radii can be float values.
  • Fixed: Only positive values were written back to codebook during update.
  • Fixed: Sparse data is read correctly when there are class labels.
November 24, 2016, 22:43:48
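A small numeric sketch of the Gaussian neighborhood from the 1.7.2 notes, with the hypothetical parameter name `coeff` standing in for the exposed coefficient (each interface may name it differently, and the 0.5 default here is only an illustrative assumption). The effective width is σ = coeff · radius, so at a grid distance of exactly σ the weight is exp(−1/2).

```python
import math


def gaussian_neighborhood(grid_dist, radius, coeff=0.5):
    """exp(-d^2 / (2 (coeff * radius)^2)); coeff scales the width relative to radius."""
    sigma = coeff * radius
    return math.exp(-grid_dist ** 2 / (2.0 * sigma ** 2))
```

A smaller coefficient concentrates updates around the best matching unit; a larger one spreads them over more of the map for the same radius.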
1.7.1
  • Fixed: macOS build works again.
October 2, 2016, 10:48:46
1.7.0
  • New: Julia interface is available (https://github.com/peterwittek/Somoclu.jl).
  • New: Method get_surface_state of the Somoclu object in Python calculates the activation map for all data instances.
  • New: Method view_activation_map of the Somoclu object in Python allows plotting the activation map for the training data instances or for a new data instance.
  • New: Method view_similarity_matrix of the Somoclu object in Python visualizes the similarity matrix of data points according to their distance to the nodes in the map.
  • Fixed: CRAN-friendliness improved.
September 30, 2016, 15:08:49
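The activation map and BMU lookup described in 1.7.0 and 1.7.2 can be illustrated in plain Python, assuming the activation of a node is its Euclidean distance to the data instance and that nodes are stored in row-major order. `activation_map` and `bmus_from_activation` are hypothetical stand-ins for `get_surface_state` and `get_bmu`; the (column, row) output convention is also an assumption.

```python
import math


def activation_map(data, codebook):
    """Distance from every data instance (rows) to every codebook vector (columns)."""
    return [[math.dist(x, w) for w in codebook] for x in data]


def bmus_from_activation(act, n_columns):
    """Best matching unit of each instance as (column, row), row-major node order."""
    out = []
    for row in act:
        j = min(range(len(row)), key=row.__getitem__)
        out.append((j % n_columns, j // n_columns))
    return out
```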
1.6.2
  • Changed: In-place codebook updates when compiled without MPI. This improves update speed and substantially cuts memory use.
  • Changed: Compatible with Visual Studio 15.
  • Fixed: The BMUs returned after training were from before the last epoch. Now another round of BMU search is done.
  • Fixed: Training can continue on the same data in the Python wrapper.
  • Fixed: GPU memory allocation problem on Windows.
August 9, 2016, 14:30:34
1.6.1
  • New: Option for PCA initialization is added to the Python interface.
  • New: Clustering of the codebook with arbitrary clustering algorithm in scikit-learn is now possible in the Python interface.
February 22, 2016, 10:42:47
1.6
  • New: R wrapper integrates with kohonen package.
  • New: MATLAB wrapper integrates with somtoolbox.
  • New: Better handling of CUDA compilation in the Python interface.
  • Changed: Throws an exception if the GPU kernel is requested but the code was compiled without it. Previously, it silently fell back to the CPU kernel.
January 11, 2016, 09:40:34
1.5.1
  • New: Neighborhood function can be chosen between Gaussian and bubble.
  • Fixed: R wrapper passes arrays with correct orientation.
  • Fixed: io.cpp is no longer required in the wrappers. An exception is thrown when needed.
December 2, 2015, 08:18:27
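A sketch contrasting the two neighborhood functions, assuming the common textbook bubble definition (weight 1 within the radius, boundary included, 0 outside) and a Gaussian with σ = 0.5 · radius; both helpers are hypothetical and Somoclu's exact boundary handling may differ.

```python
import math


def bubble(grid_dist, radius):
    """Bubble neighborhood: full update inside the radius, none outside."""
    return 1.0 if grid_dist <= radius else 0.0


def gaussian(grid_dist, radius, coeff=0.5):
    """Gaussian neighborhood: smooth decay that never reaches exactly zero."""
    sigma = coeff * radius
    return math.exp(-grid_dist ** 2 / (2.0 * sigma ** 2))


# Weights for nodes at grid distances 0..4 from the BMU, radius 2.
for d in range(5):
    print(d, bubble(d, 2), round(gaussian(d, 2), 4))
```

Because the Gaussian is positive everywhere, every node receives some update unless a cutoff such as the compact support option (1.5) is applied, whereas the bubble function leaves nodes outside the radius untouched.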
1.5
  • New: Python interface has visual capabilities.
  • New: Option for hexagonal grid.
  • New: Option for requesting compact support in updating the map.
  • New: Python, R, and MATLAB interfaces now allow passing an initial codebook.
  • Changed: Reduced memory use in calculating U-matrices.
  • Changed: Build system rebuilt and simplified.
September 30, 2015, 13:27:52
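Hexagonal grids change how grid distances are measured. A minimal sketch, assuming an offset layout where odd rows are shifted right by half a cell and rows are √3/2 apart, so that all six neighbors of a node lie at distance 1; Somoclu's own layout convention may differ, and `hex_position`/`hex_distance` are hypothetical.

```python
import math


def hex_position(col, row):
    """Cartesian centre of a hexagonal node; odd rows are shifted by half a cell."""
    x = col + (0.5 if row % 2 else 0.0)
    y = row * math.sqrt(3) / 2
    return x, y


def hex_distance(a, b):
    """Euclidean distance between node centres a=(column, row) and b."""
    (xa, ya), (xb, yb) = hex_position(*a), hex_position(*b)
    return math.hypot(xa - xb, ya - yb)
```

With this layout, node (1, 1) has neighbors (0, 1), (2, 1), (1, 0), (2, 0), (1, 2), and (2, 2), all at distance 1.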
1.4.1
  • Better support for ICC.
  • Faster code when compiling with GCC.
  • Building instructions and documentation improved.
  • Bug fixes: portability for R, using native R random number generator.
January 28, 2015, 13:19:36
1.4
  • Better Windows support.
  • Completed CUDA support for Python and R interfaces.
  • Faster compilation by removing unnecessary flags for nvcc.
  • Support for CUDA 6.5.
  • Bug fixes: R version no longer needs separate code.
September 5, 2014, 13:01:14
1.3.1
  • Initial Windows support through GCC.
  • Better I/O separation for the Python, R, and MATLAB interfaces.
  • Bug fixes: major MPI initialization bug fixed.
April 10, 2014, 06:41:38
1.3
  • Python, R, and MATLAB interfaces added.
  • Learning rate parameter included.
  • Linear and exponential cooling strategies added for radius and learning rate.
  • Command-line interface made more user-friendly.
  • Default radius depends on both the X and Y dimensions of the map.
  • Bug fixes: CUDA build without MPI, best matching unit passing without MPI, coordinate order in best matching unit file.
March 31, 2014, 07:53:05
1.2
  • Massive improvements in OpenMP parallelization.
  • MPI libraries are no longer mandatory.
  • Best matching units are saved.
  • Option for specifying an initial codebook for the map.
  • ESOM .lrn input format added.
  • Parsing of white-space characters corrected.
  • Long-named command line switches for specifying SOM dimensions.
  • Fine-grained control of which interim files to save across epochs.
  • Option in Makefile for building shared library.
December 17, 2013, 04:31:05
1.1.2

Toroid maps were added. Initial radius is exposed as a parameter via the command line interface. Formats of codebook and U-matrix export are compatible with Databionic ESOM Tools for advanced visualisation. Bug fixes: codebook update with a compact support was removed, NaN entry no longer appears in U-matrices.

November 28, 2013, 03:20:22
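The U-matrix mentioned above summarizes how far each node's codebook vector is from those of its neighbors. A minimal sketch for a planar rectangular grid, assuming 4-connected neighbors, Euclidean distance, and row-major node order; Somoclu's own neighbor definition and the ESOM-compatible export are not reproduced, and `u_matrix` is hypothetical.

```python
import math


def u_matrix(codebook, n_columns, n_rows):
    """Mean codebook distance from each node to its 4-connected neighbors."""
    umat = [[0.0] * n_columns for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_columns):
            w = codebook[r * n_columns + c]
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                # Planar map: neighbors outside the grid are skipped.
                if 0 <= rr < n_rows and 0 <= cc < n_columns:
                    dists.append(math.dist(w, codebook[rr * n_columns + cc]))
            # Guard so a 1 x 1 map yields 0.0 instead of dividing by zero.
            umat[r][c] = sum(dists) / len(dists) if dists else 0.0
    return umat
```

High U-matrix values mark boundaries between clusters, which is what ESOM-style visualizations display.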
1.0

Initial announcement on mloss.org.

May 14, 2013, 06:21:13
