Open Thoughts

July 2014 archive

Open Machine Learning Workshop

July 28, 2014

Just in case there are people who follow this blog but not John Langford's, there is going to be an open machine learning workshop on 22 August 2014 in MSR, New York, organised by Alekh Agarwal, Alina Beygelzimer, and John Langford.

As it says on John's blog: If you are interested, please email msrnycrsvp at and say “I want to come” so we can get a count of attendees for refreshments.

Machine Learning Distro

July 22, 2014

What would you include a linux distribution to customise it for machine learning researchers and developers? Which are the tools that would cover the needs of 90% of PhD students who aim to do a PhD related to machine learning? How would you customise a mainstream linux distribution to (by default) include packages that would allow the user to quickly be able to do machine learning on their laptop?

There are several communities which have their own custom distribution:

  • Scientific Linux which is based on Red Hat Enterprise Linux is focused making it easy for system administrators of larger organisations. The two big users are FermiLab and CERN who each have their own custom "spin". Because of its experimental physics roots, it does not have a large collection of pre-installed scientific software, but makes it easy for users to install their own.
  • Bio-Linux is at the other end of the spectrum. Based on Ubuntu, it aims to provide a easy to use bioinformatics workstation by including more than 500 bioinformatics programs, including graphical menus to them and sample data for testing them. It is targeted at the end user, with simple instructions for running it Live from DVD or USB, to install it, and to dual boot it.
  • Fedora Scientific is the latest entrant, providing a nice list of numerical tools, visualisation packages and also LaTeX packages. Its documentation lists packages for C, C++, Octave, Python, R and Java. Version control is also not forgotten. A recent summary of Fedora Scientific was written as part of Open Source Week.

It would seem that Fedora Scientific would satisfy the majority of machine learning researchers, since it provides packages for most things already. Some additional tools that may be useful include:

  • tools for managing experiments and collecting results, to make our papers replicable
  • GPU packages for CUDA and OpenCL
  • Something for managing papers for reading, similar to Mendeley
  • Something for keeping track of ideas and to do lists, similar to Evernote

There's definitely tons of stuff that I've forgotten!

Perhaps a good way to start is to have the list of package names useful for the machine learning researcher in some popular package managers such as yum, apt-get, dpkg. Please post your favourite packages in the comments.