June 2010 archive

3rd MLOSS workshop at ICML 2010

June 29, 2010

On Friday, June 25, 2010, we held our 3rd machine learning open source software workshop at ICML in Haifa, Israel. All in all, it was a very nice meeting. We again had two very interesting invited speakers, Gary Bradski and Victoria Stodden. This time, we also decided to have only two kinds of presentations: either a 20-minute talk or a poster presentation with spotlight. In previous meetings, we had longer talks, shorter talks, and poster-only presentations, but we felt that the poster presentations didn't get the attention they deserved. I think the poster spotlights actually worked out quite well.

We opened the workshop with a talk by Gary Bradski, possibly best known in the machine learning community for OpenCV, an open source framework for real-time image processing and computer vision. He gave a comprehensive overview of OpenCV and how it is used at Willow Garage, the startup Gary is working for right now to build an open robotics platform called ROS. When asked what he has learned about managing open source projects, he admitted that he considers himself a really bad manager, but he has seen time and again that it really boils down to having a few highly motivated, excellent contributors who can make a real difference.

Victoria Stodden gave a talk on how our current scientific landscape, with its terabytes of raw data and complex data analysis procedures, poses a real challenge to reproducible research. She cited a few recent incidents of big research programs which ran into trouble after the validity of their results or their methods was questioned (for example, see "Climategate", or this article about a cancer research program). She strongly advocated that data as well as code must be shared much more openly, and discussed the legal implications and how to overcome them. She also presented results from a survey of NIPS participants on whether they have shared code or data and why. Interestingly, most reasons for not sharing were of a personal nature (for example, lack of time to prepare documents, or fear of getting scooped by competitors), whereas reasons for sharing were mostly motivated by communitarian ideals like advancing the state of science more quickly.

We also had many interesting new projects:

Several projects addressed Java-centric machine learning: Jstacs and Mulan were broader frameworks, while jblas and UJMP provided fast and flexible matrix libraries for Java.
Learning with graphical models was covered by Libra and FastInf.
This year we again got pretty comprehensive Python-based machine learning libraries, Scikit Learn and PyBrain, as well as Shogun, a large kernel-based learning library with bindings to many different languages, including Python.
Finally, we also had projects that specialize more, like OpenKernel for kernel learning, multiboost.org, a high-quality implementation of AdaBoost, gidoc, a framework for working with handwritten text recognition, and the Dependency Modelling Toolkit, a project that deals with modelling probabilistic dependencies.

We also had a talk given by prerecorded video, a first for this workshop, which was nevertheless quite well received, mostly because the authors put a lot of effort into their video and presented a good mixture of personal presentations, slides with voice-over, and screencast-style demos.

While we are pretty happy with the quality of our submissions and the gradual adoption of open source software practices in the machine learning community, we again saw little in terms of integration. Common standards have not yet evolved, and there are many similar projects running in parallel. The most common type of interoperability is one library providing a wrapper around other libraries' functionality, mostly SVM learners.

As a first step towards better exchangeability, we also presented our new sister site mldata.org. It is similar to mloss, but focuses on machine learning data sets. We have now officially launched the website in "beta mode", so be sure to check it out whenever you have some data you want to share with other researchers, and do not hesitate to give us feedback!

The whole workshop was recorded by the guys from videolectures.net. As soon as the talks are online, we'll let you know.

Till then, here are a few pictures from the workshop.
