Workshop on Machine Learning Open Source Software 2010
Important Dates
- Submission Date: April 23rd, Samoa time (closed)
- Notification of Acceptance: May 8th, 2010
- Workshop date: June 25th, 2010
Description
We believe that the widespread adoption of open source software policies will have a tremendous impact on the field of machine learning. The goal of this workshop is to further support the current developments in this area and give it new impetus. Following the success of the inaugural NIPS-MLOSS workshop held at NIPS 2006, the Journal of Machine Learning Research (JMLR) has started a new track for machine learning open source software, initiated by the workshop's organizers. Many prominent machine learning researchers have co-authored a position paper advocating the need for open source software in machine learning. To date, 11 machine learning open source software projects have been published in JMLR. Furthermore, the workshop's organizers have set up a community website, mloss.org, where people can register their software projects, rate existing projects and initiate discussions about projects and related topics. This website currently lists 221 such projects, including many prominent projects in the area of machine learning.
The main goal of this workshop is to bring together the leading practitioners in the area of machine learning open source software in order to initiate processes that will help to further improve the development of this area. In particular, we have to move beyond a mere collection of more or less unrelated software projects and provide a common foundation to stimulate cooperation and interoperability between different projects. An important step in this direction will be a common data exchange format, so that different methods can exchange their results more easily.
This year's workshop sessions will consist of three parts.
- We have two invited speakers: Gary Bradski and Victoria Stodden.
- Researchers are invited to submit their open source project to present it at the workshop.
- In discussion sessions, important questions regarding the future development of this area will be discussed. In particular, we will discuss what makes a good machine learning software project and how to improve interoperability between programs. In addition, the question of how to deal with data sets and reproducibility will also be addressed.
Because a large number of key research groups attend ICML, decisions and agreements made at the workshop will have the potential to significantly impact the future of machine learning software.
Workshop Program:
The one-day workshop will be a mixture of talks (including a mandatory demo of the software) and panel/open/hands-on discussions.
Morning session: 9:00am - 12:30pm
- 09:00 Welcome and Overview
- 09:05 Invited Talk: OpenCV (Gary Bradski)
- Spotlight Talks
- 09:50 Jstacs
- 09:55 Scikitlearn
- 10:00 Mulan
- 10:05 JBlas
- Contributed Talk
- 10:35 - 11:20 Poster Session and Coffee Break
- Contributed Talks
- 11:20 Shogun
- 11:45 The next steps after UCI - mldata.org
- 11:50 Discussion: Exchanging Software and Data
- 12:30 - 14:00 Lunch
Afternoon session: 14:00 - 18:00
- Contributed Talks
- Spotlight Talks
- 15:15 OpenKernel
- 15:20 MultiBoost
- 15:25 Gidoc
- 15:30 Dependency Modelling Toolbox
- 15:35 - 16:25 Poster Session and Coffee Break
- 16:25 - Invited Talk: (Victoria Stodden)
Reproducible Research in Computational Science: Problems and Solutions For Data and Code Sharing
Scientific computation is emerging as absolutely central to the scientific method, but the prevalence of very relaxed practices is leading to a credibility crisis. Reproducible computational research, in which all details of computations—code and data—are made conveniently available to others, is a necessary response to this crisis. Results from a 2009 survey of the Machine Learning community (NIPS participants) designed to elucidate factors that affect data and code sharing will be presented. Intellectual property concerns create a significant barrier to sharing, and I will also present work on the “Reproducible Research Standard” giving open licensing options designed to create an intellectual property framework for scientists consonant with longstanding scientific norms and facilitating reproducible research.
- 17:10 - Discussion: Reproducible research
Invited Speakers
- Gary Bradski
Gary Bradski was previously responsible for the Open Source Computer Vision Library (OpenCV) that is used globally in research, government and commercial applications. He has also been responsible for the open source statistical Machine Learning Library and the Probabilistic Network Library. More recently Dr. Bradski led the vision team for Stanley, the Stanford robot that won the DARPA Grand Challenge autonomous race in 2005 and most recently helped found the Stanford Artificial Intelligence Robot (STAIR) project under the leadership of Professor Andrew Ng. Dr. Bradski recently published a new book for O'Reilly Press: Learning OpenCV: Computer Vision with the OpenCV Library.
- Victoria Stodden
Victoria is a Postdoctoral Associate in Law and a Kauffman Fellow in Law at the Information Society Project at Yale Law School. After completing her PhD in statistics at Stanford University in 2006 with advisor David Donoho, she obtained a Master in Legal Studies in 2007 from Stanford Law School. She is developing a new licensing structure for computational research and is the author of the award-winning paper "Reproducible Research Standard", which describes her ideas.
Call for Contributions
The organizing committee is currently seeking abstracts for talks at MLOSS 2010. MLOSS is a great opportunity for you to tell the community about your use, development, or philosophy of open source software in machine learning. This includes (but is not limited to) numeric packages (e.g. R, Octave, NumPy), machine learning toolboxes and implementations of ML algorithms. The committee will select several submitted abstracts for 20-minute talks.
The submission process is very simple:
- Tag your mloss.org project with the tag icml2010
- Ensure that you have a good description (limited to 500 words)
- Any bells and whistles can be put on your own project page, and of course provide this link on mloss.org
Note: Projects must adhere to a recognized Open Source License (cf. http://www.opensource.org/licenses/ ) and the source code must have been released at the time of submission. Submissions will be reviewed based on the status of the project at the time of the submission deadline.
Program Committee
All confirmed
- Jason Weston (Google Research, NY, USA)
- Leon Bottou (NEC Princeton, USA)
- Tom Fawcett (Stanford Computational Learning Laboratory, USA)
- Sebastian Nowozin (Microsoft Research, UK)
- Konrad Rieck (Technische Universität Berlin, Germany)
- Lieven Vandenberghe (University of California LA, USA)
- Joachim Dahl (Aalborg University, Denmark)
- Torsten Hothorn (Ludwig Maximilians University, Munich, Germany)
- Asa Ben-Hur (Colorado State University, USA)
- Klaus-Robert Mueller (Fraunhofer Institute First, Germany)
- Geoff Holmes (University of Waikato, New Zealand)
- Peter Reutemann (University of Waikato, New Zealand)
- Markus Weimer (Yahoo Research, California, USA)
- Alain Rakotomamonjy (University of Rouen, France)
Organizers:
- Soeren Sonnenburg, Mikio Braun
Technische Universität Berlin, Franklinstr. 28/29, FR 6-9, 10587 Berlin, Germany
- Cheng Soon Ong
ETH Zürich, Universitätstr. 6, 8092 Zürich, Switzerland
- Patrik Hoyer
Helsinki Institute for Information Technology, Gustaf Hällströmin katu 2b, 00560 Helsinki, Finland