Workshop on Machine Learning Open Source Software 2015: Open Ecosystems

The ICML Workshop on Machine Learning Open Source Software (MLOSS) will be held in Lille, France, on 10 July 2015.

Machine learning open source software (MLOSS) is one of the cornerstones of open science and reproducible research. Along with open access and open data, it enables free reuse and extension of current developments in machine learning. The mloss.org site exists to support a community creating a comprehensive open source machine learning environment, mainly by promoting new software implementations. This workshop aims to enhance the environment by fostering collaboration with the goal of creating tools that work with one another. Far from requiring integration into a single package, we believe that this kind of interoperability can also be achieved in a collaborative manner, which is especially suited to open source software development practices.

The workshop is aimed at all machine learning researchers who wish to have their algorithms and implementations included as part of the greater open source machine learning environment. Continuing the tradition of well-received workshops on MLOSS at NIPS 2006, NIPS 2008, ICML 2010 and NIPS 2013, we plan to have a workshop that is a mix of invited speakers, contributed talks and discussion/activity sessions. For 2015, we focus on building open ecosystems. Our invited speakers will illustrate this process for Python and Julia by presenting modern high-level, high-performance computation engines, and we encourage submissions that showcase the benefits of multiple tools in the same ecosystem. All software presentations are required to include a live demonstration. The workshop will also include an active session ("hackathon") for planning and starting to develop infrastructure for measuring software impact.

Programme

08:30-08:40	Opening remarks
08:40-09:30	(invited) Matthew Rocklin: Extending the Numeric Python ecosystem beyond in-memory computing
09:30-09:45	Collaborative filtering via matrix decomposition in mlpack
09:45-10:00	BLOG: a probabilistic programming language for open-universe contingent Bayesian networks
10:00-10:30	COFFEE BREAK
10:30-12:00	Spotlights (3 minutes each):
	Nilearn: Machine learning for neuroimaging in Python
	KeLP: a Kernel-based Learning Platform in Java
	~~MIPS: A graph mining library~~ (cancelled)
	DiffSharp: Automatic Differentiation Library
	The FAST toolkit for Unsupervised Learning of HMMs with Features
	OpenML: a Networked Science Platform for Machine Learning
	Followed by a demo/poster session
12:00-14:00	LUNCH
14:00-14:50	(invited) John Myles White: Julia's Approach to Open Source Machine Learning
14:50-15:05	Caffe: Community Architecture for Fast Feature Embedding
15:05-16:00	Unconference discussion
16:00-16:30	COFFEE BREAK
16:30-17:00	Gaël Varoquaux: From flop to success in academic software development
17:00-18:00	Hackathon: Altmetrics
17:55-18:00	Closing

Details about invited speakers

John Myles White: Julia's Approach to Open Source Machine Learning

In this talk, I'll describe the Julia community's approach to developing open source machine learning libraries. I'll start by introducing Julia as a language. In particular, I'll explain how Julia's type inference system makes the language different from other high-level languages, such as Matlab, Python or R. I'll then describe how the design of Julia as a language influences the design of libraries for Julia. To do this, I'll focus on two extended examples: the design of the Distributions.jl package and the design of the recently introduced Nullable type.

I'll then shift focus and describe some of the social dynamics that have influenced Julia's development, including issues involving the management of labor, the selection of tooling, the establishment of social conventions and the refinement of consensus-building strategies. Some of these issues are broadly relevant to all open source projects, whereas others are uniquely challenging for open source machine learning projects. I'll close by describing some of the areas in which Julia is still behind more established languages.

Bio: John Myles White is a member of the Core Data Science team at Facebook, where he works on Facebook's internal statistical tooling. Outside of his work at Facebook, John has worked on Julia's machine learning libraries since the language was first publicly released in 2012. Prior to coming to Facebook, John was a graduate student in Princeton's psychology department.

Matthew Rocklin: Extending the Numeric Python ecosystem beyond in-memory computing

The open source Scientific Python ecosystem contains mature, efficient and sophisticated solutions to modern problems. A wide variety of software projects coordinate around a small set of core libraries (NumPy, pandas). This coordination enables the ecosystem to grow and serve the needs of the broader research community.

However, these core libraries have limitations. They were designed for single-process, in-memory computation on highly structured data. As the community extends beyond traditional scientific users and datasets grow beyond memory limits, these shortcomings of the Scientific Python ecosystem become apparent.

This talk discusses those shortcomings and possible solutions for them. It chronicles the development of libraries for parallel computing that attempt to raise Python's existing computational infrastructure into parallel and streaming workflows.

Bio: Matthew is a computational scientist at Continuum Analytics and a full-time open source developer within the Scientific Python ecosystem. He holds a PhD in Computer Science from the University of Chicago and undergraduate degrees in Mathematics and Physics from the University of California, Berkeley.

His research crosses numerical linear algebra, computer algebra, and distributed systems. He builds libraries for out-of-core and distributed computing that target non-expert users.

Gaël Varoquaux: From flop to success in academic software development

A practical, opinionated, guide to open source scientific software

This talk will be a mix of provocative thoughts on software in academia and practical open source development advice. Most lines of code written by programmers in academia reach only a very small audience. I believe that the root of the problem is not that academic programmers are incompetent, but rather the overall difficulty of balancing research goals with software goals, as well as managing the inherent complexity of a growing project.

I will summarize what I have learned leading several scientific software projects, such as Mayavi, joblib and scikit-learn, to wide success. What are the choices that make a successful project? What do the different licenses mean? How can quality be ensured? How can a community of productive developers be organized so that it moves together?

Bio: Gaël Varoquaux is a computer-science researcher at INRIA. His research develops statistical learning tools for functional neuroimaging data, with applications to cognitive mapping of the brain as well as the study of brain pathologies. In addition, he is heavily invested in software development for science, and wants to make leading-edge algorithmic and statistical tools developed in computer science available across new fields. He is project lead for scikit-learn, one of the reference machine-learning toolboxes, and for joblib, Mayavi and nilearn. Varoquaux is a nominated member of the Python Software Foundation and the director of the LearnClues laboratory. He has a PhD in quantum physics and is a graduate of the Ecole Normale Supérieure, Paris.

Call for Contributions (Deadline passed)

Important Dates

Submission Date: ~~28 April 2015, 23:59 UTC~~
Notification of Acceptance: ~~11 May 2015~~
Workshop date: 10 July 2015

Note that the submission deadline is a few days earlier than the ICML recommended deadline. This is to give our program committee a reasonable amount of time to review your submission.

The organizing committee is currently seeking abstracts for talks at MLOSS 2015. MLOSS is a great opportunity for you to tell the community about your use, development, philosophy, or other activities related to open source software in machine learning. The committee will select several submitted abstracts for 20-minute talks.

All submissions must be made to https://www.easychair.org/conferences/?conf=mloss2015

Submission types

1. Software packages

Similar to the MLOSS track at JMLR, this includes (but is not limited to) numeric packages (e.g. R, Octave, Python), machine learning toolboxes and implementations of ML algorithms.

Submission format: a 1-page abstract, which must contain a link to the project description on mloss.org. Any bells and whistles can be put on your own project page; of course, provide a link to that page on mloss.org.

Note: Projects must adhere to a recognized Open Source License (cf. http://www.opensource.org/licenses/ ) and the source code must have been released at the time of submission. Submissions will be reviewed based on the status of the project at the time of the submission deadline. If accepted, the presentation must include a software demo.

2. ML related projects

As the theme for this year is open ecosystems, projects of a more general nature such as software infrastructure or tools for general data analysis are encouraged. This category is open for position papers, interesting projects and ideas that may not be new software themselves, but link to machine learning and open source software.

Submission format: abstract with no page limit. Please note that there will be no proceedings, i.e. the abstracts will not be published.

We look forward to submissions that are novel, exciting and appealing to the wider community.

Program Committee

Asa Ben-Hur (Colorado State University)
Mathieu Blondel (NTT Communication Science Laboratories)
Mikio Braun (Technical University of Berlin)
Ryan Curtin (Georgia Tech)
Alexandre Gramfort (Telecom ParisTech)
Ian Goodfellow (Google)
James Hensman (University of Sheffield)
Laurens van der Maaten (Facebook AI Research)
Andreas Müller (New York University)
Mark Reid (Australian National University)
Peter Reutemann (University of Waikato)
Konrad Rieck (University of Göttingen)
Conrad Sanderson (NICTA)
Heiko Strathmann (University College London)
Ameet Talwalkar (University of California, Los Angeles)
Lieven Vandenberghe (University of California, Los Angeles)
Aki Vehtari (Aalto University)
Markus Weimer (Microsoft Research)

Sponsors:

Paris-Saclay Center for Data Science
Facebook (for sponsoring John Myles White)
Continuum Analytics (for sponsoring Matthew Rocklin)

Organizers:

Gaël Varoquaux (INRIA, France)
Antti Honkela (University of Helsinki, Helsinki Institute for Information Technology HIIT, Helsinki, Finland)
Cheng Soon Ong (Machine Learning Research Group, NICTA, Canberra, Australia)