Open Thoughts

October 2008 archive

Reviewing software

October 13, 2008

The review process for the current NIPS workshop mloss08 is now underway. There are a couple of interesting thoughts that I had while discussing this process with Soeren and Mikio, as well as some of the program committee. The two issues are:

  • Who should review a project?
  • What are the review criteria?

Reviewer Choice

Unlike reviewers of standard machine learning projects, a reviewer for an mloss project has to be comfortable with three different aspects of the system, namely:

  • The machine learning problem (e.g. graphical models, kernel methods, or reinforcement learning)
  • The programming language, or at least the paradigm (e.g. object oriented programming)
  • The operating environment (which may be a particular species of make on a version of Linux)

There are also projects about a particular application area of machine learning, such as brain-computer interfaces, which put an additional requirement on the reviewer's understanding.

However, if one looks at the set of people who satisfy all those criteria for a particular project, one usually ends up with only a handful of potential researchers, most of whom would have a conflict of interest with the submitted project. So I would often choose a reviewer who is an expert in one of the three areas and hope that he or she would be able to figure out the rest. Is there a better solution?
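The selection problem described above can be sketched in a few lines of Python. This is only an illustration, not the procedure the workshop actually used: the `Reviewer` class, the `rank_reviewers` function, and all names and expertise labels in the example data are hypothetical. The sketch drops reviewers with a conflict of interest and ranks the rest by how many of the required areas they cover.

```python
from dataclasses import dataclass, field


@dataclass
class Reviewer:
    # All fields are hypothetical; a real system would pull them from a database.
    name: str
    expertise: set = field(default_factory=set)
    conflicts: set = field(default_factory=set)  # projects the reviewer is too close to


def rank_reviewers(required, project, reviewers):
    """Return (reviewer, covered_areas) pairs, best coverage first.

    Reviewers with a conflict of interest, or with no overlap at all,
    are left out.
    """
    eligible = [r for r in reviewers if project not in r.conflicts]
    scored = [(r, r.expertise & required) for r in eligible]
    scored = [(r, covered) for r, covered in scored if covered]
    return sorted(scored, key=lambda pair: len(pair[1]), reverse=True)


# Made-up example: the only reviewer covering all three areas has a conflict.
required = {"kernel methods", "C++", "Linux make"}
reviewers = [
    Reviewer("A", {"kernel methods", "C++", "Linux make"}, {"some-project"}),
    Reviewer("B", {"kernel methods", "C++"}),
    Reviewer("C", {"Linux make"}),
    Reviewer("D", {"Java"}),
]
ranked = rank_reviewers(required, "some-project", reviewers)
ranked_names = [r.name for r, _ in ranked]
```

In this made-up example the full expert is excluded, which mirrors the situation in the post: the best remaining choice covers only part of what the project needs.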

Review Criteria

The JMLR review criteria are:

  1. The quality of the four page description.
  2. The novelty and breadth of the contribution.
  3. The clarity of design.
  4. The freedom of the code (lack of dependence on proprietary software).
  5. The breadth of platforms it can be used on (should include an open-source operating system).
  6. The quality of the user documentation (should enable new users to quickly apply the software to other problems, including a tutorial and several non-trivial examples of how the software can be used).
  7. The quality of the developer documentation (should enable easy modification and extension of the software, provide an API reference, provide unit testing routines).
  8. The quality of comparison to previous (if any) related implementations, w.r.t. run-time, memory requirements, features, to explain that significant progress has been made.

This year's workshop has the theme of interoperability and cooperation, so this is also a review criterion. The important question is how to weight the different aspects, and the answer is not at all clear. There is a basic level of adherence that is necessary for each criterion, above which it is difficult to trade off the different aspects quantitatively. For example, does very good user documentation excuse very poor code design? Does being able to run on many different operating systems excuse very poor run-time memory and computational performance?
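One way to make the "basic level first, trade-offs second" idea concrete is a threshold followed by a weighted mean. The sketch below is purely illustrative: the 1-to-5 rating scale, the `minimum` threshold, the weights, and the function name `score_submission` are all assumptions, not part of the JMLR or workshop criteria.

```python
def score_submission(ratings, weights, minimum=2):
    """Combine per-criterion ratings into one score.

    ratings: dict mapping criterion name -> rating (assumed scale: 1 to 5)
    weights: dict mapping criterion name -> relative weight (assumed values)
    Returns None if any criterion falls below the minimum level,
    otherwise the weighted mean of the ratings.
    """
    if any(score < minimum for score in ratings.values()):
        return None  # fails the basic level; no amount of weighting rescues it
    total_weight = sum(weights[c] for c in ratings)
    return sum(weights[c] * s for c, s in ratings.items()) / total_weight


weights = {"user documentation": 1, "clarity of design": 3}

# Excellent documentation does not excuse a design below the basic level.
rejected = score_submission(
    {"user documentation": 5, "clarity of design": 1}, weights
)

# Above the threshold, the weights decide the trade-off.
accepted = score_submission(
    {"user documentation": 4, "clarity of design": 2}, weights
)
```

The threshold step is the easy part. Choosing the weights is exactly the open question raised above, and the code only makes that choice explicit without answering it.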

Put your comments below or come to this year's workshop and discuss this!

GNU Octave on Free Software Foundation's High Priority List

October 6, 2008

The Free Software Foundation (FSF) maintains a high-priority list of software projects, which can be found here.

Quoting the FSF:

The FSF high-priority projects list serves to foster the development of projects that are important for increasing the adoption and use of free software and free software operating systems. [...] Some of the most important projects on our list are replacement projects. These projects are important because they address areas where users are continually being seduced into using non-free software by the lack of an adequate free replacement.

Ranked eighth among the top ten prioritized software projects is GNU Octave, a free software Matlab replacement.

As this is very relevant to our community, which is strongly dominated by Matlab, I would like to encourage everyone to try out Octave 3.0. If you tried Octave 2.x or an earlier version at some point, you will find that it has matured a lot. It supports all the data types you know from Matlab, such as cell arrays and dense or sparse arrays, and yes, it has all those plotting functions like plot, surf3d etc. too. And if you ever tried to extend Matlab using C code, support is really much better on the Octave side, not to mention the killer feature: Octave is fully supported by SWIG! Still not convinced? John W. Eaton will introduce Octave to us at the NIPS'08 MLOSS Workshop. So what are you waiting for? Give Octave a try and see how you can help!

Differences between paid and volunteer FOSS contributors

October 3, 2008

I just stumbled across a very interesting article titled Differences between paid and volunteer FOSS contributors that I am going to almost fully quote below. The original article was written by Martin Michlmayr and can be found here. Almost full quote follows:

There's a lot of debate these days about the impact of the increasing number of paid developers in FOSS communities that started as volunteer efforts and still have significant numbers of volunteers. Evangelia Berdou's PhD thesis "Managing the Bazaar: Commercialization and peripheral participation in mature, community-led Free/Open source software projects" contains a wealth of information and insights about this topic.

Berdou conducted interviews with members of the GNOME and KDE projects. She found that paid developers are often identified with the core developer group which is responsible for key infrastructure and often make a large number of commits. Furthermore, she suggested that the groups may have different priorities: "whereas [paid] developers focus on technical excellence, peripheral contributors are more interested in access and practical use".

Based on these interviews, she formulated the following hypotheses which she subsequently analyzed in more detail:

  1. Paid developers are more likely to contribute to critical parts of the code base.
  2. Paid developers are more likely to maintain critical parts of the code base.
  3. Volunteer contributors are more likely to participate in aspects of the project that are geared towards the end-user.
  4. Programmers and peripheral contributors are not likely to participate equally in major community events.

Berdou found all hypotheses to be true for GNOME but only hypotheses two and four were confirmed for KDE.

In the case of GNOME, Berdou found that hired developers contribute to the most critical parts of the project, that they maintained most modules in core areas and that they maintained a larger number of modules than volunteers. Two important differences were found in KDE: paid developers attend more conferences and they maintain more modules.

Berdou's research contains a number of important insights:

  • Corporate contributions are important because paid developers contribute a lot of changes, and they maintain core modules and code.
  • While it's clear that the involvement of paid contributors is influenced by the strategy of their company, Berdou wonders whether another reason why they often contribute to core code is because they "develop their technical skills and their understanding of the code base to a greater extent than volunteers who usually contribute in their free time". It's therefore important that projects provide good documentation and other help so volunteers can get up to speed quickly.
  • Since many volunteers cannot afford to attend community events, projects should provide travel funds. This is something I see more and more: for example, Debian funds some developers to attend the Debian conference and the Linux Foundation has a grant program to allow developers to attend events.
  • Paid developers often maintain modules they are not paid to directly contribute to. A reason for this is that they continue to maintain modules in their spare time when their company tells them to work on other parts of the code.

The rest of the article can be found here.