Open Thoughts

Nature Editorial about Open Science

Posted by Cheng Soon Ong on February 28, 2012

The case for open computer programs

Does open source software imply reproducible research?

A recent Nature editorial makes the case for open source software in scientific endeavors. It argues that many modern scientific results depend on complex computations, and hence that source code is needed for scientific reproducibility. It is nice that a high-profile journal has published articles promoting open source software, since it increases visibility. However, some more careful thought is required, because the article's message is inaccurate in both directions: open source offers more than reproducibility, and having the source code does not by itself guarantee it.

Open source provides more benefits than just reproducibility

Actually, open source provides more than is necessary for reproducibility, since the licenses grant the ability to edit and extend the code and also prevent discriminatory practices. To be pedantic, any software (even a compiled executable) would suffice for reproducibility.

We've said this before but the message is worth repeating. Open source provides:

  1. reproducibility of scientific results and fair comparison of algorithms;
  2. uncovering problems;
  3. building on existing resources (rather than re-implementing them);
  4. access to scientific tools without cease;
  5. combination of advances;
  6. faster adoption of methods in different disciplines and in industry; and
  7. collaborative emergence of standards.

See our paper for more details.

Having source code does not imply reproducibility

As the editorial observes in the final sentence of its abstract, "The vagaries of hardware, software and natural language will always ensure that exact reproducibility remains uncertain, ...". I've personally spent many frustrating hours trying to get somebody's research code to compile. In fact, one of the most common complaints by reviewers of JMLR's open source track is that they are unable to get submissions to work on their computer. The multitude of computing environments, numerical libraries and programming languages means that, very often, the user of the software is in a different frame of mind than the authors of the source code. My advice to fledgling authors of machine learning open source software is to provide a "quickstart" tutorial in the README, because everybody is impatient, and nobody will look into fixing your bugs before they are convinced that your code will do something useful for them. And yes, fixing $PATH can be tricky if you don't know exactly how to do it.
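
To make the quickstart advice concrete, here is a minimal sketch of an environment check that a README could point impatient users to before they try anything else. It is only an illustration: the function name `check_environment`, the minimum Python version and the tool names `make` and `gcc` are assumptions chosen for the example, not something taken from the editorial or from any particular package.

```python
import platform
import shutil
import sys


def check_environment(required_tools, min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # Compare the running interpreter against the (assumed) minimum version.
    if sys.version_info < min_python:
        problems.append(
            "Python %d.%d or newer is required, found %s"
            % (min_python[0], min_python[1], platform.python_version())
        )

    # shutil.which searches the directories listed in $PATH, so a missing
    # entry here usually means the tool is not installed or $PATH is wrong.
    for tool in required_tools:
        if shutil.which(tool) is None:
            problems.append("'%s' was not found on $PATH" % tool)

    return problems


if __name__ == "__main__":
    # Hypothetical tool list; replace with whatever your software really needs.
    issues = check_environment(["make", "gcc"])
    if issues:
        print("Environment problems found:")
        for issue in issues:
            print("  - " + issue)
    else:
        print("Environment looks fine; continue with the quickstart.")
```

A check like this does not make results reproducible, but it turns "it doesn't work on my computer" into a specific message that the user can act on.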

I guess the bottom line is quite an obvious statement: good open source software will give you reproducibility and a few additional benefits.
