What is an "easy to build" system?
Posted by Cheng Soon Ong on April 23, 2009
One thing that reviewers of submissions to the open source track of JMLR often complain about is that the submitted software doesn't build. On discovering this, some reviewers refuse to look at the rest of the submission. I agree that being able to compile a piece of code is quite an important part of the total score, but it should not be the whole story. In fact, the review criteria for JMLR (OSS track) specifically list other important criteria. Being easy to compile would fall under "good user documentation", since it is the end user who benefits from an easy to build system. But in general, once a reviewer is unable to build the submission, the review tends to be negatively biased. Even worse, the reviewer may not consider other parts of the software project at all.
So, why do reviewers have so much trouble compiling software? The answer is quite complicated, and I would like to try to scratch the surface of this highly charged issue. More in-depth recommendations for open source projects can be found, for example, in Karl Fogel's online book or Eric Steven Raymond's detailed howto. I restrict this post to Linux-style "download, unzip and build" software, ignoring GUI-style "double click" installations such as .dmg packages or .exe installers.
Documentation, documentation, documentation
A number of the compilation issues would be solved if there were clear and precise documentation, and if the user read it. One JMLR submission had two reviewers who could not build the system but a third who commented on how smoothly everything went. It turned out that the author had written in his cover letter that the submitted code was incomplete due to file size restrictions on the JMLR website, and that reviewers were supposed to get the complete code online.
Apart from documentation that helps the user understand what the project does and documentation that tells the developer how to extend it, there are the installation instructions. These cover how to install, how to upgrade from previous versions, and which dependencies are required. For Linux there are some conventions about how to structure things. If possible, one should stick to one of the standard idioms for compiling software (see the next section). As an aside, Google recently released their software update system.
The build system
The traditional build pipeline is the "configure; make;" system, which is popular among C projects such as the GNU projects. For Python projects there is the setup.py idiom or easy_install. I am not a Java expert, but there seems to be a plethora of build tools available. At the top of my ease of installation list comes the R community, which has agreed on a single distribution channel. There also seem to be a few up-and-coming build systems such as cmake, scons, waf and jam. If one uses too exotic a build system, the reviewers probably won't have it on their box and will first have to obtain it. However, one would often like the nicer features provided by the newer systems. Further, JMLR reviewers are often not experts in the language that the project is written in and are not familiar with its standard idioms (but this can be fixed by good documentation). It is a tough call...
One thing I've found quite nice is when projects have instructions on how to check that your build has completed successfully. For machine learning software, this can be a small example on toy data which allows the user to confirm that things are working as they should be.
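As a rough illustration of such a check, here is a minimal sketch of a post-install smoke test in Python. Everything in it is hypothetical: fit_line stands in for whatever routine your own package exposes (in a real project you would import it from the installed package rather than define it in the test), and the toy data is generated from a known line so that the expected answer is unambiguous.

```python
"""Post-install smoke test: fit a tiny model on toy data and check the result."""


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (pure Python).

    Hypothetical stand-in for a routine that the installed package would provide.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def smoke_test(tol=1e-9):
    """Return True if the fit recovers the known line y = 2x + 1."""
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2.0 * x + 1.0 for x in xs]  # noise-free toy data
    slope, intercept = fit_line(xs, ys)
    return abs(slope - 2.0) < tol and abs(intercept - 1.0) < tol


if __name__ == "__main__":
    print("build OK" if smoke_test() else "build FAILED")
```

A single command like this, mentioned at the end of the installation instructions, gives the user a clear yes or no answer within a few seconds of building.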
Dependencies
Dependencies are a double-edged sword. On the one hand, one would like to take advantage of highly optimized libraries such as BLAS, LAPACK, Boost or the GNU Scientific Library. On the other hand, this often means that you have to track changes in the dependencies, and the user may not have them available. We had one JMLR submission which used a combination of Python and C++. One reviewer had a terrible time trying to get it working, first because he was not familiar with Python dependencies, and second because his Linux distribution provides the Python headers in a separate package (and he didn't know).
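One way to soften this kind of surprise is to ship a small script that checks for dependencies before the build starts and says plainly what is missing. The following is only a sketch under assumptions: the REQUIRED_MODULES list is a hypothetical example, and the check for Python.h simply looks in the include directory that the running interpreter reports, which is where a C or C++ extension build would normally expect to find it.

```python
"""Pre-build check: report missing Python modules and missing Python headers."""
import importlib.util
import os
import sysconfig

# Hypothetical list of top-level modules that the project needs at build time.
REQUIRED_MODULES = ["numpy"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_headers_available():
    """Check whether Python.h exists in the interpreter's include directory."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


if __name__ == "__main__":
    for name in missing_modules(REQUIRED_MODULES):
        print("missing Python module: " + name)
    if not python_headers_available():
        # Many Linux distributions ship the headers in a separate
        # development package, whose name varies between distributions.
        print("Python.h not found: install your distribution's Python development package")
```

Running this first turns a confusing compiler error into a one-line message that tells the user which package to look for.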
Conclusion
There are all sorts of strange things that can happen while a user is trying to install your software. Try to follow one of the common idioms for your language so that the user feels comfortable with the build. But at the end of the day, nothing beats real life testing. So, list your software on mloss.org before you try to submit to JMLR. It may just allow you to catch some installation bugs before they upset your reviewers.