Open Thoughts

Documentation is hard to do

Posted by Cheng Soon Ong on December 4, 2009

There was an article at TechNewsWorld yesterday about the poor state of documentation in Linux. It seems that for most projects there are two kinds of people: users and developers. Users always complain that the documentation is not good enough, and developers don't see the point of writing it. Funnily enough, once a tech-savvy user starts digging around in the code, they wake up one day to find that they have crossed the fence: the project they initially called badly documented is now one they actively contribute to. Even worse, they often don't write documentation themselves either.

The Pragmatic Programmer gives two tips about documentation:

  • Treat English as just another programming language
  • Build documentation in, don't bolt it on

Then it goes on to distinguish between internal and external documentation. I think that for machine learning, the external part is really important. Very often, the users of machine learning software are not experts in the field, and "just" downloaded the code to see whether it can solve their problem. In fact, the user is often not even familiar with the programming language that the project is implemented in. Each language has its own idiosyncrasies, and projects should have at least a README file that tells the user how to get things working: basic things such as how to compile, and the specific command line operations needed to get the paths correct. Even interpreted languages can be tricky. For example, MATLAB often requires the right set of addpath statements to get things working, and Python requires that $PYTHONPATH be set correctly.
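To make the Python case concrete, here is a minimal sketch of a helper that a project could ship so that users do not have to set $PYTHONPATH by hand. The function name add_project_to_path is made up for illustration and is not taken from any particular project; it only relies on the standard os and sys modules.

```python
import os
import sys


def add_project_to_path(project_root):
    """Put project_root at the front of the module search path.

    Hypothetical helper: has roughly the same effect as adding the
    directory to $PYTHONPATH before starting Python.
    """
    root = os.path.abspath(project_root)
    # Avoid adding the same directory twice on repeated calls.
    if root not in sys.path:
        sys.path.insert(0, root)
    return root
```

A README could then tell the user to call add_project_to_path with the directory where the code was unpacked before importing anything from the project. This is only one possible approach; a proper installation script is usually the better long-term answer.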

It happens quite often that reviewers of JMLR submissions complain of not being able to "get the code working". Sometimes this is due to a deeper problem, but often it is just because the reviewer is not a user of the programming language of the submission. Now, before you criticize me and ask why I don't choose better reviewers: if you take the intersection of people with the right machine learning expertise, programming language and operating system, you often end up with only one group, namely the people who submitted the project.

