Open Thoughts

December 2009 archive

MLOSS ICML 2010 workshop?

December 16, 2009

We are thinking of organizing an ICML 2010 workshop on machine learning open source software. Does anyone here think this is a great idea like we do? If you would see this happen, please contact us and help us organize it.


US open access policy

December 14, 2009

The Office of Science and Technology Policy of the United States of America is having a public consultation on Public Access Policy, which will run till 7 January 2010. The first part (10-20 December 2009) considers implementation issues, in particular:

  • Who should enact public access policies?
  • How should a public access policy be designed?

The next two sections are (details here):

  • Features and Technology (Dec. 21 to Dec 31)
  • Management (Jan. 1 to Jan. 7)

If you care about how your research is being published, head over and give your views.

Documentation is hard to do

December 4, 2009

There was an article at TechNewsWorld yesterday about the poor state of documentation in Linux. It seems that for most projects, there are two kinds of people: the users and the developers. Users always complain that the documentation is not good enough, and developers don't see the point of writing it. Funnily, once some tech savvy user starts digging around in the code a bit, he/she one day wakes up and finds that they have crossed the fence, i.e. the project which they initially said was badly documented is now what they are actively contributing to. Even worse, they also often don't write documentation themselves.

The pragmatic programmer gives two tips about documentation:

  • Treat English as just another programming language
  • Build documentation in, don't bolt it on

Then it goes on to distinguish between internal and external documentation. I think that for machine learning, the external part is really important. Very often, the users of machine learning software are not experts in the field, and "just" downloaded the code to see whether they can solve their problem. In fact, very often, the user is not even familiar with the programming language that the project is implemented in. Each language has its own idiosyncrasies, and projects should try to have at least a README file that tells the user how to get things working. Some basic things like how to compile, and specific command line operations to get the paths correct, etc. Even interpreted languages can be tricky. For example, matlab often requires the right set of addpath statements to get things working, python requires that $PYTHONPATH be set correctly.

It happens quite often that reviewers of JMLR submissions complain of not being able to "get the code working". Sometimes this is due to a deeper problem, but often it is just because the reviewer is not a user of the programming language of the submission. Now, before you criticize me and ask why I don't choose better reviewers; if you take the intersection of machine learning expertise, programming language and operating system, you often end up with only one group of people, namely the ones that submitted the project.