Open Thoughts

May 2010 archive

Software patentable in Germany

May 21, 2010

It seems that Germany's highest court for civil and criminal matters has overruled the country's highest patent-specialized court and upheld a software patent. More analysis is available at the blog foss patents. The original ruling (in German) is also available as a PDF.

From foss patents:

  • After a landmark court ruling, the German perspective on the validity of software patents is now closer than ever to that of the US.
  • Basically, Germany has now had its own Bilski case -- with the worst possible outcome for the opponents of software patents.
  • Recently, the Enlarged Board of Appeal of the European Patent Office upheld that approach to software patents as well, effectively accepting that a computer program stored on a medium must be patentable in principle.
  • Defense strategies such as the Defensive Patent License are needed now more than ever.

The Defensive Patent License is being proposed by two professors at Berkeley.

Data management plan

May 18, 2010

Starting in October, research grant applications to the NSF will need to include a data management plan. If you deal with data and don't have a plan yet, here is one to follow.

ICML 2010 MLOSS Workshop Preliminary Program now available

May 13, 2010

The programme of the ICML 2010 Machine Learning Open Source Software workshop is now available. All contributors should have received a notification of acceptance email by now. We thank all of you for your submissions. This year we received 16 submissions of which 5 were selected for talks and 8 for short (5 minute) poster spotlight presentations. These 13 submissions will all be presented in the poster session. A detailed schedule of the workshop is available from the workshop website.

Note that we have changed the format of the MLOSS workshop slightly compared to the previous editions at NIPS: we will now have extended poster sessions, with all authors ideally presenting their work in a short talk plus a poster or even a live demo.

We hope this encourages more interaction between projects; it also allowed us to accept more than just 8 papers for talks alone.

Conflicting OSS goals

May 12, 2010

It occurred to me while reviewing that the goals of OSS contributors and users are quite varied. Often these goals are in conflict. For example, here are a few ways of classifying packages I've noticed:

  • library with APIs vs complete package (end-to-end). Some packages are libraries with comprehensive APIs, meant to be used as components in larger systems (or at least they assume the larger system will handle IO, evaluation, sampling, statistics, etc.). Other packages are complete end-to-end tools: they accommodate reading from standard formats (e.g., CSV, ARFF) and handle evaluation and other aspects of experimentation.

  • packages that produce intelligible models (trees, rules, visualizations) vs packages that produce black-box models. Some experimenters want/demand to understand the model, and a black-box "bag of vectors" won't work no matter how good the predictions.

  • flexible, understandable code vs efficient code. Some packages are written to be clean and extensible, while others are written to be efficient and fast. (Of course, some packages are neither :-)

  • single system vs platform for many algorithms. While some researchers contribute single algorithm implementations, there is a clear trend toward large systems (Weka, Orange, scikit.learn, etc.) which are intended to be platforms for families or large collections of algorithms.
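To make the first distinction concrete, here is a minimal sketch (all class and function names are hypothetical illustrations, not from any real package) contrasting a library-style component, where the caller supplies data and handles evaluation, with an end-to-end tool that owns IO and reporting itself:

```python
# Library style: a learner exposed as a component. The caller is
# responsible for IO, splitting, and evaluation.
class MajorityClassifier:
    """A trivial learner with a fit/predict API (hypothetical example)."""

    def fit(self, X, y):
        # Remember the most common label seen in training.
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


# End-to-end style: the package reads a standard format (simple CSV
# rows like "f1,f2,label"), trains, evaluates, and reports a number.
def run_experiment(csv_lines):
    """Parse CSV lines, fit the classifier, and return training accuracy."""
    rows = [line.strip().split(",") for line in csv_lines]
    X = [r[:-1] for r in rows]
    y = [r[-1] for r in rows]
    model = MajorityClassifier().fit(X, y)
    preds = model.predict(X)
    return sum(p == t for p, t in zip(preds, y)) / len(y)
```

A researcher composing a larger system wants the first interface; a practitioner with a CSV file on disk often just wants the second.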

In turn, a lot of this depends on whether the user is a researcher who wants to experiment with algorithms or a practitioner who wants to solve a real problem. Packages written for one goal are often useless for another. A program designed for several thousand examples that just outputs final error rates won’t help a practitioner who wants to classify a hundred thousand cases; a package with an interactive interface is very cumbersome for someone who needs to report extensive cross-validation experiments.

It’s clear from the JMLR OSS Review Criteria (http://jmlr.csail.mit.edu/mloss/mloss-info.html) that JMLR hasn’t yet grappled with this variety of goals. So I suggest that the mloss.org organizers (and contributors) start to think of useful categories for their code that can help people understand and navigate this space.