Open Thoughts

November 2008 archive

MLOSS progress updates for November 2008

November 27, 2008

Two months have passed since the last statistics update, so let's see whether we are progressing:

As of today mloss.org has

  • 158 software projects based on
  • 19 programming languages,
  • 302 authors (including software co-authors),
  • 284 registered users,
  • 63 comments (including spam :),
  • 109 forum posts,
  • 28 blog entries,
  • 51 software ratings,
  • 31525 software statistics objects,
  • 143 software subscriptions or bookmarks.

And happy birthday, mloss.org: the site has now been live for 1 year and 1.5 months, and since we recently became a target of spammers, it seems that mloss.org is not that unimportant anymore. The traffic supports this too, having grown from around 300 visits per week (February 2008) to more than 1000 per week (November 2008).

And congratulations to Peter Gehler, author of the most successful software project: MPIKmeans (accessed more than 6000 times).

Finally JMLR-MLOSS received

  • 20 submissions so far,
  • 5 resubmissions,
  • 3 already accepted and published,
  • 1 pending publication

since its announcement in summer 2007.

One may conclude that there is visible progress. However, as already pointed out in several previous blog posts, we merely see isolated mloss projects that do not interoperate with each other at all. It is clear that this trend needs to be stopped, but how could we support the next steps? In case you have some bright ideas, either talk to us at NIPS*08 (possibly even attend the workshop and present your ideas in the discussion) or leave a comment...

Open Source is not Interoperability

November 26, 2008

Trying to prepare some thoughts about interoperability to be discussed at the NIPS workshop, I came across a bunch of websites roughly in the following order:

  • a rather negative article at eweek.com about the state of open source software and how open source projects interoperate.

  • a very positive blog post at ugotrade which talks about OpenSim and how it will be the next hot thing.

  • a post about why open source and interoperability are really two different things.

Quoting the third author:

  1. Interop is not open source.
  2. Interop does not require open source implementations
  3. Open source does not guarantee Interop

While one might think it natural for open source developers to make use of other bits of (open source) software, it usually doesn't happen. For me, interoperability can occur in two ways: the first is having a common set of protocols (as argued for by the third post above), and the second is integrating another software library or method. In some sense, the "integration" idea also requires a set of protocols or APIs. It may be that I'm just being pedantic in trying to semantically differentiate between protocols and APIs. But the main idea remains: we need software that talks to other bits of software.

However, if both pieces of software are open source, we can do more than just have software that talks to other bits of software (which is why OpenSim is raising so much interest). In the process of having to push together two software projects, we may be able to come up with better interfaces between them. This is especially true in the research area (which in some sense practices carpentry), where it is not clear from the start how programs should interact. For supervised machine learning, datasets are a good place to start. It seems "obvious" that this is one place where different machine learning algorithms can interface with each other. Yet even for this "simple" interface, there is a multitude of data formats and standards. Another quite fruitful area is convex optimization, where there are several projects (even here on mloss.org) which easily link to different back ends, and several solvers which are used by various front ends. Interestingly, here the interfaces are actually dictated by the mathematics, and the software implementations just mirror these forms. I think it is within our reach to have this kind of interoperability in many other areas of machine learning.
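
To make the dataset point a bit more concrete, here is a minimal sketch of what bridging two data formats can look like. It assumes the sparse "label index:value" text layout used by LIBSVM-style tools (with 1-based feature indices) on one side and a plain comma-separated row on the other. The function names and the choice of formats are my own illustration, not something taken from any project on mloss.org.

```python
from typing import List, Tuple


def parse_sparse_line(line: str, n_features: int) -> Tuple[float, List[float]]:
    """Parse one 'label index:value ...' line into (label, dense row).

    Assumes 1-based feature indices; features not listed are zero.
    """
    parts = line.split()
    label = float(parts[0])
    row = [0.0] * n_features
    for item in parts[1:]:
        index, value = item.split(":")
        row[int(index) - 1] = float(value)
    return label, row


def to_csv_line(label: float, row: List[float]) -> str:
    """Write the label followed by the dense features as one CSV row."""
    return ",".join(repr(v) for v in [label] + row)


if __name__ == "__main__":
    label, row = parse_sparse_line("1 1:0.5 3:2", n_features=3)
    print(to_csv_line(label, row))
```

Even this toy converter has to make decisions (index base, where the label goes, how missing entries are filled) that each tool tends to make differently, which is exactly why an agreed-upon format would help.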

As for the long-term goal of software systems being well integrated in an application-specific fashion, I think we still have a way to go...

mloss08 Program

November 6, 2008

Just in case you haven't checked our workshop page recently, we have finalised our program. We had a surprisingly large number of submissions, ranging from quite mature projects to small radical ideas. In the end, we decided that we should try to squeeze in as many projects as possible, and at the same time try to keep some diversity in the program; i.e. we didn't want to have all slots taken up by large mature machine learning frameworks.

Our theme this year is "interoperability, interoperability, interoperability". The dream is to have some way for machine learning software packages to talk to each other. We are still a long way from being able to plug and play different tools for machine learning, and we hope to make a start by discussing this at the workshop. Of course, machine learning research is not only about software; it is also about the data. Our afternoon discussion session will be about "UCI 2.0", and how we should go about it. There was a recent editorial in Nature Cell Biology about the need for standardizing bioinformatics data, and this blog post highlights three properties of scientific data.

Hope to see you at NIPS!