Open Thoughts

Interoperability and the Curse of Polyglotism

Posted by Mikio Braun on August 12, 2008

It seems that this homepage is steadily growing. We already have a large number of registered projects covering many different applications and machine learning methods. Time to think about where we're heading with all of this.

I think one of the first goals of this whole endeavor is that you can easily find software for methods published elsewhere. Whether you want to compare your own method against an existing one or actually apply a method to some real data, being able to find and download the software is a huge improvement over having to re-implement the method from the paper.

However, I think that ultimately it would be great if some form of interoperability evolved between different software packages that address the same problem. This matters particularly in a field like machine learning, where the number of (abstract) problems is relatively small but there exist many competing methods for a given problem (for example, two-class classification on vectorial data). Being able to easily replace one of these methods with another would be very useful.

The way to achieve this is, as everywhere else in the industry, to develop standards. Actually, there are many different levels at which such standards could be defined, ranging from web services over binary APIs to data file formats.

A few weeks ago, I advocated the use of modern scripting languages like Python or Ruby to develop new machine learning toolboxes, but with respect to interoperability, this "polyglotism" actually raises some new problems. Back in the "old days", when people were mostly using compiled languages, making your software usable for others was a matter of creating a library that could then be linked against new programs. Differences in calling conventions aside, this approach was relatively flexible: for example, you could use a Fortran library in C or a C library in C++.

But if you write a library in a scripting language like Python, you can use it only from Python. You cannot link your C program against the Python module, or import the module into another language like Ruby. If you want to reuse a Python library from another language, you have to invest in some additional infrastructure.

The hard way would be to set up a language-agnostic interface to your Python code, for example by creating a web service or by using a protocol like CORBA.
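As a rough illustration of the web-service route, here is a minimal sketch using only the Python standard library that exposes a classifier as a JSON-over-HTTP service. The `predict` function is a hypothetical placeholder for a real method, and the JSON layout (an `instances` list in, a `labels` list out) is an assumption made for this example, not an established standard.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a real classifier: sign of the feature sum."""
    return 1 if sum(features) >= 0 else -1


def handle_json(body: bytes) -> bytes:
    """Decode a JSON request, classify each instance, encode the JSON reply."""
    request = json.loads(body.decode("utf-8"))
    labels = [predict(x) for x in request["instances"]]
    return json.dumps({"labels": labels}).encode("utf-8")


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly Content-Length bytes of the request body
        length = int(self.headers.get("Content-Length", 0))
        response = handle_json(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(port=8000):
    """Block and serve requests on localhost until interrupted."""
    HTTPServer(("localhost", port), PredictHandler).serve_forever()
```

After calling `serve()`, any language with an HTTP client can use the method, e.g. `curl -d '{"instances": [[1, 2]]}' http://localhost:8000/`. The protocol logic is kept in `handle_json` so that it can be tested without starting a server.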

The low-cost alternative would be to settle on a common data format. Then you can, in principle, combine methods from different environments by storing intermediate results in files. It won't be fast, but it will work.

To support this approach, we started a discussion some time ago in which we settled on the ARFF format as a possible starting point. Furthermore, we have started to write and/or collect code for reading and writing ARFF files in a large number of programming languages, so that you do not have to implement the file format yourself.
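To give an idea of what such reader/writer code involves, here is a minimal Python sketch. It assumes a simplified subset of ARFF: only numeric (`NUMERIC`, `REAL`, `INTEGER`) and nominal attributes, no quoted values, no missing values (`?`), and no sparse rows. The names `write_arff` and `read_arff` are hypothetical and are not the API of any of the readers mentioned above.

```python
def write_arff(path, relation, attributes, rows):
    """Write a simplified ARFF file.

    attributes: list of (name, kind) pairs, where kind is "NUMERIC"
    or a list of nominal values such as ["pos", "neg"].
    """
    with open(path, "w") as f:
        f.write(f"@RELATION {relation}\n\n")
        for name, kind in attributes:
            # Nominal attributes are written as {value1,value2,...}
            spec = "{" + ",".join(kind) + "}" if isinstance(kind, list) else kind
            f.write(f"@ATTRIBUTE {name} {spec}\n")
        f.write("\n@DATA\n")
        for row in rows:
            # One comma-separated instance per line
            f.write(",".join(str(v) for v in row) + "\n")


def read_arff(path):
    """Read a file written in the simplified ARFF subset above."""
    relation, attributes, rows = None, [], []
    in_data = False
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("%"):  # skip blank lines and comments
                continue
            if in_data:
                values = [v.strip() for v in line.split(",")]
                # Numeric attributes become floats, nominal ones stay strings
                rows.append([v if isinstance(kind, list) else float(v)
                             for (_, kind), v in zip(attributes, values)])
                continue
            keyword = line.split(None, 1)[0].upper()  # keywords are case-insensitive
            if keyword == "@RELATION":
                relation = line.split(None, 1)[1]
            elif keyword == "@ATTRIBUTE":
                _, name, kind = line.split(None, 2)
                if kind.startswith("{"):
                    kind = [v.strip() for v in kind.strip("{}").split(",")]
                else:
                    kind = kind.upper()
                attributes.append((name, kind))
            elif keyword == "@DATA":
                in_data = True
    return relation, attributes, rows
```

Because the two sides agree only on the file, a method implemented in another language could read a file written this way, and vice versa. A full reader would also need quoting, missing values, string and date attributes, and the sparse format.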

