Open Thoughts

December 2010 archive

Scheduled Downtime December 17-19

December 14, 2010

The site will have a scheduled downtime on December 17-19 because TU Berlin is moving its data center. The new center will have significantly improved cooling and electricity facilities as well as more network bandwidth.

Apologies for the inconvenience.

Call for Presentations for FOSDEM data devroom

December 14, 2010

Next year's FOSDEM, a meeting focused on free and open source software, has a dedicated devroom for data analysis and machine learning projects. The call for presentations ends on December 17, 2010 (this Friday). The meeting will be held on February 5, 2011 in Brussels, Belgium.

New Journal: Open Research Computation

December 13, 2010

A new journal with a focus on software used in research has opened: Open Research Computation. Similar to the MLOSS Track at JMLR, the journal focuses on software submissions and sets high standards for code quality and reusability.

The journal is also discussed in this blog post.

December 8, 2010

We have released a new community portal to collaboratively upload and define datasets, tasks, methods and challenges.

It is meant as the next step after the UCI repository, enabling reproducible research. It complements UCI and, in contrast to it, lets data sets be uploaded and edited wiki-style in a collaborative fashion. We support download and upload in various data formats such as .matlab, .octave, .csv, .arff and .xml for your convenience. Naturally, this website is web 2.0, supporting tagging, comments, email notifications, searching, browsing and a forum.

Going beyond a mere collection of datasets, one can define tasks to be solved on a particular dataset, including the train/test split of the dataset, the input and output variables, and the performance measure.
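To make the ingredients of a task concrete, here is a minimal Python sketch. It is not the portal's actual data model: the `Task` class, its field names, and the `make_task` and `accuracy` helpers are hypothetical and only illustrate that a task bundles a dataset, a train/test split, input and output variables, and a performance measure.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot score an empty prediction list")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


@dataclass(frozen=True)
class Task:
    """Hypothetical task description; all field names are illustrative."""
    dataset: str                      # name of the dataset the task refers to
    train_idx: Tuple[int, ...]        # row indices available for training
    test_idx: Tuple[int, ...]         # row indices held out for evaluation
    input_columns: Tuple[str, ...]    # variables a method may use as input
    output_column: str                # variable a method has to predict
    measure: Callable[[Sequence, Sequence], float]  # performance measure


def make_task(dataset: str, n_rows: int, n_test: int,
              input_columns: Sequence[str], output_column: str,
              measure: Callable[[Sequence, Sequence], float] = accuracy) -> Task:
    """Build a task whose test split is the last n_test rows (an assumption)."""
    if not 0 < n_test < n_rows:
        raise ValueError("n_test must be between 1 and n_rows - 1")
    split = n_rows - n_test
    return Task(dataset=dataset,
                train_idx=tuple(range(split)),
                test_idx=tuple(range(split, n_rows)),
                input_columns=tuple(input_columns),
                output_column=output_column,
                measure=measure)
```

For example, `make_task("toy", 10, 3, ("x1", "x2"), "y")` describes a task on a ten-row dataset in which rows 7 to 9 are held out and predictions of `y` are scored by accuracy.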

One can then upload one's method's predictions and get server-side evaluations and a ranking of the results based on the performance measure. Note that the site even renders receiver operating characteristic (ROC) and precision-recall curves.
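The evaluation and ranking step can be sketched in a few lines of Python. This is an illustration only, not the site's evaluation code: the function names are hypothetical, the labels are assumed to be binary (0 or 1), and the area under the ROC curve is assumed to be the task's performance measure.

```python
def roc_points(labels, scores):
    """Return ROC curve points as (false positive rate, true positive rate).

    The decision threshold is swept from the highest score to the lowest;
    examples with tied scores are added to the curve in a single step.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example that shares the current score.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve, computed with the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def rank_submissions(labels, submissions):
    """Score each submission by ROC AUC and return (name, auc) pairs, best first.

    `submissions` maps a submission name to its list of predicted scores.
    """
    results = [(name, auc(roc_points(labels, scores)))
               for name, scores in submissions.items()]
    return sorted(results, key=lambda item: item[1], reverse=True)
```

The list returned by `roc_points` is what a plotting routine would draw as the ROC curve, and `rank_submissions` yields the kind of leaderboard described above.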

Once you have defined a number of tasks, you may group them together to define a challenge.

In contrast to other related portals, all of the public content is immediately available for download (without the need to register with the site). In addition, we supply mldata-utils, which enables off-line processing of your data set: conversion from and to the standard HDF5-based format we defined, an API to download and upload content without accessing the website via a web browser, and evaluation of the performance of your method.

So the portal is the ideal platform

  1. for the data set creator who just wants to get researchers to work on their particular data set.
  2. for the machine learning benchmarking guy who developed a new fancy algorithm and is in search of a dataset/task that fits his needs.
  3. for the challenge organizer, because the portal already provides all of the infrastructure to run challenges.
  4. for the challenge participant, who can conveniently download data and tasks in various formats.

Since all of the code is open source (including mldata-utils), we invite machine learning researchers to participate in the development. So if a feature is missing, let us know and we will try to incorporate it into the site.