Open Thoughts

Posted by Soeren Sonnenburg on December 8, 2010

We have released a new community portal to collaboratively upload and define datasets, tasks, methods and challenges.

It is meant as the next step after UCI, enabling reproducible research. It complements UCI and, in contrast to UCI, data sets can be uploaded and edited wiki-style in a collaborative fashion. We support download and upload in various data formats like .matlab, .octave, .csv, .arff, .xml for your convenience. Naturally, this website is web 2.0, supporting tagging, comments, email notifications, searching, browsing and a forum.

Going beyond a mere collection of datasets, one can define tasks to be solved on a particular dataset, including the train/test split of the dataset, the input and output variables, and the performance measure.
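To make the ingredients of a task concrete, here is a minimal Python sketch. It is only an illustration: the `Task` class, its field names and the `accuracy` helper are hypothetical and do not reflect the portal's actual schema or storage format.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction lengths differ")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


@dataclass
class Task:
    """Hypothetical task description (an assumption, not the portal's schema)."""
    dataset: str                # name of the dataset the task refers to
    train_idx: List[int]        # example indices available for training
    test_idx: List[int]         # example indices held out for testing
    input_vars: List[str]       # columns a method may use as input
    output_var: str             # column a method has to predict
    measure: Callable[[Sequence[int], Sequence[int]], float]

    def evaluate(self, labels: Sequence[int], predictions: Sequence[int]) -> float:
        """Score predictions for the test examples against the true labels."""
        y_true = [labels[i] for i in self.test_idx]
        return self.measure(y_true, predictions)


task = Task(
    dataset="toy",
    train_idx=[0, 1, 2, 3],
    test_idx=[4, 5],
    input_vars=["x1", "x2"],
    output_var="y",
    measure=accuracy,
)

labels = [0, 1, 0, 1, 1, 0]                       # labels for the whole dataset
score = task.evaluate(labels, predictions=[1, 1])  # one prediction per test example
print(score)
```

Keeping the split and the measure inside the task is what makes results from different methods comparable: every submission is scored on the same test examples with the same function.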

One can then upload one's method's predictions and get server-side evaluations and a ranking of the results based on the performance measure. Note that the site even renders receiver operating characteristic or precision-recall curves.
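As a rough illustration of what such an evaluation involves, the following Python sketch computes the points of a receiver operating characteristic curve, the area under it, and a ranking of two submissions. The function names, the submission names and the toy scores are all made up for this example; this is not the site's actual evaluation code.

```python
from typing import Dict, List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs.

    Labels are 1 (positive) or 0 (negative). Examples are visited from the
    highest to the lowest score; tied scores are not treated specially.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def rank(y_true: Sequence[int], submissions: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Sort submissions by area under the ROC curve, best first."""
    results = [(name, auc(roc_points(y_true, s))) for name, s in submissions.items()]
    return sorted(results, key=lambda item: item[1], reverse=True)


y_true = [1, 0, 1, 0]
submissions = {
    "method_a": [0.9, 0.1, 0.8, 0.3],  # ranks both positives above both negatives
    "method_b": [0.6, 0.7, 0.4, 0.2],  # ranks one negative above both positives
}
ranking = rank(y_true, submissions)
for name, value in ranking:
    print(name, value)
```

The list returned by `roc_points` is also what one would plot to draw the curve itself.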

Once you have defined a number of tasks, you may group them together to define a challenge.

In contrast to other related portals, all of the public content is immediately available for download (without the need to register with the site). In addition, we supply mldata-utils, which enables off-line processing of your data set, i.e. conversion from and to the standard HDF5-based format we defined, an API to download/upload content without accessing the website via a web browser, and finally evaluation of the performance of your method.

So the portal is the ideal platform

  1. for the data set creator who just wants to get researchers to work on their particular data set.
  2. for the machine learning benchmarking guy who developed a new fancy algorithm and is in search of a dataset/task that fits his needs.
  3. for the challenge organizer, because the portal already provides all of the infrastructure to run challenges.
  4. for the challenge participant, who can conveniently download data and tasks in various formats.

Since all of the code is open source (including mldata-utils), we invite machine learning researchers to participate in the development. So if there is a feature missing, let us know and we will try to incorporate it on the site.

