Open Thoughts

Netflix: part 1

Posted by Cheng Soon Ong on August 10, 2009

As most of you may know, the Netflix Prize came to an exciting conclusion recently. The official results are not out yet, so we do not know which of the top two teams on the leaderboard, The Ensemble or BellKor's Pragmatic Chaos, will win the 1 million dollar prize. The leaderboard shows results on a public test set, but Netflix will determine the grand prize winner on a secret test set.

Anyway, I emailed the teams to ask whether they used any open source machine learning software in their prize-winning efforts. In general, the feeling I get from the responses is that both teams rolled their own solutions. They were also understandably reluctant to share their methods, since the official results are not out yet and Netflix in essence owns the IP.

Greg McAlpin from The Ensemble was kind enough to collect information from his team and provide me with the following summary of open source software that they used. Unfortunately, they also did not want to share their machine learning methods.

Our team decided that it would be best to wait until Netflix officially announces the winner of the competition before we talk about how we used any open source software that is related to machine learning.

We used plenty of open source tools though. Different members of the team used: JAMA/TNT, Mersenne Twister, Ruby, Perl, Python, R, Linux, gcc (and tool chain), gsl, tcl, mysql, openmp, CLAPACK, BLAS, all of the CygWin GNU software.

Many members of our team first met on a Drupal website. And personally, I could never have kept track of everything that was going on without TiddlyWiki.

I know that this isn't really what you were asking for. Much of the existing open source software that we were aware of was not able to handle the size of the Netflix Prize data set. I don't think that anyone got Weka or even Octave to work with the data. Some excellent new open source frameworks were created by people competing for the Netflix Prize. It was interesting to me that code.google.com became the home for many open source projects (instead of sourceforge).
