Netflix: part 1
Posted by Cheng Soon Ong on August 10, 2009
As most of you may know, the Netflix prize came to an exciting conclusion recently. It has not yet been officially announced which of the top two teams on the leaderboard, The Ensemble or BellKor's Pragmatic Chaos, will win the 1 million dollar prize. The leaderboard shows the results on a public test set, but Netflix will determine the grand prize winner on a secret test set.
Anyway, I emailed the teams to ask whether they used any open source machine learning software in their prize-winning efforts. In general, the feeling I get from the responses is that both teams rolled their own solutions. They were also understandably reluctant to share their methods, since the official results are not out yet and Netflix in essence owns the IP.
Greg McAlpin from The Ensemble was kind enough to collect information from his team and send me the following summary of the open source software they used. Unfortunately, they too did not want to share their machine learning methods.
Our team decided that it would be best to wait until Netflix officially
announces the winner of the competition before we talk about how we used
any open source software that is related to machine learning.
We used plenty of open source tools though. Different members of the
team used:
JAMA/TNT, Mersenne Twister, Ruby, Perl, Python, R, Linux, gcc (and tool
chain), gsl, tcl, mysql, openmp, CLAPACK, BLAS, all of the CygWin GNU
software
Many members of our team first met on a Drupal website. And personally,
I could never have kept track of everything that was going on without
TiddlyWiki.
I know that this isn't really what you were asking for. Much of the
existing open source software that we were aware of was not able to
handle the size of the Netflix Prize data set. I don't think that
anyone got Weka or even Octave to work with the data. Some excellent
new open source frameworks were created by people competing for the
Netflix Prize. It was interesting to me that code.google.com became the
home for many open source projects (instead of sourceforge).