Uncategorized

Netflix: Open Collaboration is Recommended

Judging by the leaderboard, the Netflix grand prize is in sight for a select few researchers, including Pragmatic Theory. The target RMSE (root mean squared error) is 0.8563, and the best entry at the time of writing is 0.8582.

If you are at all interested in machine learning, AI or operations research, you will have heard about the Netflix competition that's been running for over 3 years. If not, be advised that online movie-rental company Netflix has been running an open competition with 1 million USD up for grabs for anyone who can invent a collaborative filtering algorithm for movie ratings that beats their in-house algorithm (Cinematch) by 10% by 2011.

Because it's in their best interest to assist researchers, Netflix has released a training data set of 100 million ratings stripped of all PII (personally identifiable information). When the competition started way back in 2006, they actually had 103 million ratings, so the 3 million they didn't include in the training set are what is being used to evaluate any submitted recommendation systems. In the world of machine learning the golden rule is - the more data you have the better!
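To make the scoring concrete, here is a minimal Python sketch of how an RMSE on held-out ratings and a percentage improvement over a baseline could be computed. The sample ratings and predictions are made up for illustration, and the baseline RMSE is not an official figure: it is simply the value implied by treating the 0.8563 target as exactly a 10% improvement, which is my assumption rather than something stated in the competition rules quoted here.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(actual))

def improvement_pct(baseline_rmse, candidate_rmse):
    """Percentage by which candidate_rmse improves on baseline_rmse."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse

# Hypothetical held-out ratings and predictions (not Netflix data).
held_out = [4, 3, 5, 2, 1]
predictions = [3.8, 3.4, 4.5, 2.5, 1.9]
print(f"sample RMSE: {rmse(predictions, held_out):.4f}")

# Figures quoted in this post.
TARGET_RMSE = 0.8563
BEST_RMSE = 0.8582

# Assumption: the target is exactly 10% better than the baseline,
# so the baseline is implied rather than taken from Netflix.
implied_baseline = TARGET_RMSE / 0.9

print(f"implied baseline RMSE: {implied_baseline:.4f}")
print(f"best entry improvement: {improvement_pct(implied_baseline, BEST_RMSE):.2f}%")
```

Under that assumption the best entry sits a little short of the 10% mark, which is why the prize looks so close.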

Netflix will clearly benefit from such an improvement when one is found - which looks to be real soon! - since it directly translates to increased movie rental revenue and lower subscription cancellation rates. Since they charge a fixed monthly subscription fee, they know that users who don't rent enough movies per period will realise that they are not getting value for money and will cancel their subscription. Hence the goal for Netflix is to figure out what the customer likes and have a ready supply of recommendations so the user never runs out of movies they want to see. However, we need to be cognisant of the fact that in the online world, where inventory holding costs are negligible due to digital storage, movie retailers like Netflix can carry a significantly greater inventory of movies than traditional bricks-and-mortar rental outlets. Thus the poor user is faced with the paradox of choice, making good recommendations even more important to their business model.

It's been an incredibly shrewd move by Netflix, as it is a cost-effective way for them to harness the resources of the (interested) research community for a fixed budget within a fixed timeframe. So in effect all 3 pillars of the infamous project triangle are fixed! That's a project manager's dream. If they tried to do the R&D in-house they'd be unlikely to attract a team of individuals that could outperform the "open community", and in all likelihood it would take them longer and cost them more than 1 million USD to advance the science to the level they want. In essence, they are employing the wisdom of crowds to good effect.

What's interesting is the type of folks who have done well in the competition, and the degree of collaboration between participants. As expected, there is a decent smattering of professional research labs and academic mathematics departments near the top of the leaderboard, but there are also lone researchers and participants who aren't recognized experts in the field. One classic example is Gavin Potter, who, going under the guise of "Just a guy in a garage", got massive exposure from an article in Wired magazine for applying more non-mathematical notions in his approach, which did fairly well for a while. And even with 1 million dollars up for grabs, many of the teams entered openly share their approaches and experiences with others. If ever you wanted an example of how collaboration and the "wisdom of crowds" can advance our knowledge then, other than Wikipedia, this is it. You have to think that, for the same reasons, open source software must eventually overtake proprietary software systems, if it hasn't already, provided enough people contribute.

JAOO 2009


I recently took time out to attend the JAOO conference in Brisbane. I'm not one to troop around to every conference that hits town because, frankly, the quality of the speakers is usually not that great, and the subject matter is usually fairly narrow. I'm happy to report that JAOO, however, is different. Despite the title of the conference, which implies that the only subject matter covered is Java, over 3 days I attended sessions on numerous languages from both the LAMP and Microsoft stacks, including Java, JavaScript, F#, Objective-C, IronPython and IronRuby. What's more, the vast majority of the speakers are well-respected subject matter experts, including numerous PhDs. Why wouldn't you want to learn Java from Joshua Bloch, the Chief Java Architect at Google? I was also privileged to spend 3 hours with JavaScript guru Douglas Crockford from Yahoo.com, learning the good, bad and really ugly parts of JavaScript. What he doesn't know about JavaScript isn't worth knowing!

Apart from the deep dives into JavaScript and F#, the most interesting aspects of the conference for me were the discussions on distributed databases and scalable systems from several Google employees, and the talks on machine learning - one using F#, a functional programming language from Microsoft, and one based on the open-source offerings Hadoop and Mahout.

If you only do 1 conference a year, consider JAOO. The quality of the speakers and the breadth of exposure are commendable. I'll certainly be heading back for more next year.