Posted by: mbuckley56 | September 28, 2008

Netflix

The other day I was discussing the possibility of a thesis with Professor Brenner. For those of you in the class who are not Applied Math concentrators (I believe most are), Professor Brenner is the head of the Applied Math concentration. As expected, he strongly encouraged me to write a thesis, and he suggested a topic I had never heard of: the Netflix Prize.

In case you don’t know, Netflix (http://en.wikipedia.org/wiki/Netflix) is a DVD rental service that mails videos to your house and operates principally through the Internet. You pay a flat rate every month and you get to keep DVDs for as long as you want.

Each time a customer watches a video, Netflix asks them to rate it from one to five. Based on the customer’s ratings, a list of recommended titles is provided. To generate this list, Netflix naturally uses some sort of algorithm to predict which movies the customer will like.

The Netflix Prize (http://en.wikipedia.org/wiki/Netflix_prize) is essentially a contest to write a better algorithm than the one Netflix currently uses. If you beat the Netflix algorithm by 10%, you win $1,000,000. Until that happens, the best algorithm each year wins $50,000.

To test your algorithm, you are given historical data about customers. You have to predict the ratings customers will give films, based on the ratings they gave films in the past. As far as I can tell, the current Netflix algorithm does not use any information about directors or movie genre, which I thought was very strange.
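To make the prediction task concrete, here is a minimal sketch of about the simplest thing one could try: guess each rating from the overall average, adjusted by how the movie and the customer each tend to deviate from it. This is only an illustrative baseline and not the algorithm Netflix uses; the function names and the `(customer, movie, stars)` tuple layout are assumptions made for the example.

```python
from collections import defaultdict


def fit_baseline(ratings):
    """Fit average-based offsets from (customer, movie, stars) tuples.

    The tuple layout is an assumption for this sketch, not the
    format of the actual contest data.
    """
    global_mean = sum(stars for _, _, stars in ratings) / len(ratings)

    # How far each movie's ratings sit above or below the global mean.
    by_movie = defaultdict(list)
    for _, movie, stars in ratings:
        by_movie[movie].append(stars - global_mean)
    movie_offset = {m: sum(d) / len(d) for m, d in by_movie.items()}

    # How far each customer rates above or below what the movie
    # offset alone would suggest.
    by_customer = defaultdict(list)
    for customer, movie, stars in ratings:
        by_customer[customer].append(stars - global_mean - movie_offset[movie])
    customer_offset = {c: sum(d) / len(d) for c, d in by_customer.items()}

    return global_mean, movie_offset, customer_offset


def predict(model, customer, movie):
    """Predict a rating, clamped to the 1 to 5 star range."""
    global_mean, movie_offset, customer_offset = model
    guess = (global_mean
             + movie_offset.get(movie, 0.0)
             + customer_offset.get(customer, 0.0))
    return min(5.0, max(1.0, guess))


if __name__ == "__main__":
    history = [("a", "m1", 5), ("a", "m2", 3), ("b", "m1", 4)]
    model = fit_baseline(history)
    print(predict(model, "b", "m2"))
```

Note that this uses nothing about the movies themselves (no director, no genre), only past ratings. Customers or movies that never appear in the history simply fall back to the global average.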

The metric they use for judging the algorithm is the Root Mean Square Deviation (http://en.wikipedia.org/wiki/Root_mean_square_deviation).
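For anyone who has not seen the metric before, it is the square root of the average squared difference between predicted and actual ratings, so lower is better. Here is a small sketch of how one might compute it, plus the relative improvement over a reference score. Reading "beat by 10%" as a 10% lower RMSE is my assumption, and the helper names are made up for the example.

```python
import math


def rmse(predicted, actual):
    """Root mean square deviation between two equal-length sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(reference_rmse, new_rmse):
    """Fraction by which new_rmse is lower than reference_rmse.

    Assumption for this sketch: 0.10 would correspond to a 10%
    improvement over the reference algorithm.
    """
    return 1.0 - new_rmse / reference_rmse


if __name__ == "__main__":
    # Made-up ratings, purely for illustration.
    print(rmse([1, 5], [3, 3]))
    print(relative_improvement(1.0, 0.9))
```

Because the errors are squared before averaging, one prediction that is off by two stars costs as much as four predictions that are each off by one.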

Basically, I think this is a pretty cool idea for a thesis topic, and I might actually do it. I really have no idea where to start, so if you all want to check it out and possibly throw me some suggestions, that would be excellent. According to Professor Brenner, a bunch of Princeton students were on the competition’s leaderboard for a while. He also said that anyone who gets on the leaderboard would immediately get job offers from hedge funds.

Any ideas?

 

Michael


Responses

  1. Cool! Anyone know Pandora internet radio? The idea is the same: you rate songs and it tries to pick other songs you will like. It does ok. But they use a ton of data about the songs. In fact, they have professional musicians listen and rate the songs on hundreds of categories from “throaty singing” to “lots of drums” and stuff like that. I find it gives me things that are in the same genre as my other tastes, but isn’t good at predicting what I actually like.

    Amazon also has a system for recommendations based on what you’ve purchased. Does anyone know how this one works?

    Pandora’s system always seemed silly to me: a lot of work to evaluate every song by hand, while I’m convinced pure mathematics can do better without any knowledge of things like genre. Now you’ve got me wondering and thinking about it…

    Here’s the leader board which I looked up out of curiosity.

  2. Amazon’s is pretty basic, right? It just shows you what other items were purchased by customers who bought the item you are looking at. It’s straightforward and usually generates good stuff for me, and I use the site a lot.

    Regarding using a lot of data: apparently the winner in like 2006 or 2007 or something mined a lot of data from IMDB, and they claimed the opposite: that more data trumped extensive algorithms. Then again, it still wasn’t enough to beat the algorithm that Netflix is currently using.

