For those of you who haven't (completely) read the previous thread, let me briefly explain what this is all about.
A few years ago I started making a movie database that I used to check which movies I'd probably like and which I wouldn't. At first I based my personal recommendations only on a movie's IMDB rating and the number of Oscars it had won. Over the years I've added more information to base the recommendations on. A bit over two months ago I found Criticker, and since then I've added a few extra parameters and automated the whole thing somewhat, so that I can now generate recommendations not only for myself but for any Criticker user.
The parameters that I use in my calculations are the following:
- Criticker PSI: this is the Probable Score Index that Criticker displays to you for each movie that you haven't seen. The higher the PSI, the more likely you'll like the movie.
- IMDB: this is the movie's average user rating as shown on IMDB. At the moment I use the rating from 18-29 year olds by default.
- Keywords: my program analyses the movies that you've seen for their keywords, and then favours movies that share keywords with other movies that you liked (see the sketch after this list).
- Genre: just like with keywords, my program analyses the movies that you've seen for their genres, and then favours movies that share genres with other movies that you liked.
- Votes: maybe I should rename this to popularity. This parameter is based on the number of votes that each movie has on IMDB. The more popular a movie, the better.
- Awards: this is the number of awards a movie has won or been nominated for. The more awards a movie has won, the better.
- Length: this parameter is based on the running time of the movie. I noticed that the movies I like best tend to run longer than the movies I like least. It seems I'm not the only person who favours longer movies: every other user I've generated recommendations for so far prefers longer movies over shorter ones.
- Age: this parameter shows how recent a movie is. It seems I'm pretty much the only one who prefers more recent movies; the other users I've made recommendations for prefer older ones. Lately I've been wondering how useful the Age parameter really is, though. Anyway...
- Popcorn: it took me a really long time to find a way to quantify this parameter, but I'm quite confident that it is pretty accurate now. This parameter tries to distinguish popcorn movies from non-popcorn movies by combining data on box office results, IMDB ratings and genres. Popcorn movies are light, accessible, fun movies that you go to mainly to enjoy your friends' company, rather than to be awed by the movie itself. For example, three movies on the popcorn end of the scale are Meet the Spartans, Anaconda, and Date Movie. Three movies on the non-popcorn end of the scale are Dead Man, The Celebration, and Irréversible.
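To make the keyword and genre idea a bit more concrete, here's a rough Python sketch of the general approach. This is a simplified illustration only, not my actual code; the scoring used here (averaging your ratings per keyword and then averaging those over the keywords an unseen movie shares with your rated movies) is just one way to do it.

```python
# Rough sketch of the keyword/genre idea (simplified illustration, not my actual code).
# Each movie you've rated contributes its keywords, weighted by how much you liked it;
# an unseen movie then scores higher when its keywords match the ones you tend to like.
from collections import defaultdict

def keyword_scores(rated_movies, unseen_movies):
    """rated_movies: list of (keywords, your_rating); unseen_movies: list of keyword lists."""
    total = defaultdict(float)
    count = defaultdict(int)
    for keywords, rating in rated_movies:
        for kw in keywords:
            total[kw] += rating      # sum of your ratings for movies with this keyword
            count[kw] += 1
    avg = {kw: total[kw] / count[kw] for kw in total}  # average rating per keyword

    scores = []
    for keywords in unseen_movies:
        known = [avg[kw] for kw in keywords if kw in avg]
        # an unseen movie's keyword score is the average of the per-keyword averages
        scores.append(sum(known) / len(known) if known else None)
    return scores

# Example: you rated two movies, and we score one unseen movie on its keywords.
rated = [(["heist", "twist-ending"], 85), (["road-trip", "heist"], 60)]
print(keyword_scores(rated, [["heist", "twist-ending", "neo-noir"]]))  # -> [78.75]
```

The genre parameter works along the same lines, just with genres instead of keywords.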
Here's an example of what my system comes up with:
If you want me to make one of these for you, message me here on Criticker with your Criticker password (after you've changed it for safety!) and I'll get you a list like the one above within a day or so. Unfortunately I need your password in order to get hold of the PSIs that Criticker shows you.
And now for those who've been following this thread from the beginning...
I've made a few changes to the system since the last version. These are the following:
- The Awards parameter is back in, as well as a popularity parameter, based on the number of votes a movie has on IMDB.
- I've tweaked the popcorn parameter so that it is also based on a movie's genres, and not just on its box office results. If two movies have made the same amount of money at the box office and have the same IMDB rating, the one with action and romance as genres will get a higher popcorn rating than the one with history and documentary as genres (see the sketch after this list).
- I've found a completely new way to optimize the weights of the parameters. The primary gain is in computing time: where my previous code took 20 minutes to get accurate weights, my new code takes only 1 second to generate the final scores (over 1000 times faster). For some more info on the math, see below. How well it works also varies from user to user: for me personally it works a bit less well, but for KGB and AFlickering the final correlations were higher with the new method.
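To give an idea of how the genre tweak fits in with the box office and IMDB ingredients, here's a rough Python sketch of that kind of calculation. The genre values and the way the three ingredients are combined are made up for the example; they are not the numbers my system actually uses.

```python
# Rough sketch of the popcorn idea (illustration only; the weights below are made up,
# they are not the numbers my system actually uses).
# A movie that earns a lot at the box office relative to its IMDB rating, and whose
# genres lean towards "light" ones, ends up on the popcorn end of the scale.

GENRE_POPCORN = {        # how "popcorn" each genre is, on a 0-1 scale (assumed values)
    "action": 0.9, "comedy": 0.8, "romance": 0.7,
    "history": 0.2, "documentary": 0.1, "drama": 0.3,
}

def popcorn_score(box_office, imdb_rating, genres):
    # High box office pushes the score up, a high IMDB rating pulls it down,
    # and the genre term shifts movies with "light" genres further up.
    money_term = min(box_office / 100_000_000, 1.0)   # normalise gross to a 0-1 range
    rating_term = 1.0 - imdb_rating / 10.0            # low rating -> more popcorn
    genre_term = sum(GENRE_POPCORN.get(g, 0.5) for g in genres) / len(genres)
    return round((money_term + rating_term + genre_term) / 3, 2)

# Two movies with the same gross and rating: the action/romance one rates as more popcorn.
print(popcorn_score(50_000_000, 6.0, ["action", "romance"]))       # -> 0.57
print(popcorn_score(50_000_000, 6.0, ["history", "documentary"]))  # -> 0.35
```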
Here are updated lists for:
You'll notice that I removed the weight percentages from the .pdf. This is because, with my new method, I haven't yet found a way to quantify the weights of the original parameters. I could give you the weights of the mutually independent parameters, but those are very different from the original ones and thus not really relevant for showing you what matters and what doesn't. I think the correlations already show you quite well which parameters work for you and which don't. You might also notice a significant change compared with your previous lists because of the new optimization method. If you find the new lists (much) worse, please let me know why you feel that way, so that I can figure out whether I can do something about it.
* The math of my new optimization method: In my previous algorithm, I searched for the maximum of the final correlation in the N-dimensional space of weights, where N was the number of parameters. The reason I couldn't just use each parameter's own correlation with my ratings to determine the weights is that the parameters all have non-zero correlations with each other, i.e. they are not mutually independent. Part of the information in the PSI parameter, for example, is also present in the IMDB parameter, because for most people, movies that score high on IMDB also score high on Criticker. With my new method, I convert these 9 mutually dependent parameters into 9 mutually independent parameters. The parameters then share no information, and the correlations between them are zero. That means a parameter with a higher correlation with my ratings simply gets a higher weight in my final score, and I don't have to worry about mutual dependencies anymore. I'm neither a mathematician nor a computer programmer, so I can't really explain why the new method works better for some users and worse for others, but the important thing is that it generally performs about equally well while being a thousand times faster.
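For those who want to see what such a decorrelation step can look like, here's a small numpy sketch of one possible way to do it, using a QR decomposition on made-up data. It's only meant to illustrate the principle (orthogonalise the parameters, then use each new parameter's correlation with the ratings directly as its weight), not to show my actual code.

```python
# Small sketch of the decorrelation idea (one possible way to do it, illustration only).
# Columns of X stand in for the original, mutually correlated parameters (PSI, IMDB, ...);
# a QR decomposition gives orthogonal columns Q that share no information, so each
# column's correlation with the ratings can be used directly as its weight.
import numpy as np

rng = np.random.default_rng(0)
n_movies, n_params = 500, 9

# Fake data standing in for the real parameters and ratings.
X = rng.normal(size=(n_movies, n_params))
X[:, 1] = 0.7 * X[:, 0] + 0.3 * X[:, 1]       # make two parameters correlated, like PSI and IMDB
ratings = X @ rng.normal(size=n_params) + rng.normal(scale=0.5, size=n_movies)

# Centre the parameters, then orthogonalise them.
Xc = X - X.mean(axis=0)
Q, _ = np.linalg.qr(Xc)                        # columns of Q are mutually uncorrelated

# Because the new parameters are uncorrelated, each one's correlation with the ratings
# can serve directly as its weight in the final score - no iterative search needed.
weights = np.array([np.corrcoef(Q[:, j], ratings)[0, 1] for j in range(n_params)])
final_scores = Q @ weights

print("correlation of final score with ratings:", np.corrcoef(final_scores, ratings)[0, 1])
```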