KGB wrote:So all in all, I think you should define 'Easy to find' pretty much as you would judge the chances of finding a certain film in your average rental store: popcorn + age. Then again, there are certain classics, so I guess that if a film has picked up, let's say, five important awards (that sounds like more than enough to me), it is considered 'Easy to find'.
There will probably be some exceptions, but all in all, that should pretty much cover it.
PeaceAnarchy wrote:Personally I don't find the distinction between easy to find and not easy to find useful. If I'm looking for specific stuff I'll find it on Netflix or order it or something, and if I'm looking for something on a whim I'll go with my huge mental list of stuff I need to watch. I use recommendations as a more long-term thing.
Hmm... OK, thanks for pointing that out. I've added an extra page to the list which contains all recommendations without splitting them into 'easy' and 'not easy to find'. I'll put a better splitting algorithm on my to-do list. The problem is that I don't have anything to compare it with, and I don't want to waste time checking which movies out of a list of 3,500 my local rental stores carry.
The updated lists for:
PeaceAnarchy wrote:Thanks. The list looks pretty good, except for the way it rates comedies, which seem consistently underrated.
I assume you made this observation because the movies in the 'Humor' section all had low percentages, right? The new list I've made for you pulls movies for the genre sections from all movies, not just from the ones in the 'easy to find' section. You'll notice that the movies in the 'Humor' section now all have higher scores, since they no longer have to be easy to find.
PeaceAnarchy wrote:How do you calculate PSI information for movies we've already seen?
Luckily these values are available in the source code of each movie's web page on Criticker. They are not visible when you've already seen the movie, but they are still there in the code. I hope Criticker keeps it that way, otherwise I'm in trouble.
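In case it helps to picture it, pulling a value like that out of a page's source usually comes down to one regular-expression search. The fragment and the variable name below are made up purely for illustration; the post doesn't say how the PSI is actually embedded in Criticker's pages.

```python
import re

# Hypothetical fragment of a movie page's source. The markup and the
# variable name "psi" are assumptions for illustration only -- the real
# Criticker pages will look different.
page_source = """
<div class="score">You ranked this film</div>
<script>var psi = 87;</script>
"""

match = re.search(r"var psi = (\d+);", page_source)
psi = int(match.group(1)) if match else None
print(psi)  # 87 for this made-up fragment
```

The fragile part is exactly the worry voiced above: if the site stops embedding the value, or changes the surrounding markup, the pattern stops matching and `psi` comes back as `None`.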
PeaceAnarchy wrote:Does your program allow you to recalculate ignoring certain parameters? I'd be curious as to what the results would look like without some of the lower correlation variables having an effect.
Not at the moment, but I certainly plan on making it easier to add or remove parameters from the mix, so that I can check this kind of thing.
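One plausible way to wire that up is to keep a boolean mask over the parameters and zero out the weights of the ones you want to ignore. Everything below is an assumption, including the correlation-weighted-average scoring; the post never describes the actual scoring formula, so treat this as a toy sketch of the idea, not the program's method.

```python
import numpy as np

# Toy data: 10 "movies" scored on 4 parameters, plus each parameter's
# (hypothetical) correlation with your own ratings.
rng = np.random.default_rng(1)
params = rng.normal(size=(10, 4))
correlations = np.array([0.6, 0.4, 0.1, 0.05])

def predict(params, correlations, use=None):
    """Correlation-weighted average over the parameters marked active in `use`."""
    use = np.ones(params.shape[1], dtype=bool) if use is None else np.asarray(use)
    w = correlations * use
    return params @ w / w.sum()

full = predict(params, correlations)
# Recalculate while ignoring the two low-correlation parameters:
trimmed = predict(params, correlations, use=[True, True, False, False])
```

Comparing `full` against `trimmed` would then show directly how much the weak parameters move the final scores.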
PeaceAnarchy wrote:When you create your bell shaped curve are you trying to fit a specific shape? A 70% confidence interval with a 9% margin of error seems pretty high, and I think it might be due to having so many low correlation parameters.
The bell-shaped tier curve has no influence on how the movies are scored, so it is of no relevance to the 9% accuracy either. It is merely a way to give you a glimpse of the movies you've rated across every tier, where I use a bell-shaped tier system to emphasize my preference for it over Criticker's 10% tier system.
As for the 9% accuracy, I think this is actually quite good. There's a $1,000,000 competition going on, organized by Netflix, to build a recommendation system for Netflix with an accuracy of 0.8563 points. Netflix ratings run from 1 to 5, so translating this error margin to Criticker's 0-to-100 system, they are aiming for an accuracy of 21.4%. So my system is doing twice as well (for you, at least). Of course, you have to keep in mind that on Criticker we can rate anywhere from 0 to 100 in steps of 1, whereas on Netflix you only have 1 (0%), 2 (25%), 3 (50%), 4 (75%) and 5 (100%), so this probably has an influence on the accuracy.
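For the record, the conversion above is just a rescaling: Netflix's 1-to-5 scale spans 4 points, which maps onto Criticker's 100 percentage points.

```python
# Netflix's target error of 0.8563 points on a 1-5 scale, mapped onto
# Criticker's 0-100 scale: the 1-5 range spans 4 points = 100 points.
netflix_error = 0.8563
criticker_equivalent = netflix_error / (5 - 1) * 100
print(round(criticker_equivalent, 1))  # 21.4
```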
PeaceAnarchy wrote:You mentioned a new method to make your parameters independent, could you go into that or send me link to what you've done?
I'm no mathematician, so my method might not be 100% sound, but this is what I've done:
- Take parameter 1 (P1) and calculate the correlations between parameter 1 and parameter 2 to N (C1_2, ..., C1_N). Here a 'parameter' is an array that contains scores for each movie in this parameter, and N is equal to the number of parameters. This tells you how much information in parameter 1 is also in the other parameters.
- Now subtract this mutual information from parameter 1, by doing P1_NEW = P1 - P2*C1_2 - P3*C1_3 - ... - PN*C1_N
- Do the same for all parameters, so that you have N new parameters: P1_NEW, P2_NEW, ..., PN_NEW.
- You will find that if you calculate the correlations again, they still won't be all that close to zero. So you repeat the procedure until the correlations between all parameters are pretty much zero.
Now you have N new parameters that are mutually independent, so their importance in the final score can be accurately judged by their correlations with your ratings. At least, that's my theory. Let me know if there's a flaw in my reasoning.
PeaceAnarchy wrote:Also, out of curiosity, would it be a lot of trouble to tell me which 224 films didn't make the cut?
Not at all:
http://thequicky.net/files/search-movies.html

Please let me know if you find the new lists better and/or more useful, and why. Thanks!