KGB wrote:So all in all, I think you should define 'Easy to find' pretty much as you would judge the chances of finding a certain film in your average rental store: popcorn + age. Then again, there are certain classics, so I guess that if a film has picked up, let's say, five important awards (that sounds like more than enough to me), it is considered 'Easy to find'.
There will probably be some exceptions, but all in all, that should pretty much cover it.
PeaceAnarchy wrote:Personally I don't find the distinction between easy to find and not easy to find useful. If I'm looking for specific stuff I'll find it on Netflix or order it or something, and if I'm looking for something on a whim I'll go with my huge mental list of stuff I need to watch. I use recommendations as a more long-term thing.
Hmm... OK, thanks for pointing that out. I've added an extra page to the list which contains all recommendations without splitting them into 'easy' and 'not easy to find'. I'll put a better splitting algorithm on my to-do list. The problem is that I don't have anything to compare it with, and I don't want to waste time checking which movies out of a list of 3,500 my local rental stores carry.
The updated lists for:
PeaceAnarchy wrote:Thanks. The list looks pretty good, except for the way it rates comedies, which seem consistently underrated.
I assume you made this observation because the movies in the 'Humor' section all had low percentages, right? The new list I've made for you pulls movies for the genre sections from all movies, not just from the ones in the 'easy to find' section. You'll notice that the movies in the 'Humor' section now all have higher scores, since they no longer have to be easy to find.
PeaceAnarchy wrote:How do you calculate PSI information for movies we've already seen?
Luckily these values are available in the source code of each movie's web page on Criticker. They are not visible when you've already seen the movie, but they are still there in the code. I hope Criticker keeps it that way, otherwise I'm in trouble.
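In case it helps to picture it, pulling a value like that out of a page's source usually comes down to one regular-expression search. The fragment and the variable name below are made up purely for illustration; the post doesn't say how the PSI is actually embedded in Criticker's pages.

```python
import re

# Hypothetical fragment of a movie page's source. The markup and the
# variable name "psi" are assumptions for illustration only -- the real
# Criticker pages will look different.
page_source = """
<div class="score">You ranked this film</div>
<script>var psi = 87;</script>
"""

match = re.search(r"var psi = (\d+);", page_source)
psi = int(match.group(1)) if match else None
print(psi)  # 87 for this made-up fragment
```

The fragile part is exactly the worry voiced above: if the site stops embedding the value, or changes the surrounding markup, the pattern stops matching and `psi` comes back as `None`.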
PeaceAnarchy wrote:Does your program allow you to recalculate ignoring certain parameters? I'd be curious as to what the results would look like without some of the lower correlation variables having an effect.
Not at the moment, but I certainly plan on making it easier to add or remove parameters from the mix, so that I can check this kind of thing.
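One plausible way to wire that up is to keep a boolean mask over the parameters and zero out the weights of the ones you want to ignore. Everything below is an assumption, including the correlation-weighted-average scoring; the post never describes the actual scoring formula, so treat this as a toy sketch of the idea, not the program's method.

```python
import numpy as np

# Toy data: 10 "movies" scored on 4 parameters, plus each parameter's
# (hypothetical) correlation with your own ratings.
rng = np.random.default_rng(1)
params = rng.normal(size=(10, 4))
correlations = np.array([0.6, 0.4, 0.1, 0.05])

def predict(params, correlations, use=None):
    """Correlation-weighted average over the parameters marked active in `use`."""
    use = np.ones(params.shape[1], dtype=bool) if use is None else np.asarray(use)
    w = correlations * use
    return params @ w / w.sum()

full = predict(params, correlations)
# Recalculate while ignoring the two low-correlation parameters:
trimmed = predict(params, correlations, use=[True, True, False, False])
```

Comparing `full` against `trimmed` would then show directly how much the weak parameters move the final scores.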
PeaceAnarchy wrote:When you create your bell shaped curve are you trying to fit a specific shape? A 70% confidence interval with a 9% margin of error seems pretty high, and I think it might be due to having so many low correlation parameters.
The bell-shaped tier curve has no influence on how the movies are scored, so it is of no relevance to the 9% accuracy either. It is merely a way to give you a glimpse of the movies you've rated across every tier, where I use a bell-shaped tier system to emphasize my preference for it over Criticker's 10% tier system.
As for the 9% accuracy, I think this is actually quite good. There's a $1,000,000 competition going on, organized by Netflix, to build a recommendation system for Netflix with an accuracy of 0.8563 points. Netflix ratings run from 1 to 5, so translating this error margin to Criticker's 0-to-100 system, they are aiming for an accuracy of 21.4%. So my system is doing twice as well (for you, at least). Of course, you have to keep in mind that on Criticker we can rate anywhere from 0 to 100 in steps of 1, whereas on Netflix you only have 1 (0%), 2 (25%), 3 (50%), 4 (75%) and 5 (100%), so this probably has an influence on the accuracy.
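For the record, the conversion above is just a rescaling: Netflix's 1-to-5 scale spans 4 points, which maps onto Criticker's 100 percentage points.

```python
# Netflix's target error of 0.8563 points on a 1-5 scale, mapped onto
# Criticker's 0-100 scale: the 1-5 range spans 4 points = 100 points.
netflix_error = 0.8563
criticker_equivalent = netflix_error / (5 - 1) * 100
print(round(criticker_equivalent, 1))  # 21.4
```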
PeaceAnarchy wrote:You mentioned a new method to make your parameters independent, could you go into that or send me link to what you've done?
I'm no mathematician, so my method might not be 100% sound, but this is what I've done:
- Take parameter 1 (P1) and calculate the correlations between parameter 1 and parameter 2 to N (C1_2, ..., C1_N). Here a 'parameter' is an array that contains scores for each movie in this parameter, and N is equal to the number of parameters. This tells you how much information in parameter 1 is also in the other parameters.
- Now subtract this mutual information from parameter 1, by doing P1_NEW = P1 - P2*C1_2 - P3*C1_3 - ... - PN*C1_N
- Do the same for all parameters, so that you have N new parameters: P1_NEW, P2_NEW, ..., PN_NEW.
- You will find that if you calculate the correlations again, they still won't be all that close to zero. So you repeat the procedure until the correlations between all parameters are pretty much zero.
Now you have N new parameters that are mutually independent, so their importance in the final score can be accurately judged by their correlations with your ratings. At least, that's my theory. Let me know if there's a flaw in my reasoning.
PeaceAnarchy wrote:Also, out of curiosity, would it be a lot of trouble to tell me which 224 films didn't make the cut?
Not at all:
http://thequicky.net/files/search-movies.html

Please let me know if you find the new lists better and/or more useful, and why. Thanks!