PeaceAnarchy wrote:It's just that intuitively +/-9% means a predicted score of 75 will be between 66 and 84 only 70% of the time which doesn't seem all that accurate.
Indeed. However, I'm quite confident that there are very few other computerized recommendation systems that are more accurate than this. Relying on nothing but numbers (well... almost nothing but numbers) is inherently inaccurate. I measure the error as a Root Mean Squared Error (RMSE), which squares each error before averaging and therefore puts more emphasis on outliers.
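To show what I mean by the emphasis on outliers, here's a minimal sketch in Python. The function names and the scores are made up for illustration and this is not my actual recommendation code. Both sets of predictions have the same mean absolute error, but the one with a single large miss has twice the RMSE.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equally long lists of scores."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def mae(predicted, actual):
    """Mean absolute error, shown only for comparison with RMSE."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

# Hypothetical data: every prediction is 75.
predicted = [75, 75, 75, 75]
evenly_off = [70, 80, 70, 80]   # each prediction misses by 5 points
one_outlier = [75, 75, 75, 95]  # three exact hits, one miss of 20 points

print(mae(predicted, evenly_off), rmse(predicted, evenly_off))
print(mae(predicted, one_outlier), rmse(predicted, one_outlier))
```

Squaring makes the single 20-point miss count as much as sixteen 5-point misses, which is why a few bad predictions can dominate the reported error.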
PeaceAnarchy wrote:One thing I should note is that removing correlation doesn't necessarily make the parameters independent and while what you're doing looks like a good approximation, something about it doesn't seem right to me, but it may just be that I'm not thinking this through again. Actually, the more I think about it I think it's a good approximation, just maybe not the best approach. If I have time I'll try to think about it and see if I can figure out what's bothering me.
Yeah, now that I think about it again, removing the correlations does indeed not imply that there are no dependencies left. You're completely right. I still think it's a decent approximation to replace my brute-force method, which was a little more accurate and probably more useful, but about 1000 times slower. I'd love to find a better algorithm though...
Quicky wrote:What does adding them involve? If it's just changing the links to point directly to the movies I'd be willing to do that when I have some time.
What I need are the codes that uniquely identify the movie on each of the three websites I use (criticker.com, imdb.com, boxofficemojo.com). For Criticker and IMDb these are numbers, for BoxOfficeMojo it's a string. The webpage I uploaded for you contains links that should point almost directly at the movies on the three websites. I don't really need the links themselves, just the identifiers inside them, so that I can put those in my Excel database.
So for example, for the second movie of that list, this is what I would have to put in my Excel file:
- "0811106" (taken from http://www.imdb.com/title/tt0811106)
- "15423" (taken from http://www.criticker.com/?f=15423)
- "ten" (taken from http://boxofficemojo.com/movies/?id=ten.htm)
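If you'd rather not copy the identifiers by hand, something like the following rough Python sketch could pull them out of the links. The function name `extract_id` is made up, and the sketch assumes the links always follow the three URL patterns shown above, so check the result against a few movies before trusting it.

```python
import re
from urllib.parse import urlparse, parse_qs

def extract_id(url):
    """Return the movie identifier in a link to one of the three sites, or None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    query = parse_qs(parsed.query)

    if host == "imdb.com":
        # e.g. /title/tt0811106 -> "0811106" (kept as text so the leading zero survives)
        match = re.search(r"/title/tt(\d+)", parsed.path)
        return match.group(1) if match else None
    if host == "criticker.com":
        # e.g. /?f=15423 -> "15423"
        return query.get("f", [None])[0]
    if host == "boxofficemojo.com":
        # e.g. /movies/?id=ten.htm -> "ten"
        value = query.get("id", [None])[0]
        if value is None:
            return None
        return value[:-4] if value.endswith(".htm") else value
    return None

for link in (
    "http://www.imdb.com/title/tt0811106",
    "http://www.criticker.com/?f=15423",
    "http://boxofficemojo.com/movies/?id=ten.htm",
):
    print(extract_id(link))
```

The identifiers are returned as strings on purpose: treating the IMDb code as a number would drop its leading zero.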
This is what it should preferably look like when you give me the list:
Code:
0113820 5553 powerrangers
0193364 9329 -
0424774 3720 -
0367652 3729 deucebigalow2
0103923 8667 -
0373908 1419 honeymooners
0396592 1050 fatalbert
0105477 5576 stopormymomwillshoot
0308208 226 ecksvssever
0118615 122 anaconda
0140796 9256 airbud2
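For what it's worth, a list in this shape is easy to read back in. Below is a small Python sketch of how one line could be split into its three fields. The name `parse_row` is hypothetical, and treating "-" as "no BoxOfficeMojo code for this movie" is my reading of the example rows, so take it as an assumption.

```python
def parse_row(line):
    """Split one 'IMDb Criticker BoxOfficeMojo' line into a dict of identifiers."""
    parts = line.split()
    if len(parts) != 3:
        raise ValueError("expected three whitespace-separated fields: %r" % line)
    imdb_id, criticker_id, mojo_id = parts
    return {
        "imdb": imdb_id,            # kept as text to preserve leading zeros
        "criticker": criticker_id,
        # Assumption: "-" marks a movie without a BoxOfficeMojo code.
        "boxofficemojo": None if mojo_id == "-" else mojo_id,
    }

print(parse_row("0113820 5553 powerrangers"))
print(parse_row("0193364 9329 -"))
```

When pasting the result into Excel, the IMDb column would probably need to be formatted as text, otherwise the leading zeros are likely to be stripped.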