#+TITLE: Collaborative Filtering with Slope One Predictors
#+AUTHOR: Chris Hodapp
#+DATE: January 30, 2018
#+TAGS: technobabble, machine learning

Suppose you have a large number of users, and a large number of
movies. Users have watched movies, and they've provided ratings for
some of them (perhaps just simple numerical ratings, 1 to 10 stars).
However, they've all watched different movies, and for any given user,
it's only a tiny fraction of the total movies.

Now, you want to predict how some user will rate some movie they
haven't rated, based on what they (and other users) have.

That's a common problem, especially when generalized from 'movies' to
anything else, and one with many approaches.

Slope One Predictors are one such method, described in the paper
[[https://arxiv.org/pdf/cs/0702144v1.pdf][Slope One Predictors for Online Rating-Based Collaborative Filtering]].
Despite the complex-sounding name, they are wonderfully simple to
understand and implement, and very fast.

Consider a user Bob. Bob has rather simplistic tastes: he mostly just
watches Clint Eastwood movies. In fact, he's watched and rated nearly
all of them, and basically nothing else.

Now, suppose we want to predict how much Bob will like something
completely different and unheard of (to him at least), like... I don't
know... /Citizen Kane/.

First, find the users who rated both /Citizen Kane/ *and* any of the
Clint Eastwood movies that Bob rated.

Now, for each movie that comes up above, compute a *deviation* which
tells us: On average, how much higher or lower did users rate
/Citizen Kane/ compared to this movie? (For instance, we'll have a
number for how /Citizen Kane/ was rated compared to /Dirty Harry/,
and perhaps it's +0.6 - meaning that on average, users who rated both
movies rated /Citizen Kane/ about 0.6 stars above /Dirty Harry/.
We'd have another deviation for /Citizen Kane/ compared to /Gran
Torino/, another for /Citizen Kane/ compared to /The Good, the Bad
and the Ugly/, and so on - for every movie that Bob rated, provided
that other users who rated /Citizen Kane/ also rated the movie.)

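That step is easy to sketch in Python. (The function name and the toy
data layout below are my own for illustration, not anything from the
paper.) Given every user's ratings as a nested dict, we collect, for
each movie, the average difference between the target movie's rating
and that movie's rating, over users who rated both:

```python
from collections import defaultdict

def deviations(ratings, target):
    """For each movie m, average (rating of target - rating of m)
    over the users who rated both m and the target.

    ratings: {user: {movie: stars}}
    Returns: {movie: (mean_deviation, n_users)}
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for user_ratings in ratings.values():
        if target not in user_ratings:
            continue  # this user tells us nothing about the target
        for movie, stars in user_ratings.items():
            if movie != target:
                sums[movie] += user_ratings[target] - stars
                counts[movie] += 1
    return {m: (sums[m] / counts[m], counts[m]) for m in sums}
```

A positive deviation for /Dirty Harry/ then means that users who
rated both movies tended to rate /Citizen Kane/ that much higher;
the count is kept alongside because it becomes useful later.
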
If that deviation between /Citizen Kane/ and /Dirty Harry/ was +0.6,
it's reasonable that adding 0.6 to Bob's rating on /Dirty Harry/
would give one prediction of how Bob might rate /Citizen Kane/. We
can then generate more predictions based on the ratings he gave the
other movies - anything for which we could compute a deviation.

To turn this into a single answer, we could just average those
predictions together.

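As a sketch of that averaging (names here are again made up): assume
the deviations against the target movie are available as a mapping
from movie to ~(mean deviation, user count)~. The plain prediction is
then the unweighted mean of (user's rating + deviation):

```python
def predict_slope_one(user_ratings, devs):
    """Plain Slope One: for each movie the user rated that has a
    deviation, predict (user's rating + deviation), then average
    those predictions with equal weight.

    user_ratings: {movie: stars} for the user we're predicting for
    devs: {movie: (mean_deviation, n_users)} vs. the target movie
    """
    preds = [stars + devs[movie][0]
             for movie, stars in user_ratings.items()
             if movie in devs]
    return sum(preds) / len(preds) if preds else None
```

Movies the user rated but which have no deviation (nobody rated both
them and the target) are simply skipped.
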
That's the Slope One algorithm in a nutshell - and also the Weighted
Slope One algorithm. The only difference is in how we average those
predictions. In Slope One, every deviation counts equally, no matter
how many users' rating differences were averaged together to produce
it. In Weighted Slope One, deviations that came from larger numbers
of users count for more (because, presumably, they are better
estimates).

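The weighted variant is a small change to that average (still a
sketch with hypothetical names): weight each per-movie prediction by
the number of users its deviation was computed from:

```python
def predict_weighted_slope_one(user_ratings, devs):
    """Weighted Slope One: same per-movie predictions, but each is
    weighted by the number of users behind its deviation.

    user_ratings: {movie: stars} for the user we're predicting for
    devs: {movie: (mean_deviation, n_users)} vs. the target movie
    """
    num, den = 0.0, 0
    for movie, stars in user_ratings.items():
        if movie in devs:
            dev, n = devs[movie]
            num += (stars + dev) * n  # prediction, weighted by count
            den += n
    return num / den if den else None
```
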
Or, in other words: If only one person rated both /Citizen Kane/ and
the lesser-known Eastwood classic /Revenge of the Creature/, and they
happened to think that /Revenge of the Creature/ deserved at least 3
more stars, then with Slope One, this deviation of +3 would carry
exactly as much weight as thousands of people rating /Citizen Kane/
about 0.5 stars below /The Good, the Bad and the Ugly/. In Weighted
Slope One, that latter deviation would count for thousands of times
as much. The example makes it sound a bit more drastic than it is.