#+TITLE: Collaborative Filtering with Slope One Predictors
#+AUTHOR: Chris Hodapp
#+DATE: January 30, 2018
#+TAGS: technobabble, machine learning

Suppose you have a large number of users, and a large number of
movies. Users have watched movies, and they've provided ratings for
some of them (perhaps just simple numerical ratings, 1 to 10 stars).
However, they've all watched different movies, and for any given user,
it's only a tiny fraction of the total movies.

Now, you want to predict how some user will rate some movie they
haven't rated, based on what they (and other users) have.

That's a common problem, especially when generalized from 'movies' to
anything else, and one with many approaches.

Slope One Predictors are one such method, described in the paper
[[https://arxiv.org/pdf/cs/0702144v1.pdf][Slope One Predictors for Online Rating-Based Collaborative Filtering]].
Despite the complex-sounding name, they are wonderfully simple to
understand and implement, and very fast.

Consider a user Bob. Bob has rather simplistic tastes: he mostly just
watches Clint Eastwood movies. In fact, he's watched and rated nearly
all of them, and basically nothing else.

Now, suppose we want to predict how much Bob will like something
completely different and unheard of (to him at least), like... I don't
know... /Citizen Kane/.

First, find the users who rated both /Citizen Kane/ *and* any of the
Clint Eastwood movies that Bob rated.

Now, for each movie that comes up above, compute a *deviation* which
tells us: On average, how much higher or lower did users rate
/Citizen Kane/ compared to this movie? (For instance, we'll have a
number for how /Citizen Kane/ was rated compared to /Dirty Harry/,
and perhaps it's +0.6 - meaning that on average, users who rated both
movies rated /Citizen Kane/ about 0.6 stars above /Dirty Harry/.
We'd have another deviation for /Citizen Kane/ compared to /Gran
Torino/, another for /Citizen Kane/ compared to /The Good, the Bad
and the Ugly/, and so on - for every movie that Bob rated, provided
that other users who rated /Citizen Kane/ also rated the movie.)

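That step is easy to sketch in Python. (The function name and the toy
data layout below are my own for illustration, not anything from the
paper.) Given every user's ratings as a nested dict, we collect, for
each movie, the average difference between the target movie's rating
and that movie's rating, over users who rated both:

```python
from collections import defaultdict

def deviations(ratings, target):
    """For each movie m, average (rating of target - rating of m)
    over the users who rated both m and the target.

    ratings: {user: {movie: stars}}
    Returns: {movie: (mean_deviation, n_users)}
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for user_ratings in ratings.values():
        if target not in user_ratings:
            continue  # this user tells us nothing about the target
        for movie, stars in user_ratings.items():
            if movie != target:
                sums[movie] += user_ratings[target] - stars
                counts[movie] += 1
    return {m: (sums[m] / counts[m], counts[m]) for m in sums}
```

A positive deviation for /Dirty Harry/ then means that users who
rated both movies tended to rate /Citizen Kane/ that much higher;
the count is kept alongside because it becomes useful later.
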
If that deviation between /Citizen Kane/ and /Dirty Harry/ was +0.6,
it's reasonable that adding 0.6 to Bob's rating on /Dirty Harry/
would give one prediction of how Bob might rate /Citizen Kane/. We
can then generate more predictions based on the ratings he gave the
other movies - anything for which we could compute a deviation.

To turn this into a single answer, we could just average those
predictions together.

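As a sketch of that averaging (names here are again made up): assume
the deviations against the target movie are available as a mapping
from movie to ~(mean deviation, user count)~. The plain prediction is
then the unweighted mean of (user's rating + deviation):

```python
def predict_slope_one(user_ratings, devs):
    """Plain Slope One: for each movie the user rated that has a
    deviation, predict (user's rating + deviation), then average
    those predictions with equal weight.

    user_ratings: {movie: stars} for the user we're predicting for
    devs: {movie: (mean_deviation, n_users)} vs. the target movie
    """
    preds = [stars + devs[movie][0]
             for movie, stars in user_ratings.items()
             if movie in devs]
    return sum(preds) / len(preds) if preds else None
```

Movies the user rated but which have no deviation (nobody rated both
them and the target) are simply skipped.
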
That's the Slope One algorithm in a nutshell - and also the Weighted
Slope One algorithm. The only difference is in how we average those
predictions. In Slope One, every deviation counts equally, no matter
how many users' rating differences were averaged together to produce
it. In Weighted Slope One, deviations that came from larger numbers
of users count for more (because, presumably, they are better
estimates).

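The weighted variant is a small change to that average (still a
sketch with hypothetical names): weight each per-movie prediction by
the number of users its deviation was computed from:

```python
def predict_weighted_slope_one(user_ratings, devs):
    """Weighted Slope One: same per-movie predictions, but each is
    weighted by the number of users behind its deviation.

    user_ratings: {movie: stars} for the user we're predicting for
    devs: {movie: (mean_deviation, n_users)} vs. the target movie
    """
    num, den = 0.0, 0
    for movie, stars in user_ratings.items():
        if movie in devs:
            dev, n = devs[movie]
            num += (stars + dev) * n  # prediction, weighted by count
            den += n
    return num / den if den else None
```
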
Or, in other words: If only one person rated both /Citizen Kane/ and
the lesser-known Eastwood classic /Revenge of the Creature/, and they
happened to think that /Revenge of the Creature/ deserved at least 3
more stars, then with Slope One, this deviation of +3 would carry
exactly as much weight as thousands of people rating /Citizen Kane/
about 0.5 stars below /The Good, the Bad and the Ugly/. In Weighted
Slope One, that latter deviation would count for thousands of times
as much. The example makes it sound a bit more drastic than it is.