#+TITLE: Collaborative Filtering with Slope One Predictors
#+AUTHOR: Chris Hodapp
#+DATE: January 30, 2018
#+TAGS: technobabble, machine learning

[[https://arxiv.org/pdf/cs/0702144v1.pdf][Slope One Predictors for Online Rating-Based Collaborative Filtering]]

The way this works is remarkably simple.  I'll concoct a really
contrived example here to explain it.

Suppose you have a large number of users, and a large number of
movies.  Users have watched movies, and they've provided ratings for
some of them (perhaps just simple numerical ratings, 1 to 10 stars).
However, they've all watched different movies, and for any given user,
it's only a tiny fraction of the total movies.

Consider a user Bob.  Bob has rather simplistic tastes: he mostly just
watches Clint Eastwood movies.  In fact, he's watched and rated nearly
all of them, and basically nothing else.

Now, suppose we want to predict how much Bob will like something
completely different and unheard of (to him at least), like... I don't
know... /Citizen Kane/.  How would we go about this?

The Slope One algorithm does it as follows:

1. Find the users who rated both /Citizen Kane/ *and* any of the Clint
   Eastwood movies that Bob rated.
2. For each movie that comes up above, compute a number that tells us:
   On average, how differently (i.e. how much higher or lower) did
   users rate /Citizen Kane/ compared to this movie?  (That is, we'll
   have a number for how /Citizen Kane/ was rated compared to /Dirty
   Harry/, another for /Citizen Kane/ compared to /Gran Torino/,
   another for /Citizen Kane/ compared to /The Good, the Bad and the
   Ugly/, and so on - for everything that Bob rated, and that someone
   else who rated /Citizen Kane/ also rated.)
3. Of course, Bob rated all of these movies - so we can take his
   rating for each of these, and add each respective number above to
   'correct' it and produce an estimate of how he might rate /Citizen
   Kane/ based on just his rating for another movie.
4. Average together all of these predicted ratings.
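
The four steps above can be sketched in a few lines of Python.  This
is only a minimal illustration, not the paper's reference
implementation: the =ratings= dictionary, the user names, the star
values, and the function name =slope_one= are all made up for this
example.

```python
from collections import defaultdict

# Hypothetical toy data: ratings[user][movie] = stars (1 to 10).
ratings = {
    "alice": {"Dirty Harry": 7, "Gran Torino": 8, "Citizen Kane": 9},
    "carol": {"Dirty Harry": 5, "Citizen Kane": 8},
    "dave": {"Gran Torino": 6, "Citizen Kane": 5},
    "bob": {"Dirty Harry": 7, "Gran Torino": 6},
}


def slope_one(ratings, user, target):
    """Predict `user`'s rating of `target` with plain (unweighted) Slope One."""
    mine = ratings[user]
    diff_sum = defaultdict(float)  # movie -> sum of (target rating - movie rating)
    count = defaultdict(int)       # movie -> number of users who rated both

    # Steps 1 and 2: over users who rated the target, accumulate how much
    # higher or lower they rated it than each movie `user` has also rated.
    for theirs in ratings.values():
        if target not in theirs:
            continue
        for movie, stars in theirs.items():
            if movie != target and movie in mine:
                diff_sum[movie] += theirs[target] - stars
                count[movie] += 1

    if not count:
        return None  # no overlapping ratings, so nothing to predict from

    # Step 3: shift each of the user's own ratings by the average difference.
    estimates = [mine[m] + diff_sum[m] / count[m] for m in count]

    # Step 4: plain average of the per-movie estimates.
    return sum(estimates) / len(estimates)


prediction = slope_one(ratings, "bob", "Citizen Kane")
```

With this toy data, /Citizen Kane/ was rated 2.5 stars higher than
/Dirty Harry/ on average and the same as /Gran Torino/ on average, so
Bob's two estimates are 9.5 and 6, and =prediction= comes out to 7.75.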

A variant, Weighted Slope One, makes one small modification: in step 4,
it turns that average into a weighted average, weighting each
prediction by the number of users who rated both movies in its pair.
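
Here is a rough, self-contained sketch of the weighted variant.  As
before, the data, the names, and the function =weighted_slope_one= are
hypothetical and exist only for illustration.  In this toy data only
one user has rated both /Citizen Kane/ and /Revenge of the Creature/,
so that pair gets a weight of 1 while the other pairs get a weight
of 2.

```python
from collections import defaultdict

# Hypothetical toy data: ratings[user][movie] = stars (1 to 10).
ratings = {
    "alice": {"Dirty Harry": 7, "Gran Torino": 8, "Citizen Kane": 9},
    "carol": {"Dirty Harry": 5, "Citizen Kane": 8},
    "dave": {"Gran Torino": 6, "Revenge of the Creature": 2, "Citizen Kane": 5},
    "bob": {"Dirty Harry": 7, "Gran Torino": 6, "Revenge of the Creature": 3},
}


def weighted_slope_one(ratings, user, target):
    """Predict `user`'s rating of `target` with Weighted Slope One."""
    mine = ratings[user]
    diff_sum = defaultdict(float)  # movie -> sum of (target rating - movie rating)
    count = defaultdict(int)       # movie -> number of users who rated both

    for theirs in ratings.values():
        if target not in theirs:
            continue
        for movie, stars in theirs.items():
            if movie != target and movie in mine:
                diff_sum[movie] += theirs[target] - stars
                count[movie] += 1

    total = sum(count.values())
    if total == 0:
        return None  # no overlapping ratings, so nothing to predict from

    # Each per-movie estimate is weighted by how many users rated the pair.
    weighted = sum(count[m] * (mine[m] + diff_sum[m] / count[m]) for m in count)
    return weighted / total


prediction = weighted_slope_one(ratings, "bob", "Citizen Kane")
```

The three per-movie estimates here are 9.5, 6, and 6, with weights 2,
2, and 1, so =prediction= is 37/5 = 7.4.  A plain average of the same
three estimates would be about 7.17.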

As an example, if only one person rated both /Citizen Kane/ and the
Eastwood classic /Revenge of the Creature/, the Slope One algorithm
would assign this prediction equal "votes"