#+TITLE: Collaborative Filtering with Slope One Predictors
#+AUTHOR: Chris Hodapp
#+DATE: January 30, 2018
#+TAGS: technobabble, machine learning

[[https://arxiv.org/pdf/cs/0702144v1.pdf][Slope One Predictors for Online Rating-Based Collaborative Filtering]]

The way this works is remarkably simple. I'll concoct a really
contrived example here to explain it.

Suppose you have a large number of users, and a large number of
movies. Users have watched movies, and they've provided ratings for
some of them (perhaps just simple numerical ratings, 1 to 10 stars).
However, they've all watched different movies, and any given user has
watched only a tiny fraction of the total.

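One way to picture this data is as a sparse mapping from each user to
the handful of movies they rated. The sketch below assumes a plain
Python dict of dicts; the user names, the ratings, and the catalog
size are all made up for illustration.

#+BEGIN_SRC python
# Hypothetical ratings on a 1-10 scale: user -> {movie: rating}.
ratings = {
    "ann": {"Dirty Harry": 7, "Citizen Kane": 9},
    "raj": {"Dirty Harry": 9, "Gran Torino": 8},
    "mei": {"Gran Torino": 6, "Citizen Kane": 8},
}


def coverage(user_ratings, total_movies):
    """Fraction of the whole catalog that one user has rated."""
    return len(user_ratings) / total_movies


# Assume a (made-up) catalog of 10,000 movies.
raj_coverage = coverage(ratings["raj"], 10_000)
#+END_SRC
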
Consider a user Bob. Bob has rather simplistic tastes: he mostly just
watches Clint Eastwood movies. In fact, he's watched and rated nearly
all of them, and basically nothing else.

Now, suppose we want to predict how much Bob will like something
completely different and unheard of (to him at least), like... I don't
know... /Citizen Kane/. How would we go about this?

The Slope One algorithm does it as follows:

1. Find the users who rated both /Citizen Kane/ *and* at least one of
   the Clint Eastwood movies that Bob rated.
2. For each movie that comes up above, compute a number that tells us:
   On average, how differently (i.e. how much higher or lower) did
   users rate /Citizen Kane/ compared to this movie? (That is, we'll
   have a number for how /Citizen Kane/ was rated compared to /Dirty
   Harry/, another for /Citizen Kane/ compared to /Gran Torino/,
   another for /Citizen Kane/ compared to /The Good, the Bad and the
   Ugly/, and so on - for everything that Bob rated, and that someone
   else who rated /Citizen Kane/ also rated.)
3. Of course, Bob rated all of these movies - so we can take his
   rating for each of these, and add each respective number above to
   'correct' it and produce an estimate of how he might rate /Citizen
   Kane/ based on just his rating for another movie.
4. Average together all of these predicted ratings.

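The four steps above can be sketched in a few lines of Python. This is
only a minimal illustration, not the paper's reference implementation:
the function name =slope_one=, the dict-of-dicts layout, and every
rating below are assumptions I made up for the example.

#+BEGIN_SRC python
from collections import defaultdict

# Made-up ratings (1-10): user -> {movie: rating}.
# Bob has not rated Citizen Kane.
ratings = {
    "bob":   {"Dirty Harry": 9, "Gran Torino": 7},
    "alice": {"Dirty Harry": 7, "Citizen Kane": 9},
    "carol": {"Dirty Harry": 8, "Gran Torino": 6, "Citizen Kane": 8},
}


def slope_one(ratings, user, target):
    """Predict how `user` would rate `target` (hypothetical helper)."""
    mine = ratings[user]
    diff_sum = defaultdict(float)  # movie -> sum of (target - movie)
    diff_count = defaultdict(int)  # movie -> number of co-raters

    # Steps 1-2: look at everyone who rated the target, and for each
    # movie they share with `user`, record how much higher or lower
    # they rated the target than that movie.
    for theirs in ratings.values():
        if target not in theirs:
            continue
        for movie, score in theirs.items():
            if movie != target and movie in mine:
                diff_sum[movie] += theirs[target] - score
                diff_count[movie] += 1

    if not diff_count:
        return None  # nobody overlaps with this user; no prediction

    # Step 3: correct each of the user's own ratings by the average
    # difference for that movie.
    estimates = [mine[m] + diff_sum[m] / diff_count[m] for m in diff_count]

    # Step 4: plain average of the per-movie estimates.
    return sum(estimates) / len(estimates)


prediction = slope_one(ratings, "bob", "Citizen Kane")
#+END_SRC

With this toy data, /Dirty Harry/ yields an estimate of 10 (Bob's 9
plus an average difference of 1) and /Gran Torino/ yields 9 (Bob's 7
plus 2), so the prediction is 9.5.
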
A variant of it, Weighted Slope One, makes one small modification: in
step #4 it turns that average into a weighted average that takes into
account how many users rated both movies in each pair.

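As a rough sketch of that weighting, assume the per-movie average
differences and the number of users behind each one have already been
computed. The function name and all of the numbers here are
hypothetical, chosen only to show the arithmetic.

#+BEGIN_SRC python
def weighted_slope_one(user_ratings, avg_diff, pair_count):
    """Weighted average of per-movie estimates (hypothetical helper).

    avg_diff[m]   -- average of (target rating - rating of m)
    pair_count[m] -- how many users rated both m and the target
    """
    total = sum(pair_count[m] for m in avg_diff)
    weighted = sum(
        (user_ratings[m] + avg_diff[m]) * pair_count[m] for m in avg_diff
    )
    return weighted / total


# Made-up inputs: Bob's ratings, plus precomputed differences and counts.
bob = {"Dirty Harry": 9, "Gran Torino": 7}
avg_diff = {"Dirty Harry": 1.0, "Gran Torino": 2.0}
pair_count = {"Dirty Harry": 2, "Gran Torino": 1}

weighted = weighted_slope_one(bob, avg_diff, pair_count)
#+END_SRC

Here the /Dirty Harry/ estimate (10) counts twice as much as the /Gran
Torino/ estimate (9), giving 29/3, or about 9.67, rather than the
unweighted 9.5.
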
As an example, if only one person rated both /Citizen Kane/ and the
Eastwood classic /Revenge of the Creature/, the Slope One algorithm
would assign this prediction equal "votes"