Reworded some stuff in Slope One post
parent
c52687dd40
commit
c7695799e6
@ -3,47 +3,68 @@
|
|||||||
#+DATE: January 30, 2018
|
#+DATE: January 30, 2018
|
||||||
#+TAGS: technobabble, machine learning
|
#+TAGS: technobabble, machine learning
|
||||||
|
|
||||||
[[https://arxiv.org/pdf/cs/0702144v1.pdf][Slope One Predictors for Online Rating-Based Collaborative Filtering]]
|
|
||||||
|
|
||||||
The way this works is remarkably simple. I'll concoct a really
|
|
||||||
contrived example here to explain it.
|
|
||||||
|
|
||||||
Suppose you have a large number of users, and a large number of
|
Suppose you have a large number of users, and a large number of
|
||||||
movies. Users have watched movies, and they've provided ratings for
|
movies. Users have watched movies, and they've provided ratings for
|
||||||
some of them (perhaps just simple numerical ratings, 1 to 10 stars).
|
some of them (perhaps just simple numerical ratings, 1 to 10 stars).
|
||||||
However, they've all watched different movies, and for any given user,
|
However, they've all watched different movies, and for any given user,
|
||||||
it's only a tiny fraction of the total movies.
|
it's only a tiny fraction of the total movies.
|
||||||
|
|
||||||
|
Now, you want to predict how some user will rate some movie they
|
||||||
|
haven't rated, based on what they (and other users) have.
|
||||||
|
|
||||||
|
That's a common problem, especially when generalized from 'movies' to
|
||||||
|
anything else, and one with many approaches.
|
||||||
|
|
||||||
|
Slope One Predictors are one such method, described in the paper [[https://arxiv.org/pdf/cs/0702144v1.pdf][Slope
|
||||||
|
One Predictors for Online Rating-Based Collaborative Filtering]].
|
||||||
|
Despite the complex-sounding name, they are wonderfully simple to
|
||||||
|
understand and implement, and very fast.
|
||||||
|
|
||||||
Consider a user Bob. Bob has rather simplistic tastes: he mostly just
|
Consider a user Bob. Bob has rather simplistic tastes: he mostly just
|
||||||
watches Clint Eastwood movies. In fact, he's watched and rated nearly
|
watches Clint Eastwood movies. In fact, he's watched and rated nearly
|
||||||
all of them, and basically nothing else.
|
all of them, and basically nothing else.
|
||||||
|
|
||||||
Now, suppose we want to predict how much Bob will like something
|
Now, suppose we want to predict how much Bob will like something
|
||||||
completely different and unheard of (to him at least), like... I don't
|
completely different and unheard of (to him at least), like... I don't
|
||||||
know... /Citizen Kane/. How would we go about this?
|
know... /Citizen Kane/.
|
||||||
|
|
||||||
The Slope One algorithm does it as follows:
|
First, find the users who rated both /Citizen Kane/ *and* any of the Clint
|
||||||
|
Eastwood movies that Bob rated.
|
||||||
|
|
||||||
1. Find the users who rated both /Citizen Kane/ *and* any of the Clint
|
Now, for each movie that comes up above, compute a *deviation* which
|
||||||
Eastwood movies that Bob rated.
|
tells us: On average, how differently (i.e. how much higher or lower)
|
||||||
2. For each movie that comes up above, compute a number that tells us:
|
did users rate /Citizen Kane/ compared to this movie? (For instance,
|
||||||
On average, how differently (i.e. how much higher or lower) did
|
we'll have a number for how /Citizen Kane/ was rated compared to
|
||||||
users rate Citizen Kane compared to this movie? (That is, we'll
|
/Dirty Harry/, and perhaps it's +0.6 - meaning that on average, users
|
||||||
have a number for how /Citizen Kane/ was rated compared to /Dirty
|
who rated both movies rated /Citizen Kane/ about 0.6 stars above
|
||||||
Harry/, another for /Citizen Kane/ compared to /Gran Torino/,
|
/Dirty Harry/. We'd have another deviation for /Citizen Kane/
|
||||||
another for /Citizen Kane/ compared to /The Good, the Bad and the
|
compared to /Gran Torino/, another for /Citizen Kane/ compared to /The
|
||||||
Ugly/, and so on - for everything that Bob rated, and that someone
|
Good, the Bad and the Ugly/, and so on - for every movie that Bob
|
||||||
else who rated /Citizen Kane/ also rated.)
|
rated, provided that other users who rated /Citizen Kane/ also rated
|
||||||
3. Of course, Bob rated all of these movies - so we can take his
|
the movie.)
|
||||||
rating for each of these, and add each respective number above to
|
|
||||||
'correct' it and produce an estimate of how he might rate /Citizen
|
|
||||||
Kane/ based on just his rating for another movie.
|
|
||||||
4. Average together all of these predicted ratings.
|
|
||||||
|
|
||||||
A variant of it, Weighted Slope One, makes one small modification: In
|
If that deviation between /Citizen Kane/ and /Dirty Harry/ was +0.6,
|
||||||
step #4 it turned that average into a weighted average that takes into
|
it's reasonable that adding 0.6 to Bob's rating for /Dirty Harry/
|
||||||
account how many ratings are involved for each pair of movies.
|
would give one prediction of how Bob might rate /Citizen Kane/. We
|
||||||
|
can then generate more predictions based on the ratings he gave the
|
||||||
|
other movies - anything for which we could compute a deviation.
|
||||||
|
|
||||||
As an example, if only one person rated both /Citizen Kane/ and the
|
To turn this into a single answer, we could just average those
|
||||||
Eastwood classic /Revenge of the Creature/, the Slope One algorithm
|
predictions together.
|
||||||
would assign this prediction equal "votes"
|
|
||||||
|
That's the Slope One algorithm in a nutshell - and also the Weighted
|
||||||
|
Slope One algorithm. The only difference is in how we average those
|
||||||
|
predictions. In Slope One, every deviation counts equally, no matter
|
||||||
|
how many users had differences in ratings averaged together to produce
|
||||||
|
it. In Weighted Slope One, deviations that came from larger numbers
|
||||||
|
of users count for more (because, presumably, they are better
|
||||||
|
estimates).
|
||||||
|
|
||||||
|
Or, in other words: If only one person rated both /Citizen Kane/ and
|
||||||
|
the lesser-known Eastwood classic /Revenge of the Creature/, and they
|
||||||
|
happened to think that /Revenge of the Creature/ deserved at least 3
|
||||||
|
more stars, then with Slope One, this deviation of -3 would carry
|
||||||
|
exactly as much weight as thousands of people rating /Citizen Kane/ as
|
||||||
|
about 0.5 stars below /The Good, the Bad and the Ugly/. In Weighted
|
||||||
|
Slope One, that latter deviation would count for thousands of times as
|
||||||
|
much. The example makes it sound a bit more drastic than it is.
|
||||||
|
|||||||
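The procedure described in the new text (compute a deviation per movie, add each deviation to Bob's rating for that movie, then average) can be sketched in a few lines of Python. This is only a minimal illustration, not code from the post or the paper: the function names =deviations= and =predict=, the nested-dict layout of =ratings=, and every number in the sample data are assumptions made up for the example.

```python
from collections import defaultdict


def deviations(ratings, target):
    """For each other movie, return (average of target rating minus that
    movie's rating, number of users who rated both)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for user_ratings in ratings.values():
        if target not in user_ratings:
            continue  # only users who rated the target contribute
        for movie, rating in user_ratings.items():
            if movie != target:
                sums[movie] += user_ratings[target] - rating
                counts[movie] += 1
    return {m: (sums[m] / counts[m], counts[m]) for m in sums}


def predict(ratings, user, target, weighted=False):
    """Predict the user's rating for target, or None if no deviation applies."""
    devs = deviations(ratings, target)
    numerator = 0.0
    denominator = 0.0
    for movie, rating in ratings[user].items():
        if movie not in devs:
            continue  # nobody rated both this movie and the target
        dev, n = devs[movie]
        # Plain Slope One: every prediction counts once.
        # Weighted Slope One: count it once per co-rating user.
        weight = n if weighted else 1
        numerator += weight * (rating + dev)
        denominator += weight
    return numerator / denominator if denominator else None


# Made-up sample data (hypothetical users and ratings).
ratings = {
    "alice": {"Citizen Kane": 9, "Dirty Harry": 8, "Gran Torino": 7},
    "carol": {"Citizen Kane": 8, "Dirty Harry": 8},
    "dave": {"Citizen Kane": 6, "Dirty Harry": 7},
    "bob": {"Dirty Harry": 9, "Gran Torino": 8},
}

print(predict(ratings, "bob", "Citizen Kane"))                 # 9.5
print(predict(ratings, "bob", "Citizen Kane", weighted=True))  # 9.25
```

In this toy data, three users agree that /Citizen Kane/ rates about the same as /Dirty Harry/ (deviation 0), while a single user puts it 2 stars above /Gran Torino/. Plain Slope One gives both deviations equal say and predicts 9.5; the weighted variant leans toward the better-supported deviation and predicts 9.25.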