From c52687dd40ea0a2f1e348ae674cfce9b92504258 Mon Sep 17 00:00:00 2001
From: Chris Hodapp
Date: Tue, 30 Jan 2018 21:10:38 -0500
Subject: [PATCH] First attempt at a "Slope One" post

---
 drafts/2018-01-30-slope-one.org | 49 +++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)
 create mode 100644 drafts/2018-01-30-slope-one.org

diff --git a/drafts/2018-01-30-slope-one.org b/drafts/2018-01-30-slope-one.org
new file mode 100644
index 0000000..104ef44
--- /dev/null
+++ b/drafts/2018-01-30-slope-one.org
@@ -0,0 +1,49 @@
+#+TITLE: Collaborative Filtering with Slope One Predictors
+#+AUTHOR: Chris Hodapp
+#+DATE: January 30, 2018
+#+TAGS: technobabble, machine learning
+
+[[https://arxiv.org/pdf/cs/0702144v1.pdf][Slope One Predictors for Online Rating-Based Collaborative Filtering]]
+
+The way this works is remarkably simple. I'll concoct a really
+contrived example here to explain it.
+
+Suppose you have a large number of users, and a large number of
+movies. Users have watched movies, and they've provided ratings for
+some of them (perhaps just simple numerical ratings, 1 to 10 stars).
+However, they've all watched different movies, and for any given user,
+it's only a tiny fraction of the total movies.
+
+Consider a user Bob. Bob has rather simplistic tastes: he mostly just
+watches Clint Eastwood movies. In fact, he's watched and rated nearly
+all of them, and basically nothing else.
+
+Now, suppose we want to predict how much Bob will like something
+completely different and unheard of (to him at least), like... I don't
+know... /Citizen Kane/. How would we go about this?
+
+The Slope One algorithm does it as follows:
+
+1. Find the users who rated both /Citizen Kane/ *and* any of the Clint
+   Eastwood movies that Bob rated.
+2. For each movie that comes up above, compute a number that tells us:
+   On average, how differently (i.e. how much higher or lower) did
+   users rate /Citizen Kane/ compared to this movie? (That is, we'll
+   have a number for how /Citizen Kane/ was rated compared to /Dirty
+   Harry/, another for /Citizen Kane/ compared to /Gran Torino/,
+   another for /Citizen Kane/ compared to /The Good, the Bad and the
+   Ugly/, and so on - for everything that Bob rated, and that someone
+   else who rated /Citizen Kane/ also rated.)
+3. Of course, Bob rated all of these movies - so we can take his
+   rating for each of these, and add each respective number above to
+   'correct' it and produce an estimate of how he might rate /Citizen
+   Kane/ based on just his rating for another movie.
+4. Average together all of these predicted ratings.
+
+A variant of it, Weighted Slope One, makes one small modification: In
+step #4 it turns that average into a weighted average that takes into
+account how many ratings are involved for each pair of movies.
+
+As an example, if only one person rated both /Citizen Kane/ and the
+Eastwood classic /Revenge of the Creature/, the Slope One algorithm
+would assign this prediction equal "votes"