More updates with drafts (Slope One, modularity)
parent c7695799e6
commit 0437bb31cd

@@ -11,25 +11,35 @@ Why are old technological ideas that were "ahead of their time", but
which lost out to other ideas, worth studying?

We can see them as raw ideas that "modern" understanding never
refined - misguided fantasies or even just mistakes. The flip side of
this is that we can see them as ideas that are free of a nearly
inescapable modern context and all of the preconceptions and blinders
it carries.

In some of these visionaries is a valuable combination:

- they're detached from this modern context (by mere virtue of it not
  existing yet),
- they have considerable experience, imagination, and foresight,
- they devoted time and effort to work extensively on something and
  to communicate their thoughts, feelings, and analysis in a durable
  way.

To put it another way: They give us analysis done from a context that
is long gone. They help us think beyond our current context. They
help us answer the question, "What if we took a different path then?"

[[http://www.cs.yale.edu/homes/perlis-alan/quotes.html][Epigram #53]] from Alan Perlis offers some relevant skepticism here:
"So many good ideas are never heard from again once they embark in a
voyage on the semantic gulf." My interpretation is that we tend to
idolize ideas, old and new, because they sound somehow different,
innovative, and groundbreaking, but attempts at analysis or practical
realization of the ideas lead to a bleaker reality: perhaps the idea
is completely meaningless (the equivalent of a [[https://en.wiktionary.org/wiki/deepity][deepity]]), wildly
impractical, or a mere facade over what is already established.

* Examples

* Scratch

- Douglas Engelbart is perhaps one of the canonical examples of a person
@ -37,7 +47,4 @@ then?"
|
|||||||
another. Alan Turing is an early example widely regarded for his
|
another. Alan Turing is an early example widely regarded for his
|
||||||
foresight.
|
foresight.
|
||||||
- [[https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/][As We May Think (Vannevar Bush)]]
|
- [[https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/][As We May Think (Vannevar Bush)]]
|
||||||
- However, to quote [[http://www.cs.yale.edu/homes/perlis-alan/quotes.html][epigram #53]] from Alan Perlis, "So many good ideas
|
|
||||||
are never heard from again once they embark in a voyage on the
|
|
||||||
semantic gulf."
|
|
||||||
- "Do you remember a time when..." only goes so far.
|
- "Do you remember a time when..." only goes so far.
|
||||||
|
|||||||

@@ -39,8 +39,8 @@ bits... It is not only necessary to make sure your own system is
designed to be made of modular parts. It is also necessary to realize
that your own system, no matter how big and wonderful it seems now,
should always be designed to be a part of another larger system." Les
Hatton in [[http://www.leshatton.org/TAIC2008-29-08-2008.html][The role of empiricism in improving the reliability of future software]]
even did an interesting derivation tying the defect density in
software to how it is broken into pieces.

"Abstraction" doesn't have quite the same consensus. In software, it's

@@ -255,3 +255,7 @@ underneath, and this makes me wonder why it needs explicit support for
- https://www.reddit.com/r/programming/comments/4bjss2/an_11_line_npm_package_called_leftpad_with_only/
- http://www.freecode.com/articles/editorial-the-two-edged-sword
- https://en.wikipedia.org/wiki/Essential_complexity

- GObject framework: an object system that sits outside of any
  particular language (though this is nothing particularly new)
- libgreen

@@ -3,6 +3,8 @@
#+DATE: December 12, 2017
#+TAGS: technobabble

I don't know if there's actually anything to write here.

There is a sort of parallel between the declarative nature of
computational graphs in TensorFlow, and functional programming
(possibly function-level - think of the J language and how important

@@ -29,3 +31,12 @@ abstractions (RDD, DataFrame, Dataset)."

Spark does this with a database. TensorFlow does it with numerical
calculations. Node-RED does it with irregular, asynchronous data.
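
As a tiny illustration of that declarative style, here's a sketch
(mine, assuming the TensorFlow 1.x graph-mode API) in which the first
lines merely /describe/ a computation, and nothing runs until the
graph is explicitly executed:

#+BEGIN_SRC python
import tensorflow as tf  # assumes the TensorFlow 1.x graph-mode API

# Declarative: these lines only build a computation graph.
x = tf.placeholder(tf.float32, shape=[None])
y = tf.reduce_mean(tf.square(x))

# Nothing is computed until the graph is actually run.
with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: [1.0, 2.0, 3.0]}))
#+END_SRC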

- [[https://mxnet.incubator.apache.org/how_to/visualize_graph.html][mxnet: How to visualize Neural Networks as computation graph]]
- [[https://medium.com/intuitionmachine/pytorch-dynamic-computational-graphs-and-modular-deep-learning-7e7f89f18d1][PyTorch, Dynamic Computational Graphs and Modular Deep Learning]]
- [[https://github.com/WarBean/hyperboard][HyperBoard: A web-based dashboard for Deep Learning]]
- [[https://www.postgresql.org/docs/current/static/sql-explain.html][EXPLAIN in PostgreSQL]]
- http://tatiyants.com/postgres-query-plan-visualization/
- https://en.wikipedia.org/wiki/Dataflow_programming
- Pure Data!
- [[https://en.wikipedia.org/wiki/Orange_(software)][Orange]]?

@@ -21,8 +21,8 @@ references, and one particular [[https://github.com/fizyr/keras-retinanet][implementation]]
"Object detection" as it is used here refers to machine learning
models that can not just identify a single object in an image, but can
identify and *localize* multiple objects, like in the below photo
taken from
[[https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html][Supercharge your Computer Vision models with the TensorFlow Object Detection API]]:

# TODO:
# Define mAP

@@ -143,10 +143,9 @@ explores). The paper is fairly concise in describing FPNs; it only
takes it around 3 pages to explain their purpose, related work, and
their entire design. The remainder shows experimental results and
specific applications of FPNs. While it shows FPNs implemented on a
particular underlying network (ResNet, mentioned below), they were
purposely made very simple and adaptable to nearly any kind of CNN.
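
To give a sense of that simplicity, here's a rough sketch of the
top-down pathway (my own illustration in Keras, /not/ code from the
paper or from keras-retinanet; =C3=, =C4=, =C5= stand for feature
maps already extracted from some backbone):

#+BEGIN_SRC python
from keras.layers import Add, Conv2D, Input, UpSampling2D

def fpn(C3, C4, C5, channels=256):
    # 1x1 lateral connections bring each backbone stage to a fixed
    # depth (the paper fixes this at 256 channels).
    P5 = Conv2D(channels, 1)(C5)
    P4 = Add()([UpSampling2D()(P5), Conv2D(channels, 1)(C4)])
    P3 = Add()([UpSampling2D()(P4), Conv2D(channels, 1)(C3)])
    # 3x3 convolutions smooth out the upsampling artifacts.
    return [Conv2D(channels, 3, padding='same')(P) for P in (P3, P4, P5)]

# Toy usage, with feature maps of the usual relative sizes:
C3, C4, C5 = Input((64, 64, 128)), Input((32, 32, 256)), Input((16, 16, 512))
P3, P4, P5 = fpn(C3, C4, C5)
#+END_SRC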

To begin understanding this, start with [[https://en.wikipedia.org/wiki/Pyramid_(image_processing)][image pyramids]]. The below
diagram illustrates an image pyramid:

@@ -225,6 +224,16 @@ connections.

# Note C=256 and such

# TODO: Link to some good explanations

For two reasons, I don't explain much about ResNet here. The first is
that residual networks, like the ResNet used here, have seen lots of
attention and already have many good explanations online. The second
is that the paper claims that the underlying network is largely
interchangeable anyway.

[[https://arxiv.org/abs/1512.03385][Deep Residual Learning for Image Recognition]]
[[https://arxiv.org/abs/1603.05027][Identity Mappings in Deep Residual Networks]]

* Anchors & Region Proposals

Recall from the last section what was said about feature maps, and that the

@@ -339,3 +348,21 @@ is implemented with bog-standard convolutional networks...
* Inference

# Top N results

* References

# Does org-mode have a way to make a special section for references?
# I know I saw this somewhere

1. [[https://arxiv.org/abs/1708.02002][Focal Loss for Dense Object Detection]]
2. [[https://arxiv.org/abs/1612.03144][Feature Pyramid Networks for Object Detection]]
3. [[https://arxiv.org/abs/1506.01497][Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks]]
4. [[https://arxiv.org/abs/1504.08083][Fast R-CNN]]
5. [[https://arxiv.org/abs/1512.03385][Deep Residual Learning for Image Recognition]]
6. [[https://arxiv.org/abs/1603.05027][Identity Mappings in Deep Residual Networks]]
7. [[https://openreview.net/pdf?id=SJAr0QFxe][Demystifying ResNet]]
8. [[https://vision.cornell.edu/se3/wp-content/uploads/2016/10/nips_camera_ready_draft.pdf][Residual Networks Behave Like Ensembles of Relatively Shallow Networks]]
9. https://github.com/KaimingHe/deep-residual-networks
10. https://github.com/broadinstitute/keras-resnet (keras-retinanet uses this)
11. [[https://arxiv.org/abs/1311.2524][Rich feature hierarchies for accurate object detection and semantic segmentation]] (contains the same parametrization as in the Faster R-CNN paper)
12. http://deeplearning.csail.mit.edu/instance_ross.pdf and http://deeplearning.csail.mit.edu/

@@ -1,7 +1,13 @@
---
title: Collaborative Filtering with Slope One Predictors
author: Chris Hodapp
date: January 30, 2018
tags: technobabble, machine learning
---

# Needs a brief intro

# Needs a summary at the end

Suppose you have a large number of users, and a large number of
movies. Users have watched movies, and they've provided ratings for

@@ -10,61 +16,178 @@ However, they've all watched different movies, and for any given user,
it's only a tiny fraction of the total movies.

Now, you want to predict how some user will rate some movie they
haven't rated, based on what they (and other users) have rated.

That's a common problem, especially when generalized from 'movies' to
anything else, and one with many approaches. (To put some technical
terms to it, this is the [[https://en.wikipedia.org/wiki/Collaborative_filtering][collaborative filtering]] approach to
[[https://en.wikipedia.org/wiki/Recommender_system][recommender systems]]. [[http://www.mmds.org/][Mining of Massive Datasets]] is an excellent free
text in which to read more in depth on this, particularly chapter 9.)

Slope One Predictors are one such approach to collaborative filtering,
described in the paper [[https://arxiv.org/pdf/cs/0702144v1.pdf][Slope One Predictors for Online Rating-Based
Collaborative Filtering]]. Despite the complex-sounding name, they are
wonderfully simple to understand and implement, and very fast.

I'll give a contrived example below to explain them.

Consider a user Bob. Bob is enthusiastic, but has rather simple
tastes: he mostly just watches Clint Eastwood movies. In fact, he's
watched and rated nearly all of them, and basically nothing else.

Now, suppose we want to predict how much Bob will like something
completely different and unheard of (to him at least), like... I don't
know... /Citizen Kane/.

Here's Slope One in a nutshell (a code sketch of both variants
follows below):

1. First, find the users who rated both /Citizen Kane/ *and* any of
   the Clint Eastwood movies that Bob rated.
2. Now, for each movie that comes up above, compute a *deviation*
   which tells us: On average, how differently (i.e. how much higher
   or lower) did users rate /Citizen Kane/ compared to this movie?
   (For instance, we'll have a number for how /Citizen Kane/ was
   rated compared to /Dirty Harry/, and perhaps it's +0.6 - meaning
   that on average, users who rated both movies rated /Citizen Kane/
   about 0.6 stars above /Dirty Harry/. We'd have another deviation
   for /Citizen Kane/ compared to /Gran Torino/, another for
   /Citizen Kane/ compared to /The Good, the Bad and the Ugly/, and
   so on - for every movie that Bob rated, provided that other users
   who rated /Citizen Kane/ also rated the movie.)
3. If that deviation between /Citizen Kane/ and /Dirty Harry/ was
   +0.6, it's reasonable that adding 0.6 to Bob's rating on /Dirty
   Harry/ would give one prediction of how Bob might rate /Citizen
   Kane/. We can then generate more predictions based on the ratings
   he gave the other movies - anything for which we could compute a
   deviation.
4. To turn this into a single prediction, we could just average all
   those predictions together.

One variant, Weighted Slope One, is nearly identical. The only
difference is in how we average those predictions in step #4. In
Slope One, every deviation counts equally, no matter how many users
had differences in ratings averaged together to produce it. In
Weighted Slope One, deviations that came from larger numbers of users
count for more (because, presumably, they are better estimates).

Or, in other words: If only one person rated both /Citizen Kane/ and
the lesser-known Eastwood classic /Revenge of the Creature/, and they
happened to think that /Revenge of the Creature/ deserved at least 3
more stars, then with Slope One, this deviation of -3 would carry
exactly as much weight as thousands of people rating /Citizen Kane/
as about 0.5 stars below /The Good, the Bad and the Ugly/. In
Weighted Slope One, that latter deviation would count for thousands
of times as much. The example makes it sound a bit more drastic than
it is.
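
Here is that nutshell as a minimal sketch in plain Python (my own
code, not the paper's; it assumes ratings stored as nested dicts,
=ratings[user][movie] = stars=):

#+BEGIN_SRC python
def slope_one_predict(ratings, user, target, weighted=True):
    """Predict `user`'s rating of `target` from nested rating dicts."""
    total, weight = 0.0, 0
    for movie, rating in ratings[user].items():
        # Deviation of `target` with respect to `movie`, averaged
        # over every user who rated both:
        diffs = [r[target] - r[movie]
                 for r in ratings.values()
                 if target in r and movie in r]
        if not diffs:
            continue
        dev = sum(diffs) / len(diffs)
        # One prediction per co-rated movie; Weighted Slope One
        # weights it by how many users were behind the deviation.
        w = len(diffs) if weighted else 1
        total += (rating + dev) * w
        weight += w
    return total / weight if weight else None
#+END_SRC

With a =ratings= dict populated, predicting Bob's rating of /Citizen
Kane/ is then =slope_one_predict(ratings, 'bob', 'Citizen Kane')=.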

The Python library [[http://surpriselib.com/][Surprise]] (a [[https://www.scipy.org/scikits.html][scikit]]) has an implementation of this
algorithm, and the Benchmarks section of that page shows its
performance compared to some other methods.

/TODO/: Show a simple Python implementation of this (Jupyter
notebook?)

* Linear Algebra Tricks

Those who aren't familiar with matrix methods or algebra can probably
skip this section. Everything I've described above, you can compute
given just some data to work with ([[https://grouplens.org/datasets/movielens/100k/][movielens 100k]], perhaps?) and some
basic arithmetic. You don't need any complicated numerical methods.

However, the entire Slope One method can be implemented in a very
fast and simple way with a couple of matrix operations.

First, we need to have our data encoded as a *utility matrix*. In a
utility matrix, each row represents one user, each column represents
one item (a movie, in our case), and each element represents a user's
rating of an item. If we have $n$ users and $m$ movies, then this is
an $n \times m$ matrix $U$ for which $U_{k,i}$ is user $k$'s rating
for movie $i$ - assuming we've numbered our users and our movies.

Users have typically rated only a fraction of movies, and so most of
the elements of this matrix are unknown. We can represent this with
another $n \times m$ matrix (specifically a binary matrix), a 'mask'
$M$ in which $M_{k,i}$ is 1 if user $k$ supplied a rating for movie
$i$, and otherwise 0.
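
As a small illustration, here's my own sketch of both matrices,
assuming NumPy and ratings given as (user, movie, stars) triples with
integer indices:

#+BEGIN_SRC python
import numpy as np

n, m = 4, 3   # toy sizes: 4 users, 3 movies
triples = [(0, 0, 5.0), (0, 2, 3.0), (1, 0, 4.0),
           (1, 1, 2.0), (2, 2, 1.0), (3, 1, 4.0)]

U = np.zeros((n, m))   # utility matrix; 0 stands in for "unrated"
M = np.zeros((n, m))   # binary mask: 1 wherever a rating exists
for user, movie, stars in triples:
    U[user, movie] = stars
    M[user, movie] = 1.0
#+END_SRC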

I mentioned *deviation* above and gave an informal definition of it.
The paper gives a formal but rather terse definition of the average
deviation of item $i$ with respect to item $j$:

$$\textrm{dev}_{j,i} = \sum_{u \in S_{j,i}(\chi)} \frac{u_j - u_i}{card(S_{j,i}(\chi))}$$

where:
- $u_j$ and $u_i$ mean: user $u$'s ratings for movies $j$ and $i$,
  respectively
- $u \in S_{j,i}(\chi)$ means: all users $u$ who, in the dataset
  we're training on, provided a rating for both movie $i$ and movie
  $j$
- $card$ is the cardinality of that set, i.e. for
  ${card(S_{j,i}(\chi))}$ it is just how many users rated both $i$
  and $j$.

That denominator does depend on $i$ and $j$, but doesn't depend on
the summation term, so it can be pulled out, and also, we can split
up the summation as long as it is kept over the same terms:

$$\textrm{dev}_{j,i} = \frac{1}{card(S_{j,i}(\chi))} \sum_{u \in S_{j,i}(\chi)} (u_j - u_i) = \frac{1}{card(S_{j,i}(\chi))}\left(\sum_{u \in S_{j,i}(\chi)} u_j - \sum_{u \in S_{j,i}(\chi)} u_i\right)$$

# TODO: These need some actual matrices to illustrate

Let's start with computing ${card(S_{j,i}(\chi))}$, the number of
users who rated both movie $i$ and movie $j$. Consider column $i$ of
the mask $M$. For each value in this column, it equals 1 if the
respective user rated movie $i$, or 0 if they did not. Clearly,
simply summing up column $i$ would tell us how many users rated
movie $i$, and the same applies to column $j$ for movie $j$.

Now, suppose we take the element-wise logical AND of columns $i$ and
$j$. The resultant column has a 1 only where both corresponding
elements were 1 - where a user rated both $i$ and $j$. If we sum up
this column, we have exactly the number we need: the number of users
who rated both $i$ and $j$.

Some might notice that, on 0/1 values, "elementwise logical AND" is
just "elementwise multiplication", thus "sum of elementwise logical
AND" is just "sum of elementwise multiplication", which is: dot
product. That is, ${card(S_{j,i}(\chi))}=M_j \bullet M_i$ if we use
$M_i$ and $M_j$ for columns $i$ and $j$ of $M$.

However, we'd like to compute deviation as a matrix for all $i$ and
$j$, so we'll likewise need ${card(S_{j,i}(\chi))}$ for every single
combination of $i$ and $j$ - that is, we need a dot product between
every single pair of columns from $M$. Incidentally, "dot product of
every pair of columns" happens to be almost exactly matrix
multiplication; note that for matrices $A$ and $B$, element $(x,y)$
of the matrix product $AB$ is just the dot product of /row/ $x$ of
$A$ and /column/ $y$ of $B$ - and that matrix product as a whole has
this dot product between every row of $A$ and every column of $B$.

We wanted the dot product of every column of $M$ with every column
of $M$, which is easy: just transpose $M$ for one operand. Then, we
can compute our count matrix like this:

$$C=M^\top M$$

Thus $C_{i,j}$ is the dot product of column $i$ of $M$ and column
$j$ of $M$ - or, the number of users who rated both movies $i$ and
$j$.
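
Continuing the NumPy sketch from above, that count matrix is a single
line (=@= being NumPy's matrix-multiplication operator):

#+BEGIN_SRC python
C = M.T @ M   # C[i, j]: how many users rated both movie i and movie j
#+END_SRC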

That was the first half of what we needed for $\textrm{dev}_{j,i}$.
We still need the other half:

$$\sum_{u \in S_{j,i}(\chi)} u_j - \sum_{u \in S_{j,i}(\chi)} u_i$$

We can apply a similar trick here. Consider first what $\sum_{u \in
S_{j,i}(\chi)} u_j$ means: It is the sum of only those ratings of
movie $j$ that were done by a user who also rated movie $i$.
Likewise, $\sum_{u \in S_{j,i}(\chi)} u_i$ is the sum of only those
ratings of movie $i$ that were done by a user who also rated movie
$j$. (Note the symmetry: it's over the same set of users, because
it's always the users who rated both $i$ and $j$.)

# TODO: Finish that section (mostly translate from code notes)

* Implementation

#+BEGIN_SRC python
print("foo")
#+END_SRC
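
That block is still a placeholder. In the meantime, here's my own
sketch of where the derivation leads, using the NumPy =U=, =M=, and
=C= from earlier. Because unrated entries of =U= are stored as 0, the
two masked sums above reduce to the matrix products $U^\top M$ and
$M^\top U$:

#+BEGIN_SRC python
import numpy as np

num = U.T @ M - M.T @ U   # per (j, i): sum of u_j minus sum of u_i
with np.errstate(invalid="ignore"):
    dev = np.where(C > 0, num / C, 0.0)   # dev[j, i]; 0 where undefined

def predict(k, j):
    """Plain Slope One prediction of user k's rating of movie j."""
    # Use every movie i that k rated and that shares raters with j.
    usable = (M[k] > 0) & (C[j] > 0)
    if not usable.any():
        return None
    return float(np.mean(U[k, usable] + dev[j, usable]))
#+END_SRC

For Weighted Slope One, the only change is weighting that final mean
by =C[j, usable]=.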

@@ -16,7 +16,7 @@

<!-- From http://travis.athougies.net/posts/2013-08-13-using-math-on-your-hakyll-blog.html -->
<script type="text/javascript"
  src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>

<link rel="apple-touch-icon" sizes="57x57" href="/apple-touch-icon-57x57.png">