Begin slow migration to Hugo...
@@ -1,42 +0,0 @@
#+TITLE: Dataflow paradigm (working title)
#+AUTHOR: Chris Hodapp
#+DATE: December 12, 2017
#+TAGS: technobabble

I don't know if there's actually anything to write here.

There is a sort of parallel between the declarative nature of
computational graphs in TensorFlow, and functional programming
(possibly function-level - think of the J language and how important
rank is to its computations).

Apache Spark and TensorFlow are very similar in a lot of ways. The
key difference I see is that Spark handles different types of data
internally that are more suited to databases, records, tables, and
generally relational data, while TensorFlow is, well, tensors
(arbitrary-dimensional arrays).

The interesting part to me with both of these is how they've moved
"bulk" computations into first-class objects (ish) and permitted some
level of introspection into them before they run, as they run, and
after they run. Like I noted in Notes - Paper, 2016-11-13, "One
interesting (to me) facet is how the computation process has been
split out and instrumented enough to allow some meaningful
introspection with it. It hasn't precisely made it a first-class
construct, but still, this feature pervades all of Spark's major
abstractions (RDD, DataFrame, Dataset)."

# Show Tensorboard example here
# Screenshots may be a good idea too
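
A minimal sketch of the kind of thing that note is after (assuming the
TensorFlow 1.x API that was current when this was drafted; the shapes,
names, and log directory are arbitrary): build a graph, poke at its
operations before running anything, and dump it for TensorBoard.

#+BEGIN_SRC python
import tensorflow as tf

# Build a small computational graph; nothing is computed yet.
x = tf.placeholder(tf.float32, shape=(None, 3), name="x")
w = tf.Variable(tf.ones((3, 2)), name="w")
y = tf.matmul(x, w, name="y")

# Introspect the graph before running it: every op is an object we can inspect.
for op in tf.get_default_graph().get_operations():
    print(op.name, op.type)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Dump the graph for TensorBoard (`tensorboard --logdir /tmp/tf-demo`).
    tf.summary.FileWriter("/tmp/tf-demo", sess.graph)
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))
#+END_SRC
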
Spark does this with a database. TensorFlow does it with numerical
calculations. Node-RED does it with irregular, asynchronous data.
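
On the Spark side, a similarly minimal sketch (assuming PySpark; the
column names and numbers are arbitrary) of inspecting the plan for a
"bulk" computation before anything actually executes:

#+BEGIN_SRC python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# A declarative description of a computation; nothing has executed yet.
df = (spark.range(1000)
      .withColumn("squared", col("id") * col("id"))
      .filter(col("id") % 2 == 0))

# Introspect the logical and physical plans before triggering execution.
df.explain(True)

# Only now does Spark actually run the computation.
print(df.count())
#+END_SRC
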
- [[https://mxnet.incubator.apache.org/how_to/visualize_graph.html][mxnet: How to visualize Neural Networks as computation graph]]
- [[https://medium.com/intuitionmachine/pytorch-dynamic-computational-graphs-and-modular-deep-learning-7e7f89f18d1][PyTorch, Dynamic Computational Graphs and Modular Deep Learning]]
- [[https://github.com/WarBean/hyperboard][HyperBoard: A web-based dashboard for Deep Learning]]
- [[https://www.postgresql.org/docs/current/static/sql-explain.html][EXPLAIN in PostgreSQL]]
- http://tatiyants.com/postgres-query-plan-visualization/
- https://en.wikipedia.org/wiki/Dataflow_programming
- Pure Data!
- [[https://en.wikipedia.org/wiki/Orange_(software)][Orange]]?

@@ -1,193 +0,0 @@
---
title: Collaborative Filtering with Slope One Predictors
author: Chris Hodapp
date: January 30, 2018
tags: technobabble, machine learning
---

# Needs a brief intro

# Needs a summary at the end

Suppose you have a large number of users, and a large number of
movies. Users have watched movies, and they've provided ratings for
some of them (perhaps just simple numerical ratings, 1 to 10 stars).
However, they've all watched different movies, and for any given user,
it's only a tiny fraction of the total movies.

Now, you want to predict how some user will rate some movie they
haven't rated, based on what they (and other users) have rated.

That's a common problem, especially when generalized from 'movies' to
anything else, and one with many approaches. (To put some technical
terms to it, this is the [[https://en.wikipedia.org/wiki/Collaborative_filtering][collaborative filtering]] approach to
[[https://en.wikipedia.org/wiki/Recommender_system][recommender systems]]. [[http://www.mmds.org/][Mining of Massive Datasets]] is an excellent free
text in which to read about this in more depth, particularly chapter 9.)

Slope One Predictors are one such approach to collaborative filtering,
described in the paper [[https://arxiv.org/pdf/cs/0702144v1.pdf][Slope One Predictors for Online Rating-Based
Collaborative Filtering]]. Despite the complex-sounding name, they are
wonderfully simple to understand and implement, and very fast.

I'll give a contrived example below to explain them.

Consider a user Bob. Bob is enthusiastic, but has rather simple
tastes: he mostly just watches Clint Eastwood movies. In fact, he's
watched and rated nearly all of them, and basically nothing else.

Now, suppose we want to predict how much Bob will like something
completely different and unheard of (to him at least), like... I don't
know... /Citizen Kane/.

Here's Slope One in a nutshell:

1. First, find the users who rated both /Citizen Kane/ *and* any of
   the Clint Eastwood movies that Bob rated.
2. Now, for each movie that comes up above, compute a *deviation*
   which tells us: On average, how differently (i.e. how much higher
   or lower) did users rate Citizen Kane compared to this movie? (For
   instance, we'll have a number for how /Citizen Kane/ was rated
   compared to /Dirty Harry/, and perhaps it's +0.6 - meaning that on
   average, users who rated both movies rated /Citizen Kane/ about 0.6
   stars above /Dirty Harry/. We'd have another deviation for
   /Citizen Kane/ compared to /Gran Torino/, another for /Citizen
   Kane/ compared to /The Good, the Bad and the Ugly/, and so on - for
   every movie that Bob rated, provided that other users who rated
   /Citizen Kane/ also rated the movie.)
3. If that deviation between /Citizen Kane/ and /Dirty Harry/ was
   +0.6, it's reasonable that adding 0.6 to Bob's rating on /Dirty
   Harry/ would give one prediction of how Bob might rate /Citizen
   Kane/. We can then generate more predictions based on the ratings
   he gave the other movies - anything for which we could compute a
   deviation.
4. To turn this into a single prediction, we could just average all
   those predictions together. (A short code sketch of these steps
   follows this list.)
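
In code, those four steps might look something like the following - a
minimal sketch in plain Python, with entirely made-up users and
ratings ("kane" standing in for the movie we want to predict):

#+BEGIN_SRC python
# Ratings per user: {movie: stars}. All numbers are made up.
ratings = {
    "alice": {"kane": 9, "dirty_harry": 8, "gran_torino": 7},
    "carol": {"kane": 8, "gran_torino": 9},
    "bob":   {"dirty_harry": 9, "gran_torino": 8},   # no rating for "kane"
}

def predict(user, target):
    predictions = []
    for movie, user_rating in ratings[user].items():
        # Steps 1-2: deviation of `target` vs. `movie`, averaged over
        # the other users who rated both.
        diffs = [r[target] - r[movie]
                 for u, r in ratings.items()
                 if u != user and target in r and movie in r]
        if not diffs:
            continue
        deviation = sum(diffs) / len(diffs)
        # Step 3: one prediction per movie the user has rated.
        predictions.append(user_rating + deviation)
    # Step 4: average the per-movie predictions.
    return sum(predictions) / len(predictions) if predictions else None

print(predict("bob", "kane"))
#+END_SRC
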
One variant, Weighted Slope One, is nearly identical. The only
difference is in how we average those predictions in step #4. In
Slope One, every deviation counts equally, no matter how many users
had differences in ratings averaged together to produce it. In
Weighted Slope One, deviations that came from larger numbers of users
count for more (because, presumably, they are better estimates).

Or, in other words: If only one person rated both /Citizen Kane/ and
the lesser-known Eastwood classic /Revenge of the Creature/, and they
happened to think that /Revenge of the Creature/ deserved at least 3
more stars, then with Slope One, this deviation of -3 would carry
exactly as much weight as thousands of people rating /Citizen Kane/ as
about 0.5 stars below /The Good, the Bad and the Ugly/. In Weighted
Slope One, that latter deviation would count for thousands of times as
much. The example makes it sound a bit more drastic than it is.

The Python library [[http://surpriselib.com/][Surprise]] (a [[https://www.scipy.org/scikits.html][scikit]]) has an implementation of this
algorithm, and the Benchmarks section of that page shows its
performance compared to some other methods.
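
Something like the following (a sketch assuming Surprise's current API
and its built-in MovieLens 100k loader) exercises that implementation:

#+BEGIN_SRC python
from surprise import Dataset, SlopeOne
from surprise.model_selection import cross_validate

# Download (once) and load the MovieLens 100k ratings.
data = Dataset.load_builtin("ml-100k")

# 5-fold cross-validation of Surprise's Slope One implementation.
cross_validate(SlopeOne(), data, measures=["RMSE", "MAE"], cv=5, verbose=True)
#+END_SRC
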
/TODO/: Show a simple Python implementation of this (Jupyter
notebook?)

* Linear Algebra Tricks

Those who aren't familiar with matrix methods or algebra can probably
skip this section. Everything I've described above, you can compute
given just some data to work with ([[https://grouplens.org/datasets/movielens/100k/][movielens 100k]], perhaps?) and some
basic arithmetic. You don't need any complicated numerical methods.

However, the entire Slope One method can be implemented in a very fast
and simple way with a couple matrix operations.

First, we need to have our data encoded as a *utility matrix*. In a
utility matrix, each row represents one user, each column represents
one item (a movie, in our case), and each element represents a user's
rating of an item. If we have $n$ users and $m$ movies, then this is
an $n \times m$ matrix $U$ for which $U_{k,i}$ is user $k$'s rating
for movie $i$ - assuming we've numbered our users and our movies.

Users have typically rated only a fraction of movies, and so most of
the elements of this matrix are unknown. We can represent this with
another $n \times m$ matrix (specifically a binary matrix), a 'mask'
$M$ in which $M_{k,i}$ is 1 if user $k$ supplied a rating for movie
$i$, and otherwise 0.
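
As a concrete (made-up) example of those two matrices in NumPy, with 0
standing in for "no rating given":

#+BEGIN_SRC python
import numpy as np

# Utility matrix U: 4 users x 3 movies; 0 means "no rating given".
U = np.array([[5, 0, 3],
              [4, 2, 0],
              [0, 1, 2],
              [3, 4, 0]], dtype=float)

# Mask M: 1 where a rating exists, 0 where it doesn't.
M = (U > 0).astype(float)
#+END_SRC
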
I mentioned *deviation* above and gave an informal definition of it.
The paper gives a formal but rather terse definition below of the
average deviation of item $i$ with respect to item $j$:

$$\textrm{dev}_{j,i} = \sum_{u \in S_{j,i}(\chi)} \frac{u_j - u_i}{card(S_{j,i}(\chi))}$$

where:
- $u_j$ and $u_i$ mean: user $u$'s ratings for movies $j$ and $i$, respectively
- $u \in S_{j,i}(\chi)$ means: all users $u$ who, in the dataset we're
  training on, provided a rating for both movie $i$ and movie $j$
- $card$ is the cardinality of that set, i.e. for
  ${card(S_{j,i}(\chi))}$ it is just how many users rated both $i$ and
  $j$.

That denominator does depend on $i$ and $j$, but doesn't depend on the
summation variable $u$, so it can be pulled out, and also, we can
split up the summation as long as it is kept over the same terms:

$$\textrm{dev}_{j,i} = \frac{1}{card(S_{j,i}(\chi))} \sum_{u \in S_{j,i}(\chi)} (u_j - u_i) = \frac{1}{card(S_{j,i}(\chi))}\left(\sum_{u \in S_{j,i}(\chi)} u_j - \sum_{u \in S_{j,i}(\chi)} u_i\right)$$

# TODO: These need some actual matrices to illustrate

Let's start with computing ${card(S_{j,i}(\chi))}$, the number of
users who rated both movie $i$ and movie $j$. Consider column $i$ of
the mask $M$. For each value in this column, it equals 1 if the
respective user rated movie $i$, or 0 if they did not. Clearly,
simply summing up column $i$ would tell us how many users rated movie
$i$, and the same applies to column $j$ for movie $j$.

Now, suppose we take the element-wise logical AND of columns $i$ and
$j$. The resultant column has a 1 only where both corresponding
elements were 1 - where a user rated both $i$ and $j$. If we sum up
this column, we have exactly the number we need: the number of users
who rated both $i$ and $j$.

Some might notice that "element-wise logical AND" is just
"element-wise multiplication", thus "sum of element-wise logical AND"
is just "sum of element-wise multiplication", which is: dot product.
That is, ${card(S_{j,i}(\chi))}=M_j \bullet M_i$ if we use $M_i$ and
$M_j$ for columns $i$ and $j$ of $M$.

However, we'd like to compute deviation as a matrix for all $i$ and
$j$, so we'll likewise need ${card(S_{j,i}(\chi))}$ for every single
combination of $i$ and $j$ - that is, we need a dot product between
every single pair of columns from $M$. Incidentally, "dot product of
every pair of columns" happens to be almost exactly matrix
multiplication; note that for matrices $A$ and $B$, element $(x,y)$ of
the matrix product $AB$ is just the dot product of /row/ $x$ of $A$
and /column/ $y$ of $B$ - and that matrix product as a whole has this
dot product between every row of $A$ and every column of $B$.

We wanted the dot product of every column of $M$ with every column of
$M$, which is easy: just transpose $M$ for one operand. Then, we can
compute our count matrix like this:

$$C=M^\top M$$

Thus $C_{i,j}$ is the dot product of column $i$ of $M$ and column $j$
of $M$ - or, the number of users who rated both movies $i$ and $j$.
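
With the same made-up $U$ and $M$ as above (repeated so the snippet
stands alone), the entire count matrix is one line of NumPy:

#+BEGIN_SRC python
import numpy as np

U = np.array([[5, 0, 3],
              [4, 2, 0],
              [0, 1, 2],
              [3, 4, 0]], dtype=float)
M = (U > 0).astype(float)

# C[i, j] = number of users who rated both movie i and movie j.
C = M.T @ M
print(C)   # e.g. C[0, 1] == 2: two users rated both movie 0 and movie 1.
#+END_SRC
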
That was the first half of what we needed for $\textrm{dev}_{j,i}$.
We still need the other half:

$$\sum_{u \in S_{j,i}(\chi)} u_j - \sum_{u \in S_{j,i}(\chi)} u_i$$

We can apply a similar trick here. Consider first what
$\sum_{u \in S_{j,i}(\chi)} u_j$ means: It is the sum of only those
ratings of movie $j$ that were done by a user who also rated movie
$i$. Likewise, $\sum_{u \in S_{j,i}(\chi)} u_i$ is the sum of only
those ratings of movie $i$ that were done by a user who also rated
movie $j$. (Note the symmetry: it's over the same set of users,
because it's always the users who rated both $i$ and $j$.)
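
The TODO below presumably covers the rest, but here is a sketch of
where the trick seems to lead (my own reconstruction, not the missing
section): if unrated entries of $U$ are stored as 0, then
$(U^\top M)_{j,i}$ is exactly $\sum_{u \in S_{j,i}(\chi)} u_j$, since
users who didn't rate movie $j$ contribute 0 and users who didn't rate
movie $i$ are masked out. The deviation matrix then follows
element-wise:

#+BEGIN_SRC python
import numpy as np

# Made-up utility matrix; 0 means "not rated", as in the earlier sketches.
U = np.array([[5, 0, 3],
              [4, 2, 0],
              [0, 1, 2],
              [3, 4, 0]], dtype=float)
M = (U > 0).astype(float)

C = M.T @ M   # C[j, i] = number of users who rated both j and i
S = U.T @ M   # S[j, i] = sum of ratings of movie j by users who also rated i

# dev[j, i] = (sum of u_j - sum of u_i) / card, guarding against pairs
# of movies with no raters in common.
with np.errstate(divide="ignore", invalid="ignore"):
    dev = np.where(C > 0, (S - S.T) / C, 0.0)
print(dev)
#+END_SRC
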
# TODO: Finish that section (mostly translate from code notes)

* Implementation

#+BEGIN_SRC python
print("foo")
#+END_SRC
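
The block above is clearly a placeholder; as a stopgap, here is a
rough sketch (my own, with made-up data - not the implementation this
section was meant to hold) of a Weighted Slope One prediction built
from the matrices derived in the previous section:

#+BEGIN_SRC python
import numpy as np

# Made-up ratings: rows = users, columns = movies, 0 = not rated.
U = np.array([[5, 0, 3],
              [4, 2, 0],
              [0, 1, 2],
              [3, 4, 0]], dtype=float)
M = (U > 0).astype(float)

C = M.T @ M                                    # co-rating counts
S = U.T @ M                                    # masked rating sums
with np.errstate(divide="ignore", invalid="ignore"):
    dev = np.where(C > 0, (S - S.T) / C, 0.0)  # average deviations

def predict_weighted(user_ratings, user_mask, j):
    """Weighted Slope One prediction of movie j for a single user.

    Each movie i the user rated contributes the prediction
    (dev[j, i] + rating of i), weighted by how many users rated both i and j.
    """
    weights = C[j] * user_mask
    weights[j] = 0.0                           # don't predict j from itself
    if weights.sum() == 0:
        return None                            # nothing to base a prediction on
    preds = dev[j] + user_ratings
    return float((preds * weights).sum() / weights.sum())

# Predict user 1's rating of movie 2 (which they haven't rated).
print(predict_weighted(U[1], M[1], 2))
#+END_SRC
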
@@ -1,30 +0,0 @@
#+TITLE: Untitled rant on machine learning hype
#+AUTHOR: Chris Hodapp
#+DATE: February 24, 2018
#+TAGS: technobabble

The present state in machine learning feels like an arms race for
techniques that perform better, faster, more efficiently, or whatever
on a handful of problems, and not much in terms of killer applications
that actually need this.

We've all been hearing for a few years about the demand here, but
mostly there seems to be a dearth of companies that actually have any
sort of sustained vision for actual uses of machine learning. Plenty
exist that make grand promises, and plenty of large companies keep
trying to acquire all the talent to further that arms race, but that's
about it.

Certainly this will change as machine learning "gets better", but in
order for a lot of improvement to occur there must be, at the same
time, some actual compelling ideas and applications to drive it.

In that sense I don't believe that current advancements will be that
fruitful on their own. We don't need optimizations, we need
applications.

Of course this is not the first time an entire industry was imagined
and hyped based on neat technology and little else...

Right now I feel as though the work is going to those who can actually
articulate the "why" in specific terms, not to those whose knowledge
is primarily of the "how".