Begin slow migration to Hugo...
@@ -1,42 +0,0 @@
#+TITLE: Dataflow paradigm (working title)
#+AUTHOR: Chris Hodapp
#+DATE: December 12, 2017
#+TAGS: technobabble

I don't know if there's actually anything to write here.

There is a sort of parallel between the declarative nature of
computational graphs in TensorFlow, and functional programming
(possibly function-level - think of the J language and how important
rank is to its computations).

Apache Spark and TensorFlow are very similar in a lot of ways. The
key difference I see is that Spark handles different types of data
internally that are more suited to databases, records, tables, and
generally relational data, while TensorFlow is, well, tensors
(arbitrary-dimensional arrays).

The interesting part to me with both of these is how they've moved
"bulk" computations into first-class objects (ish) and permitted some
level of introspection into them before they run, as they run, and
after they run. Like I noted in Notes - Paper, 2016-11-13, "One
interesting (to me) facet is how the computation process has been
split out and instrumented enough to allow some meaningful
introspection with it. It hasn't precisely made it a first-class
construct, but still, this feature pervades all of Spark's major
abstractions (RDD, DataFrame, Dataset)."

# Show Tensorboard example here
# Screenshots may be a good idea too
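
A minimal sketch of the kind of thing that note is after (assuming the
TensorFlow 1.x API that was current when this was drafted; the shapes,
names, and log directory are arbitrary): build a graph, poke at its
operations before running anything, and dump it for TensorBoard.

#+BEGIN_SRC python
import tensorflow as tf

# Build a small computational graph; nothing is computed yet.
x = tf.placeholder(tf.float32, shape=(None, 3), name="x")
w = tf.Variable(tf.ones((3, 2)), name="w")
y = tf.matmul(x, w, name="y")

# Introspect the graph before running it: every op is an object we can inspect.
for op in tf.get_default_graph().get_operations():
    print(op.name, op.type)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Dump the graph for TensorBoard (`tensorboard --logdir /tmp/tf-demo`).
    tf.summary.FileWriter("/tmp/tf-demo", sess.graph)
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))
#+END_SRC
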
Spark does this with a database. TensorFlow does it with numerical
calculations. Node-RED does it with irregular, asynchronous data.
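
On the Spark side, a similarly minimal sketch (assuming PySpark; the
column names and numbers are arbitrary) of inspecting the plan for a
"bulk" computation before anything actually executes:

#+BEGIN_SRC python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# A declarative description of a computation; nothing has executed yet.
df = (spark.range(1000)
      .withColumn("squared", col("id") * col("id"))
      .filter(col("id") % 2 == 0))

# Introspect the logical and physical plans before triggering execution.
df.explain(True)

# Only now does Spark actually run the computation.
print(df.count())
#+END_SRC
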
- [[https://mxnet.incubator.apache.org/how_to/visualize_graph.html][mxnet: How to visualize Neural Networks as computation graph]]
- [[https://medium.com/intuitionmachine/pytorch-dynamic-computational-graphs-and-modular-deep-learning-7e7f89f18d1][PyTorch, Dynamic Computational Graphs and Modular Deep Learning]]
- [[https://github.com/WarBean/hyperboard][HyperBoard: A web-based dashboard for Deep Learning]]
- [[https://www.postgresql.org/docs/current/static/sql-explain.html][EXPLAIN in PostgreSQL]]
- http://tatiyants.com/postgres-query-plan-visualization/
- https://en.wikipedia.org/wiki/Dataflow_programming
- Pure Data!
- [[https://en.wikipedia.org/wiki/Orange_(software)][Orange]]?

@@ -1,193 +0,0 @@
---
title: Collaborative Filtering with Slope One Predictors
author: Chris Hodapp
date: January 30, 2018
tags: technobabble, machine learning
---

# Needs a brief intro

# Needs a summary at the end

Suppose you have a large number of users, and a large number of
movies. Users have watched movies, and they've provided ratings for
some of them (perhaps just simple numerical ratings, 1 to 10 stars).
However, they've all watched different movies, and for any given user,
it's only a tiny fraction of the total movies.

Now, you want to predict how some user will rate some movie they
haven't rated, based on what they (and other users) have rated.

That's a common problem, especially when generalized from 'movies' to
anything else, and one with many approaches. (To put some technical
terms to it, this is the [[https://en.wikipedia.org/wiki/Collaborative_filtering][collaborative filtering]] approach to
[[https://en.wikipedia.org/wiki/Recommender_system][recommender systems]]. [[http://www.mmds.org/][Mining of Massive Datasets]] is an excellent free
text in which to read about this in more depth, particularly chapter 9.)

Slope One Predictors are one such approach to collaborative filtering,
described in the paper [[https://arxiv.org/pdf/cs/0702144v1.pdf][Slope One Predictors for Online Rating-Based
Collaborative Filtering]]. Despite the complex-sounding name, they are
wonderfully simple to understand and implement, and very fast.

I'll give a contrived example below to explain them.

Consider a user Bob. Bob is enthusiastic, but has rather simple
tastes: he mostly just watches Clint Eastwood movies. In fact, he's
watched and rated nearly all of them, and basically nothing else.

Now, suppose we want to predict how much Bob will like something
completely different and unheard of (to him at least), like... I don't
know... /Citizen Kane/.

Here's Slope One in a nutshell:

1. First, find the users who rated both /Citizen Kane/ *and* any of
   the Clint Eastwood movies that Bob rated.
2. Now, for each movie that comes up above, compute a *deviation*
   which tells us: On average, how differently (i.e. how much higher
   or lower) did users rate Citizen Kane compared to this movie? (For
   instance, we'll have a number for how /Citizen Kane/ was rated
   compared to /Dirty Harry/, and perhaps it's +0.6 - meaning that on
   average, users who rated both movies rated /Citizen Kane/ about 0.6
   stars above /Dirty Harry/. We'd have another deviation for
   /Citizen Kane/ compared to /Gran Torino/, another for /Citizen
   Kane/ compared to /The Good, the Bad and the Ugly/, and so on - for
   every movie that Bob rated, provided that other users who rated
   /Citizen Kane/ also rated the movie.)
3. If that deviation between /Citizen Kane/ and /Dirty Harry/ was
   +0.6, it's reasonable that adding 0.6 to Bob's rating on /Dirty
   Harry/ would give one prediction of how Bob might rate /Citizen
   Kane/. We can then generate more predictions based on the ratings
   he gave the other movies - anything for which we could compute a
   deviation.
4. To turn this into a single prediction, we could just average all
   those predictions together. (A short code sketch of these steps
   follows this list.)
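
In code, those four steps might look something like the following - a
minimal sketch in plain Python, with entirely made-up users and
ratings ("kane" standing in for the movie we want to predict):

#+BEGIN_SRC python
# Ratings per user: {movie: stars}. All numbers are made up.
ratings = {
    "alice": {"kane": 9, "dirty_harry": 8, "gran_torino": 7},
    "carol": {"kane": 8, "gran_torino": 9},
    "bob":   {"dirty_harry": 9, "gran_torino": 8},   # no rating for "kane"
}

def predict(user, target):
    predictions = []
    for movie, user_rating in ratings[user].items():
        # Steps 1-2: deviation of `target` vs. `movie`, averaged over
        # the other users who rated both.
        diffs = [r[target] - r[movie]
                 for u, r in ratings.items()
                 if u != user and target in r and movie in r]
        if not diffs:
            continue
        deviation = sum(diffs) / len(diffs)
        # Step 3: one prediction per movie the user has rated.
        predictions.append(user_rating + deviation)
    # Step 4: average the per-movie predictions.
    return sum(predictions) / len(predictions) if predictions else None

print(predict("bob", "kane"))
#+END_SRC
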
One variant, Weighted Slope One, is nearly identical. The only
difference is in how we average those predictions in step #4. In
Slope One, every deviation counts equally, no matter how many users
had differences in ratings averaged together to produce it. In
Weighted Slope One, deviations that came from larger numbers of users
count for more (because, presumably, they are better estimates).

Or, in other words: If only one person rated both /Citizen Kane/ and
the lesser-known Eastwood classic /Revenge of the Creature/, and they
happened to think that /Revenge of the Creature/ deserved at least 3
more stars, then with Slope One, this deviation of -3 would carry
exactly as much weight as thousands of people rating /Citizen Kane/ as
about 0.5 stars below /The Good, the Bad and the Ugly/. In Weighted
Slope One, that latter deviation would count for thousands of times as
much. The example makes it sound a bit more drastic than it is.

The Python library [[http://surpriselib.com/][Surprise]] (a [[https://www.scipy.org/scikits.html][scikit]]) has an implementation of this
algorithm, and the Benchmarks section of that page shows its
performance compared to some other methods.
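
Something like the following (a sketch assuming Surprise's current API
and its built-in MovieLens 100k loader) exercises that implementation:

#+BEGIN_SRC python
from surprise import Dataset, SlopeOne
from surprise.model_selection import cross_validate

# Download (once) and load the MovieLens 100k ratings.
data = Dataset.load_builtin("ml-100k")

# 5-fold cross-validation of Surprise's Slope One implementation.
cross_validate(SlopeOne(), data, measures=["RMSE", "MAE"], cv=5, verbose=True)
#+END_SRC
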
/TODO/: Show a simple Python implementation of this (Jupyter
notebook?)

* Linear Algebra Tricks

Those who aren't familiar with matrix methods or algebra can probably
skip this section. Everything I've described above, you can compute
given just some data to work with ([[https://grouplens.org/datasets/movielens/100k/][movielens 100k]], perhaps?) and some
basic arithmetic. You don't need any complicated numerical methods.

However, the entire Slope One method can be implemented in a very fast
and simple way with a couple matrix operations.

First, we need to have our data encoded as a *utility matrix*. In a
utility matrix, each row represents one user, each column represents
one item (a movie, in our case), and each element represents a user's
rating of an item. If we have $n$ users and $m$ movies, then this is
an $n \times m$ matrix $U$ for which $U_{k,i}$ is user $k$'s rating
for movie $i$ - assuming we've numbered our users and our movies.

Users have typically rated only a fraction of movies, and so most of
the elements of this matrix are unknown. We can represent this with
another $n \times m$ matrix (specifically a binary matrix), a 'mask'
$M$ in which $M_{k,i}$ is 1 if user $k$ supplied a rating for movie
$i$, and otherwise 0.
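
As a concrete (made-up) example of those two matrices in NumPy, with 0
standing in for "no rating given":

#+BEGIN_SRC python
import numpy as np

# Utility matrix U: 4 users x 3 movies; 0 means "no rating given".
U = np.array([[5, 0, 3],
              [4, 2, 0],
              [0, 1, 2],
              [3, 4, 0]], dtype=float)

# Mask M: 1 where a rating exists, 0 where it doesn't.
M = (U > 0).astype(float)
#+END_SRC
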
I mentioned *deviation* above and gave an informal definition of it.
The paper gives a formal but rather terse definition below of the
average deviation of item $i$ with respect to item $j$:

$$\textrm{dev}_{j,i} = \sum_{u \in S_{j,i}(\chi)} \frac{u_j - u_i}{card(S_{j,i}(\chi))}$$

where:
- $u_j$ and $u_i$ mean: user $u$'s ratings for movies $j$ and $i$, respectively
- $u \in S_{j,i}(\chi)$ means: all users $u$ who, in the dataset we're
  training on, provided a rating for both movie $i$ and movie $j$
- $card$ is the cardinality of that set, i.e. for
  ${card(S_{j,i}(\chi))}$ it is just how many users rated both $i$ and
  $j$.

That denominator does depend on $i$ and $j$, but doesn't depend on the
summation variable $u$, so it can be pulled out, and also, we can
split up the summation as long as it is kept over the same terms:

$$\textrm{dev}_{j,i} = \frac{1}{card(S_{j,i}(\chi))} \sum_{u \in S_{j,i}(\chi)} (u_j - u_i) = \frac{1}{card(S_{j,i}(\chi))}\left(\sum_{u \in S_{j,i}(\chi)} u_j - \sum_{u \in S_{j,i}(\chi)} u_i\right)$$

# TODO: These need some actual matrices to illustrate

Let's start with computing ${card(S_{j,i}(\chi))}$, the number of
users who rated both movie $i$ and movie $j$. Consider column $i$ of
the mask $M$. For each value in this column, it equals 1 if the
respective user rated movie $i$, or 0 if they did not. Clearly,
simply summing up column $i$ would tell us how many users rated movie
$i$, and the same applies to column $j$ for movie $j$.

Now, suppose we take the element-wise logical AND of columns $i$ and
$j$. The resultant column has a 1 only where both corresponding
elements were 1 - where a user rated both $i$ and $j$. If we sum up
this column, we have exactly the number we need: the number of users
who rated both $i$ and $j$.

Some might notice that "element-wise logical AND" is just
"element-wise multiplication", thus "sum of element-wise logical AND"
is just "sum of element-wise multiplication", which is: dot product.
That is, ${card(S_{j,i}(\chi))}=M_j \bullet M_i$ if we use $M_i$ and
$M_j$ for columns $i$ and $j$ of $M$.

However, we'd like to compute deviation as a matrix for all $i$ and
$j$, so we'll likewise need ${card(S_{j,i}(\chi))}$ for every single
combination of $i$ and $j$ - that is, we need a dot product between
every single pair of columns from $M$. Incidentally, "dot product of
every pair of columns" happens to be almost exactly matrix
multiplication; note that for matrices $A$ and $B$, element $(x,y)$ of
the matrix product $AB$ is just the dot product of /row/ $x$ of $A$
and /column/ $y$ of $B$ - and that matrix product as a whole has this
dot product between every row of $A$ and every column of $B$.

We wanted the dot product of every column of $M$ with every column of
$M$, which is easy: just transpose $M$ for one operand. Then, we can
compute our count matrix like this:

$$C=M^\top M$$

Thus $C_{i,j}$ is the dot product of column $i$ of $M$ and column $j$
of $M$ - or, the number of users who rated both movies $i$ and $j$.
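
With the same made-up $U$ and $M$ as above (repeated so the snippet
stands alone), the entire count matrix is one line of NumPy:

#+BEGIN_SRC python
import numpy as np

U = np.array([[5, 0, 3],
              [4, 2, 0],
              [0, 1, 2],
              [3, 4, 0]], dtype=float)
M = (U > 0).astype(float)

# C[i, j] = number of users who rated both movie i and movie j.
C = M.T @ M
print(C)   # e.g. C[0, 1] == 2: two users rated both movie 0 and movie 1.
#+END_SRC
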
That was the first half of what we needed for $\textrm{dev}_{j,i}$.
We still need the other half:

$$\sum_{u \in S_{j,i}(\chi)} u_j - \sum_{u \in S_{j,i}(\chi)} u_i$$

We can apply a similar trick here. Consider first what
$\sum_{u \in S_{j,i}(\chi)} u_j$ means: It is the sum of only those
ratings of movie $j$ that were done by a user who also rated movie
$i$. Likewise, $\sum_{u \in S_{j,i}(\chi)} u_i$ is the sum of only
those ratings of movie $i$ that were done by a user who also rated
movie $j$. (Note the symmetry: it's over the same set of users,
because it's always the users who rated both $i$ and $j$.)
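
The TODO below presumably covers the rest, but here is a sketch of
where the trick seems to lead (my own reconstruction, not the missing
section): if unrated entries of $U$ are stored as 0, then
$(U^\top M)_{j,i}$ is exactly $\sum_{u \in S_{j,i}(\chi)} u_j$, since
users who didn't rate movie $j$ contribute 0 and users who didn't rate
movie $i$ are masked out. The deviation matrix then follows
element-wise:

#+BEGIN_SRC python
import numpy as np

# Made-up utility matrix; 0 means "not rated", as in the earlier sketches.
U = np.array([[5, 0, 3],
              [4, 2, 0],
              [0, 1, 2],
              [3, 4, 0]], dtype=float)
M = (U > 0).astype(float)

C = M.T @ M   # C[j, i] = number of users who rated both j and i
S = U.T @ M   # S[j, i] = sum of ratings of movie j by users who also rated i

# dev[j, i] = (sum of u_j - sum of u_i) / card, guarding against pairs
# of movies with no raters in common.
with np.errstate(divide="ignore", invalid="ignore"):
    dev = np.where(C > 0, (S - S.T) / C, 0.0)
print(dev)
#+END_SRC
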
# TODO: Finish that section (mostly translate from code notes)

* Implementation

#+BEGIN_SRC python
print("foo")
#+END_SRC
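
The block above is clearly a placeholder; as a stopgap, here is a
rough sketch (my own, with made-up data - not the implementation this
section was meant to hold) of a Weighted Slope One prediction built
from the matrices derived in the previous section:

#+BEGIN_SRC python
import numpy as np

# Made-up ratings: rows = users, columns = movies, 0 = not rated.
U = np.array([[5, 0, 3],
              [4, 2, 0],
              [0, 1, 2],
              [3, 4, 0]], dtype=float)
M = (U > 0).astype(float)

C = M.T @ M                                    # co-rating counts
S = U.T @ M                                    # masked rating sums
with np.errstate(divide="ignore", invalid="ignore"):
    dev = np.where(C > 0, (S - S.T) / C, 0.0)  # average deviations

def predict_weighted(user_ratings, user_mask, j):
    """Weighted Slope One prediction of movie j for a single user.

    Each movie i the user rated contributes the prediction
    (dev[j, i] + rating of i), weighted by how many users rated both i and j.
    """
    weights = C[j] * user_mask
    weights[j] = 0.0                           # don't predict j from itself
    if weights.sum() == 0:
        return None                            # nothing to base a prediction on
    preds = dev[j] + user_ratings
    return float((preds * weights).sum() / weights.sum())

# Predict user 1's rating of movie 2 (which they haven't rated).
print(predict_weighted(U[1], M[1], 2))
#+END_SRC
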
@@ -1,30 +0,0 @@
#+TITLE: Untitled rant on machine learning hype
#+AUTHOR: Chris Hodapp
#+DATE: February 24, 2018
#+TAGS: technobabble

The present state in machine learning feels like an arms race for
techniques that perform better, faster, more efficiently, or whatever
on a handful of problems, and not much in terms of killer applications
that actually need this.

We've all been hearing for a few years about the demand here, but
mostly there seems to be a dearth of companies that actually have any
sort of sustained vision for actual uses of machine learning. Plenty
exist that make grand promises, and plenty of large companies keep
trying to acquire all the talent to further that arms race, but that's
about it.

Certainly this will change as machine learning "gets better", but in
order for a lot of improvement to occur there must be, at the same
time, some actual compelling ideas and applications to drive it.

In that sense I don't believe that current advancements will be that
fruitful on their own. We don't need optimizations, we need
applications.

Of course this is not the first time an entire industry was imagined
and hyped based on neat technology and little else...

Right now I feel as though the work is going to those who can actually
articulate the "why" in specific terms, not to those whose knowledge
is primarily of the "how".