Add draft post on RetinaNet, and stub for dataflow stuff
parent 92c4efac7d
commit e588dce485
31  drafts/2017-12-12-dataflow.org  Normal file
@@ -0,0 +1,31 @@
#+TITLE: Dataflow paradigm (working title)
#+AUTHOR: Chris Hodapp
#+DATE: December 12, 2017
#+TAGS: technobabble

There is a sort of parallel between the declarative nature of
computational graphs in TensorFlow and functional programming
(possibly function-level programming - think of the J language and
how central rank is to its computations).

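To make this concrete, here is a minimal sketch of that declarative,
build-then-run style, written against the TensorFlow 1.x API that was
current at the time (the names and shapes are arbitrary):

#+BEGIN_SRC python
import tensorflow as tf

# These lines only *describe* a computation; nothing runs yet.
a = tf.placeholder(tf.float32, shape=[None, 3], name="a")
b = tf.reduce_sum(a * 2.0, axis=1, name="b")

# Evaluation is a separate step, done against concrete data:
with tf.Session() as sess:
    print(sess.run(b, feed_dict={a: [[1.0, 2.0, 3.0]]}))  # -> [12.]
#+END_SRC
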
Apache Spark and TensorFlow are very similar in a lot of ways. The
key difference I see is that Spark internally handles types of data
that are more suited to databases, records, tables, and generally
relational data, while TensorFlow is, well, tensors
(arbitrary-dimensional arrays).

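As a rough illustration of that difference in data model (a sketch
only - it assumes local PySpark and TensorFlow 1.x installations, and
the data is made up):

#+BEGIN_SRC python
from pyspark.sql import SparkSession
import tensorflow as tf

spark = SparkSession.builder.master("local[*]").getOrCreate()

# Spark's native vocabulary is records and columns:
df = spark.createDataFrame(
    [("cat", 3), ("dog", 5), ("cat", 1)], ["label", "count"])
df.groupBy("label").sum("count").show()

# TensorFlow's native vocabulary is tensors:
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.matmul(x, x)
with tf.Session() as sess:
    print(sess.run(y))
#+END_SRC
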
The interesting part to me with both of these is how they've moved
"bulk" computations into first-class objects (ish) and permitted some
level of introspection into them before they run, as they run, and
after they run. As I noted in Notes - Paper, 2016-11-13, "One
interesting (to me) facet is how the computation process has been
split out and instrumented enough to allow some meaningful
introspection with it. It hasn't precisely made it a first-class
construct, but still, this feature pervades all of Spark's major
abstractions (RDD, DataFrame, Dataset)."

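On the Spark side, the kind of introspection meant here looks roughly
like the following sketch (assuming a local PySpark installation; the
computation itself is trivial and only for illustration):

#+BEGIN_SRC python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.master("local[*]").getOrCreate()

# Nothing is computed yet; this only describes a computation:
df = spark.range(0, 1000).withColumn("squared", col("id") * col("id"))

# Inspect the query plan before anything runs:
df.explain()

# The underlying RDD lineage can be inspected as well (this may come
# back as bytes, depending on the Spark/Python versions):
print(df.rdd.toDebugString())

# Only now does the computation actually execute:
print(df.selectExpr("sum(squared)").collect())
#+END_SRC
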
# Show Tensorboard example here
# Screenshots may be a good idea too

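A starting point for that TensorBoard example might be something like
the sketch below (TensorFlow 1.x API; the graph and the "logs"
directory name are placeholders):

#+BEGIN_SRC python
import tensorflow as tf

# A tiny throwaway graph, just so there is something to visualize:
x = tf.placeholder(tf.float32, shape=[None, 4], name="x")
w = tf.Variable(tf.random_normal([4, 1]), name="w")
loss = tf.reduce_mean(tf.square(tf.matmul(x, w)), name="loss")

# Dump the graph definition where TensorBoard can find it:
writer = tf.summary.FileWriter("logs", tf.get_default_graph())
writer.close()

# Then, from a shell: tensorboard --logdir logs
#+END_SRC
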
Spark does this with a database. TensorFlow does it with numerical
calculations. Node-RED does it with irregular, asynchronous data.

72  drafts/2017-12-13-retinanet.org  Normal file
@@ -0,0 +1,72 @@
#+TITLE: Explaining RetinaNet
#+AUTHOR: Chris Hodapp
#+DATE: December 13, 2017
#+TAGS: technobabble

A paper came out in the past few months, [[https://arxiv.org/abs/1708.02002][Focal Loss for Dense Object
Detection]], from one of Facebook's teams. The goal of this post is to
explain this work a bit as I work through the paper, and to look at
one particular [[https://github.com/fizyr/keras-retinanet][implementation in Keras]].

"Object detection" as it is used here refers to machine learning
|
||||
models that can not just identify a single object in an image, but can
|
||||
identify and *localize* multiple objects, like in the below photo
|
||||
taken from [[https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html][Supercharge your Computer Vision models with the TensorFlow
|
||||
Object Detection API]]:
|
||||
|
||||
# TODO:
# Define mAP

#+CAPTION: TensorFlow object detection example 2.
#+ATTR_HTML: :width 100% :height 100%
[[../images/2017-12-13-objdet.jpg]]

The paper discusses many of the two-stage approaches, like R-CNN and
its variants, which work in two steps (a schematic sketch follows the
list):

1. One model proposes a sparse set of locations in the image that
   probably contain something. Ideally, this set contains all objects
   in the image, but filters out the majority of negative locations
   (i.e. only background, not foreground).
2. Another model, typically a convolutional neural network,
   classifies each location in that sparse set as either being
   foreground (with some specific object class) or as being
   background.

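The sketch below is purely schematic - the proposal and
classification steps are random stand-ins, not a real R-CNN - but it
shows the shape of the two-stage pipeline:

#+BEGIN_SRC python
import numpy as np

def propose_regions(image, num_proposals=100):
    """Stage 1: a sparse set of candidate boxes (x1, y1, x2, y2)."""
    h, w = image.shape[:2]
    xs = np.sort(np.random.randint(0, w, size=(num_proposals, 2)), axis=1)
    ys = np.sort(np.random.randint(0, h, size=(num_proposals, 2)), axis=1)
    return np.stack([xs[:, 0], ys[:, 0], xs[:, 1], ys[:, 1]], axis=1)

def classify_region(image, box, num_classes=20):
    """Stage 2: score one candidate as background or an object class."""
    scores = np.random.rand(num_classes + 1)  # index 0 = background
    return scores / scores.sum()

image = np.zeros((480, 640, 3))
detections = []
for box in propose_regions(image):
    scores = classify_region(image, box)
    label = scores.argmax()
    if label != 0:                            # keep only non-background
        detections.append((box, label, scores[label]))
#+END_SRC
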
Additionally, it discusses some existing one-stage approaches like
[[https://pjreddie.com/darknet/yolo/][YOLO]] and [[https://arxiv.org/abs/1512.02325][SSD]]. In essence, these run only the second step - but
instead of starting from a sparse set of locations that are probably
something of interest, they start from a dense set of locations that
blankets the entire image with a grid of many locations, over many
sizes, and over many aspect ratios, regardless of whether they
contain an object.

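That dense set is easy to picture in code. The sketch below generates
one such grid of candidate boxes; the stride, sizes, and aspect
ratios here are made up for illustration:

#+BEGIN_SRC python
import numpy as np

def dense_anchors(img_h, img_w, stride=32,
                  sizes=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    boxes = []
    for cy in range(stride // 2, img_h, stride):   # grid of centers
        for cx in range(stride // 2, img_w, stride):
            for size in sizes:                     # over many sizes...
                for ratio in ratios:               # ...and aspect ratios
                    h = size * np.sqrt(ratio)
                    w = size / np.sqrt(ratio)
                    boxes.append((cx - w / 2, cy - h / 2,
                                  cx + w / 2, cy + h / 2))
    return np.array(boxes)

anchors = dense_anchors(480, 640)
print(anchors.shape)   # (2700, 4): a 15 x 20 grid, 9 boxes per location
#+END_SRC
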
This is simpler and faster - but not nearly as accurate.

Broadly, the process of training these models requires minimizing
some kind of loss function based on what the model misclassifies when
it is run on some training data. It's preferable to be able to
compute a loss for each individual instance, and then sum all of
these losses to produce an overall loss.

This leads to a problem in one-stage detectors: the dense set of
locations being classified usually contains a small number of
locations that actually have objects (positives), and a much larger
number of locations that are just background and can be very easily
classified as such (easy negatives). However, the loss function still
sums over all of them - and even if the loss is relatively low for
each of the easy negatives, their cumulative loss can drown out the
loss from objects that are being misclassified.

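Some made-up (but plausibly-scaled) numbers show how lopsided this
sum can get:

#+BEGIN_SRC python
num_easy_negatives = 100000  # background boxes already classified well
num_hard_positives = 100     # actual objects still being misclassified

loss_easy = 0.01             # cross-entropy of a confident, correct guess
loss_hard = 2.3              # cross-entropy of a bad guess (~ -log(0.1))

print(num_easy_negatives * loss_easy)  # 1000.0 from easy negatives
print(num_hard_positives * loss_hard)  #  230.0 from hard positives
#+END_SRC

The summed loss is dominated by locations the model already handles
well, even though each one contributes almost nothing individually.
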
The training process is trying to minimize this loss, and so it is
mostly nudging the model to improve in the area least in need of it
(its ability to classify background areas that it already classifies
well) and neglecting the area most in need of it (its ability to
classify the "difficult" objects that it is misclassifying).

# TODO: What else can I say about why loss should be additive?
# Quote DL text? ML text?

This, in a nutshell, is the *class imbalance* issue that the paper
gives as the limiting factor for the accuracy of one-stage detectors.

# TODO: Visualize this. Can I?
BIN  images/2017-12-13-objdet.jpg  Normal file
Binary file not shown (new image, 256 KiB).