Add draft post on RetinaNet, and stub for dataflow stuff

This commit is contained in:
Chris Hodapp 2017-12-13 21:17:56 -05:00
parent 92c4efac7d
commit e588dce485
3 changed files with 103 additions and 0 deletions


@@ -0,0 +1,31 @@
#+TITLE: Dataflow paradigm (working title)
#+AUTHOR: Chris Hodapp
#+DATE: December 12, 2017
#+TAGS: technobabble
There is a sort of parallel between the declarative nature of
computational graphs in TensorFlow and functional programming
(possibly function-level programming - think of the J language and
how important rank is to its computations).
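To make the parallel concrete, here is a minimal sketch in the
TensorFlow 1.x API (current as of this writing): the Python code below
only *declares* a graph, and nothing is computed until a session runs
it - much like composing functions without yet applying them.
#+BEGIN_SRC python
import tensorflow as tf

# Declaring nodes builds a graph; no computation happens here.
x = tf.placeholder(tf.float32, shape=[None, 3], name="x")
w = tf.Variable(tf.ones([3, 1]), name="w")
y = tf.matmul(x, w, name="y")  # still just a symbolic node

# Only the session actually evaluates the graph.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))  # [[6.]]
#+END_SRC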
Apache Spark and TensorFlow are very similar in a lot of ways. The
key difference I see is that Spark's internal data types are geared
toward databases, records, tables, and relational data generally,
while TensorFlow's are, well, tensors (arbitrary-dimensional arrays).
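For comparison, here is the same declare-then-run pattern sketched in
PySpark, assuming a local =SparkSession=: transformations build up a
plan over relational rows rather than tensors, and nothing executes
until an action forces it.
#+BEGIN_SRC python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
doubled = df.withColumn("id2", df["id"] * 2)  # declarative; not yet run
doubled.show()  # an action: this triggers actual execution
#+END_SRC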
The interesting part to me with both of these is how they've moved
"bulk" computations into first-class objects (ish) and permitted some
level of introspection into them before they run, as they run, and
after they run. Like I noted in Notes - Paper, 2016-11-13, "One
interesting (to me) facet is how the computation process has been
split out and instrumented enough to allow some meaningful
introspection with it. It hasn't precisely made it a first-class
construct, but still, this feature pervades all of Spark's major
abstractions (RDD, DataFrame, Dataset)."
# Show Tensorboard example here
# Screenshots may be a good idea too
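In the meantime, here is a minimal sketch of that kind of
introspection in TensorFlow 1.x - exporting a graph so that
TensorBoard can display it before it has ever run (the log directory
name is arbitrary):
#+BEGIN_SRC python
import tensorflow as tf

a = tf.constant(3.0, name="a")
b = tf.constant(4.0, name="b")
c = tf.add(a, b, name="c")

# Writing the graph out lets TensorBoard visualize it without running it.
writer = tf.summary.FileWriter("/tmp/tf-graph-demo", tf.get_default_graph())
writer.close()
# Then: tensorboard --logdir /tmp/tf-graph-demo
#+END_SRC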
Spark does this with a database. TensorFlow does it with numerical
calculations. Node-RED does it with irregular, asynchronous data.


@@ -0,0 +1,72 @@
#+TITLE: Explaining RetinaNet
#+AUTHOR: Chris Hodapp
#+DATE: December 13, 2017
#+TAGS: technobabble
A paper came out in the past few months, [[https://arxiv.org/abs/1708.02002][Focal Loss for Dense Object
Detection]], from Facebook AI Research. The goal of this post is to
explain the work a bit as I go through the paper, and to look at one
particular [[https://github.com/fizyr/keras-retinanet][implementation in Keras]].
"Object detection" as it is used here refers to machine learning
models that can not just identify a single object in an image, but can
identify and *localize* multiple objects, like in the below photo
taken from [[https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html][Supercharge your Computer Vision models with the TensorFlow
Object Detection API]]:
# TODO:
# Define mAP
#+CAPTION: TensorFlow object detection example 2.
#+ATTR_HTML: :width 100% :height 100%
[[../images/2017-12-13-objdet.jpg]]
The paper discusses many of the two-stage approaches, like R-CNN and
its variants, which work in two steps (sketched in code just after
this list):
1. One model proposes a sparse set of locations in the image that
probably contain something. Ideally, this set contains every object
in the image, but filters out the majority of negative locations
(i.e. only background, not foreground).
2. Another model, typically a convolutional neural network, classifies
each location in that sparse set as either being foreground and
some specific object class, or as being background.
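Purely as a schematic of that flow - =propose_regions= and
=classify_region= below are hypothetical stand-ins, not any real
library's API:
#+BEGIN_SRC python
# Schematic only: both function arguments are hypothetical stand-ins.
def detect_two_stage(image, propose_regions, classify_region):
    detections = []
    for box in propose_regions(image):              # stage 1: sparse proposals
        label, score = classify_region(image, box)  # stage 2: classify each one
        if label != "background":
            detections.append((box, label, score))
    return detections
#+END_SRC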
Additionally, it discusses some existing one-stage approaches like
[[https://pjreddie.com/darknet/yolo/][YOLO]] and [[https://arxiv.org/abs/1512.02325][SSD]]. In essence, these run only the second step - but
instead of starting from a sparse set of locations that are probably
of interest, they start from a dense set that blankets the entire
image: a grid of many locations, at many sizes and many aspect
ratios, regardless of whether any of them contain an object.
This is simpler and faster - but not nearly as accurate.
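To give a feel for how dense that set is, here is a sketch that
generates boxes on a stride-16 grid at a few scales and aspect
ratios; the specific numbers are illustrative, not the paper's exact
settings.
#+BEGIN_SRC python
import itertools
import numpy as np

def dense_anchors(height, width, stride=16,
                  scales=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    boxes = []
    for cy, cx in itertools.product(range(stride // 2, height, stride),
                                    range(stride // 2, width, stride)):
        for s, r in itertools.product(scales, ratios):
            # r is the height/width ratio; s**2 is the box area.
            h, w = s * np.sqrt(r), s / np.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)

# Even a small 256x256 image yields thousands of candidate locations:
print(dense_anchors(256, 256).shape)  # (2304, 4)
#+END_SRC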
Broadly, the process of training these models requires minimizing
some kind of loss function based on what the model misclassifies when
it is run on training data. It's preferable to be able to compute a
loss for each individual instance and add all of these up to produce
an overall loss - among other things, that decomposition makes
gradients easy to compute and to average over mini-batches.
This leads to a problem in one-stage detectors: the dense set of
locations being classified usually contains a small number of
locations that actually hold objects (positives), and a much larger
number that are just background and can be very easily classified as
such (easy negatives). However, the loss function still adds all of
them up - and even if the loss is relatively low for each easy
negative, their cumulative loss can drown out the loss from the
objects that are being misclassified.
The training process is trying to minimize this loss, and so it is
mostly nudging the model to improve in the area least in need of it
(its ability to classify background areas that it already classifies
well) and neglecting the area most in need of it (its ability to
classify the "difficult" objects that it is misclassifying).
# TODO: What else can I say about why loss should be additive?
# Quote DL text? ML text?
This, in a nutshell, is the *class imbalance* issue that the paper
identifies as the limiting factor for the accuracy of one-stage
detectors.
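For context on where the paper goes from here: its proposed fix is
the *focal loss*, which scales down the loss of well-classified
examples so that easy negatives stop dominating. A minimal NumPy
version of the paper's FL(p_t) = -(1 - p_t)^gamma * log(p_t),
omitting the alpha balancing factor the paper also uses:
#+BEGIN_SRC python
import numpy as np

def focal_loss(p_t, gamma=2.0):
    """p_t: the model's probability for the true class (alpha omitted)."""
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

print(focal_loss(0.99))  # easy example: ~1e-6, effectively ignored
print(focal_loss(0.10))  # hard example: ~1.9, still counts heavily
#+END_SRC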
# TODO: Visualize this. Can I?

Binary file not shown (new image, 256 KiB).