Add draft post on RetinaNet, and stub for dataflow stuff

This commit is contained in:
Chris Hodapp 2017-12-13 21:17:56 -05:00
parent 92c4efac7d
commit e588dce485
3 changed files with 103 additions and 0 deletions


@@ -0,0 +1,31 @@
#+TITLE: Dataflow paradigm (working title)
#+AUTHOR: Chris Hodapp
#+DATE: December 12, 2017
#+TAGS: technobabble
There is a sort of parallel between the declarative nature of
computational graphs in TensorFlow and functional programming
(possibly function-level programming - think of the J language and
how important rank is to its computations).
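To make the parallel concrete, here is a minimal sketch in the
TensorFlow 1.x API (current as of this writing): the Python code below
only *declares* a graph, and nothing is computed until a session runs
it - much like composing functions without yet applying them.
#+BEGIN_SRC python
import tensorflow as tf

# Declaring nodes builds a graph; no computation happens here.
x = tf.placeholder(tf.float32, shape=[None, 3], name="x")
w = tf.Variable(tf.ones([3, 1]), name="w")
y = tf.matmul(x, w, name="y")  # still just a symbolic node

# Only the session actually evaluates the graph.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))  # [[6.]]
#+END_SRC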
Apache Spark and TensorFlow are very similar in a lot of ways. The
key difference I see is that Spark's internal data types are geared
toward databases, records, tables, and relational data generally,
while TensorFlow's are, well, tensors (arbitrary-dimensional arrays).
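For comparison, here is the same declare-then-run pattern sketched in
PySpark, assuming a local =SparkSession=: transformations build up a
plan over relational rows rather than tensors, and nothing executes
until an action forces it.
#+BEGIN_SRC python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
doubled = df.withColumn("id2", df["id"] * 2)  # declarative; not yet run
doubled.show()  # an action: this triggers actual execution
#+END_SRC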
The interesting part to me with both of these is how they've moved
"bulk" computations into first-class objects (ish) and permitted some
level of introspection into them before they run, as they run, and
after they run. Like I noted in Notes - Paper, 2016-11-13, "One
interesting (to me) facet is how the computation process has been
split out and instrumented enough to allow some meaningful
introspection with it. It hasn't precisely made it a first-class
construct, but still, this feature pervades all of Spark's major
abstractions (RDD, DataFrame, Dataset)."
# Show Tensorboard example here
# Screenshots may be a good idea too
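In the meantime, here is a minimal sketch of that kind of
introspection in TensorFlow 1.x - exporting a graph so that
TensorBoard can display it before it has ever run (the log directory
name is arbitrary):
#+BEGIN_SRC python
import tensorflow as tf

a = tf.constant(3.0, name="a")
b = tf.constant(4.0, name="b")
c = tf.add(a, b, name="c")

# Writing the graph out lets TensorBoard visualize it without running it.
writer = tf.summary.FileWriter("/tmp/tf-graph-demo", tf.get_default_graph())
writer.close()
# Then: tensorboard --logdir /tmp/tf-graph-demo
#+END_SRC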
Spark does this with a database. TensorFlow does it with numerical
calculations. Node-RED does it with irregular, asynchronous data.


@@ -0,0 +1,72 @@
#+TITLE: Explaining RetinaNet
#+AUTHOR: Chris Hodapp
#+DATE: December 13, 2017
#+TAGS: technobabble
A paper came out in the past few months, [[https://arxiv.org/abs/1708.02002][Focal Loss for Dense Object
Detection]], from Facebook AI Research. The goal of this post is to
explain the work a bit as I go through the paper, and to look at one
particular [[https://github.com/fizyr/keras-retinanet][implementation in Keras]].
"Object detection" as it is used here refers to machine learning
models that can not just identify a single object in an image, but can
identify and *localize* multiple objects, like in the below photo
taken from [[https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html][Supercharge your Computer Vision models with the TensorFlow
Object Detection API]]:
# TODO:
# Define mAP
#+CAPTION: TensorFlow object detection example 2.
#+ATTR_HTML: :width 100% :height 100%
[[../images/2017-12-13-objdet.jpg]]
The paper discusses many of the two-stage approaches, like R-CNN and
its variants, which work in two steps (sketched in code just after
this list):
1. One model proposes a sparse set of locations in the image that
probably contain something. Ideally, this set contains every object
in the image, but filters out the majority of negative locations
(i.e. only background, not foreground).
2. Another model, typically a convolutional neural network, classifies
each location in that sparse set as either being foreground and
some specific object class, or as being background.
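Purely as a schematic of that flow - =propose_regions= and
=classify_region= below are hypothetical stand-ins, not any real
library's API:
#+BEGIN_SRC python
# Schematic only: both function arguments are hypothetical stand-ins.
def detect_two_stage(image, propose_regions, classify_region):
    detections = []
    for box in propose_regions(image):              # stage 1: sparse proposals
        label, score = classify_region(image, box)  # stage 2: classify each one
        if label != "background":
            detections.append((box, label, score))
    return detections
#+END_SRC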
Additionally, it discusses some existing one-stage approaches like
[[https://pjreddie.com/darknet/yolo/][YOLO]] and [[https://arxiv.org/abs/1512.02325][SSD]]. In essence, these run only the second step - but
instead of starting from a sparse set of locations that are probably
of interest, they start from a dense set that blankets the entire
image: a grid of many locations, at many sizes and many aspect
ratios, regardless of whether any of them contain an object.
This is simpler and faster - but not nearly as accurate.
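To give a feel for how dense that set is, here is a sketch that
generates boxes on a stride-16 grid at a few scales and aspect
ratios; the specific numbers are illustrative, not the paper's exact
settings.
#+BEGIN_SRC python
import itertools
import numpy as np

def dense_anchors(height, width, stride=16,
                  scales=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    boxes = []
    for cy, cx in itertools.product(range(stride // 2, height, stride),
                                    range(stride // 2, width, stride)):
        for s, r in itertools.product(scales, ratios):
            # r is the height/width ratio; s**2 is the box area.
            h, w = s * np.sqrt(r), s / np.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)

# Even a small 256x256 image yields thousands of candidate locations:
print(dense_anchors(256, 256).shape)  # (2304, 4)
#+END_SRC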
Broadly, the process of training these models requires minimizing
some kind of loss function based on what the model misclassifies when
it is run on training data. It's preferable to be able to compute a
loss for each individual instance and add all of these up to produce
an overall loss - among other things, that decomposition makes
gradients easy to compute and to average over mini-batches.
This leads to a problem in one-stage detectors: the dense set of
locations being classified usually contains a small number of
locations that actually hold objects (positives), and a much larger
number that are just background and can be very easily classified as
such (easy negatives). However, the loss function still adds all of
them up - and even if the loss is relatively low for each easy
negative, their cumulative loss can drown out the loss from the
objects that are being misclassified.
The training process is trying to minimize this loss, and so it is
mostly nudging the model to improve in the area least in need of it
(its ability to classify background areas that it already classifies
well) and neglecting the area most in need of it (its ability to
classify the "difficult" objects that it is misclassifying).
# TODO: What else can I say about why loss should be additive?
# Quote DL text? ML text?
This, in a nutshell, is the *class imbalance* issue that the paper
identifies as the limiting factor for the accuracy of one-stage
detectors.
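For context on where the paper goes from here: its proposed fix is
the *focal loss*, which scales down the loss of well-classified
examples so that easy negatives stop dominating. A minimal NumPy
version of the paper's FL(p_t) = -(1 - p_t)^gamma * log(p_t),
omitting the alpha balancing factor the paper also uses:
#+BEGIN_SRC python
import numpy as np

def focal_loss(p_t, gamma=2.0):
    """p_t: the model's probability for the true class (alpha omitted)."""
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

print(focal_loss(0.99))  # easy example: ~1e-6, effectively ignored
print(focal_loss(0.10))  # hard example: ~1.9, still counts heavily
#+END_SRC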
# TODO: Visualize this. Can I?

Binary file not shown (new image, 256 KiB).