Add draft post on RetinaNet, and stub for dataflow stuff
parent 92c4efac7d
commit e588dce485
drafts/2017-12-12-dataflow.org (new file, 31 lines)
@@ -0,0 +1,31 @@
#+TITLE: Dataflow paradigm (working title)
#+AUTHOR: Chris Hodapp
#+DATE: December 12, 2017
#+TAGS: technobabble

There is a sort of parallel between the declarative nature of
computational graphs in TensorFlow and functional programming
(possibly function-level programming; think of the J language and how
important rank is to its computations).
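
To make that parallel concrete, here is a minimal sketch in
TensorFlow's (1.x-style) graph mode; the first few lines only
/declare/ a computation, much like composing functions, and nothing
is evaluated until the graph is run:

#+BEGIN_SRC python
import tensorflow as tf

# Purely declarative: these lines build a graph, computing nothing.
x = tf.placeholder(tf.float32, shape=[None, 3], name="x")
w = tf.constant([[1.0], [2.0], [3.0]], name="w")
y = tf.matmul(x, w, name="y")

# Evaluation is a separate, explicit step applied to the graph.
with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: [[1.0, 0.0, 2.0]]}))  # [[7.0]]
#+END_SRC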

Apache Spark and TensorFlow are very similar in a lot of ways. The
key difference I see is that Spark internally handles types of data
more suited to databases, records, tables, and generally relational
data, while TensorFlow handles, well, tensors (arbitrary-dimensional
arrays).

The interesting part to me with both of these is how they've moved
"bulk" computations into first-class objects (ish) and permitted some
level of introspection into them before they run, as they run, and
after they run. Like I noted in Notes - Paper, 2016-11-13, "One
interesting (to me) facet is how the computation process has been
split out and instrumented enough to allow some meaningful
introspection with it. It hasn't precisely made it a first-class
construct, but still, this feature pervades all of Spark's major
abstractions (RDD, DataFrame, Dataset)."
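
As a concrete example of that pre-run introspection, here is a small
PySpark sketch (the computation itself is arbitrary); =explain()=
prints the query plan before anything has actually executed:

#+BEGIN_SRC python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("introspection").getOrCreate()

# A DataFrame describes a computation; it is not yet a result.
df = (spark.range(1000)
      .withColumnRenamed("id", "n")
      .filter("n % 2 = 0")
      .selectExpr("sum(n) AS total"))

# Inspect the query plan before anything runs...
df.explain()

# ...and only an action like collect() actually executes it.
print(df.collect())
#+END_SRC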

# Show Tensorboard example here
# Screenshots may be a good idea too
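
As a first cut at that example, here is a minimal TF 1.x sketch that
does nothing but export a graph definition for TensorBoard to render
(the log directory is arbitrary):

#+BEGIN_SRC python
import tensorflow as tf

# Declare a tiny graph; nothing is computed here.
x = tf.placeholder(tf.float32, shape=[None, 3], name="x")
y = tf.reduce_sum(tf.square(x), name="y")

# Write the graph definition out for TensorBoard, then view it with:
#   tensorboard --logdir=/tmp/tf-logs
writer = tf.summary.FileWriter("/tmp/tf-logs", graph=tf.get_default_graph())
writer.close()
#+END_SRC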

Spark does this with a database. TensorFlow does it with numerical
calculations. Node-RED does it with irregular, asynchronous data.

drafts/2017-12-13-retinanet.org (new file, 72 lines)
@@ -0,0 +1,72 @@
#+TITLE: Explaining RetinaNet
#+AUTHOR: Chris Hodapp
#+DATE: December 13, 2017
#+TAGS: technobabble

A paper came out in the past few months, [[https://arxiv.org/abs/1708.02002][Focal Loss for Dense Object
Detection]], from one of Facebook's teams. The goal of this post is to
explain this work a bit as I work through the paper, and to look at
one particular [[https://github.com/fizyr/keras-retinanet][implementation in Keras]].

"Object detection" as it is used here refers to machine learning
models that do not just identify a single object in an image, but
identify and *localize* multiple objects, like in the photo below,
taken from [[https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html][Supercharge your Computer Vision models with the TensorFlow
Object Detection API]]:

# TODO:
# Define mAP

#+CAPTION: TensorFlow object detection example 2.
#+ATTR_HTML: :width 100% :height 100%
[[../images/2017-12-13-objdet.jpg]]

The paper discusses many of the two-stage approaches, like R-CNN and
its variants, which work in two steps:

1. One model proposes a sparse set of locations in the image that
   probably contain something. Ideally, this set contains all objects
   in the image, but filters out the majority of negative locations
   (i.e. locations that are only background, not foreground).
2. Another model, typically a convolutional neural network,
   classifies each location in that sparse set as either being
   foreground and some specific object class, or as being background.

Additionally, it discusses some existing one-stage approaches like
[[https://pjreddie.com/darknet/yolo/][YOLO]] and [[https://arxiv.org/abs/1512.02325][SSD]]. In essence, these run only the second step - but
instead of starting from a sparse set of locations that are probably
something of interest, they start from a dense set of locations that
blankets the entire image on a grid of many positions, over many
sizes, and over many aspect ratios, regardless of whether any of them
contains an object.
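
To give a sense of how dense that set is, here is a toy numpy sketch
of one way such a grid might be generated (the image size, stride,
sizes, and ratios here are invented for illustration; real detectors
produce their boxes from several feature-map levels):

#+BEGIN_SRC python
import numpy as np

image_w, image_h, stride = 512, 512, 32  # invented values
sizes = [64, 128, 256]                   # box areas ~ size^2, in pixels
ratios = [0.5, 1.0, 2.0]                 # width / height

boxes = []
for cy in range(stride // 2, image_h, stride):
    for cx in range(stride // 2, image_w, stride):
        for s in sizes:
            for r in ratios:
                w, h = s * np.sqrt(r), s / np.sqrt(r)
                boxes.append([cx - w / 2, cy - h / 2,
                              cx + w / 2, cy + h / 2])

# 16x16 grid positions, 3 sizes, 3 ratios: already 2304 locations.
print(np.array(boxes).shape)  # (2304, 4)
#+END_SRC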

This is simpler and faster - but not nearly as accurate.

Broadly, the process of training these models means minimizing some
kind of loss function based on what the model misclassifies when it
is run on some training data. It's preferable to be able to compute
some loss over each individual instance, and to add all of these
losses up to produce an overall loss.

This leads to a problem in one-stage detectors: the dense set of
locations being classified usually contains a small number of
locations that actually hold objects (positives), and a much larger
number of locations that are just background and can be very easily
classified as such (easy negatives). However, the loss function
still adds all of them up - and even if the loss is relatively low
for each easy negative, their cumulative loss can drown out the loss
from objects that are being misclassified.
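
A toy calculation (all numbers invented) makes that drowning-out
effect concrete:

#+BEGIN_SRC python
easy_negatives = 100000  # background locations already classified well
hard_positives = 50      # objects currently being misclassified

loss_per_easy = 0.01     # small loss on each easy negative
loss_per_hard = 2.3      # large loss on each misclassified object

total_easy = easy_negatives * loss_per_easy  # 1000.0
total_hard = hard_positives * loss_per_hard  # 115.0

# Easy negatives contribute ~90% of the total loss, so minimizing it
# mostly improves the part of the model that was never the problem.
print(total_easy / (total_easy + total_hard))  # ~0.897
#+END_SRC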

The training process is trying to minimize this loss, and so it is
mostly nudging the model to improve in the area least in need of it
(its ability to classify background areas that it already classifies
well) while neglecting the area most in need of it (its ability to
classify the "difficult" objects that it is misclassifying).

# TODO: What else can I say about why loss should be additive?
# Quote DL text? ML text?

This, in a nutshell, is the *class imbalance* issue that the paper
identifies as the limiting factor for the accuracy of one-stage
detectors.

# TODO: Visualize this. Can I?
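
The fix the paper proposes is the *focal loss*, which scales each
instance's cross-entropy by (1 - p_t)^gamma so that easy,
well-classified examples contribute almost nothing. Here is a minimal
sketch of its alpha-balanced binary form using the Keras backend (the
fizyr implementation linked above differs in details such as anchor
handling and normalization):

#+BEGIN_SRC python
import keras.backend as K

def focal_loss(gamma=2.0, alpha=0.25):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    def loss(y_true, y_pred):
        eps = K.epsilon()
        y_pred = K.clip(y_pred, eps, 1.0 - eps)
        # p_t: the probability the model assigned to the true class.
        p_t = y_true * y_pred + (1.0 - y_true) * (1.0 - y_pred)
        alpha_t = y_true * alpha + (1.0 - y_true) * (1.0 - alpha)
        # (1 - p_t)^gamma down-weights the easy examples.
        return -K.sum(alpha_t * K.pow(1.0 - p_t, gamma) * K.log(p_t))
    return loss
#+END_SRC

This drops in as an ordinary Keras loss, e.g.
=model.compile(optimizer="adam", loss=focal_loss())=; the paper sums
it over all anchors and normalizes by the number of positive anchors.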

images/2017-12-13-objdet.jpg (new binary file, 256 KiB)