diff --git a/drafts/2017-12-12-dataflow.org b/drafts/2017-12-12-dataflow.org
new file mode 100644
index 0000000..0f6418f
--- /dev/null
+++ b/drafts/2017-12-12-dataflow.org
@@ -0,0 +1,31 @@
+#+TITLE: Dataflow paradigm (working title)
+#+AUTHOR: Chris Hodapp
+#+DATE: December 12, 2017
+#+TAGS: technobabble
+
+There is a sort of parallel between the declarative nature of
+computational graphs in TensorFlow and functional programming
+(possibly function-level - think of the J language and how important
+rank is to its computations).
+
+Apache Spark and TensorFlow are very similar in a lot of ways. The
+key difference I see is in the data each handles internally: Spark is
+oriented toward databases, records, tables, and relational data
+generally, while TensorFlow deals in, well, tensors
+(arbitrary-dimensional arrays).
+
+The interesting part to me with both of these is how they've moved
+"bulk" computations into first-class objects (ish) and permitted some
+level of introspection into them before they run, as they run, and
+after they run. As I noted in Notes - Paper, 2016-11-13, "One
+interesting (to me) facet is how the computation process has been
+split out and instrumented enough to allow some meaningful
+introspection with it. It hasn't precisely made it a first-class
+construct, but still, this feature pervades all of Spark's major
+abstractions (RDD, DataFrame, Dataset)."
+
+# Show Tensorboard example here
+# Screenshots may be a good idea too
+
+Spark does this with a database. TensorFlow does it with numerical
+calculations. Node-RED does it with irregular, asynchronous data.
diff --git a/drafts/2017-12-13-retinanet.org b/drafts/2017-12-13-retinanet.org
new file mode 100644
index 0000000..9a9c7ea
--- /dev/null
+++ b/drafts/2017-12-13-retinanet.org
@@ -0,0 +1,72 @@
+#+TITLE: Explaining RetinaNet
+#+AUTHOR: Chris Hodapp
+#+DATE: December 13, 2017
+#+TAGS: technobabble
+
+A paper came out in the past few months, [[https://arxiv.org/abs/1708.02002][Focal Loss for Dense Object
+Detection]], from one of Facebook's teams. The goal of this post is to
+explain this work a bit as I work through the paper, and to look at
+one particular [[https://github.com/fizyr/keras-retinanet][implementation in Keras]].
+
+"Object detection" as used here refers to machine learning models
+that don't just identify a single object in an image, but identify
+and *localize* multiple objects, as in the photo below, taken from
+[[https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html][Supercharge your Computer Vision models with the TensorFlow
+Object Detection API]]:
+
+# TODO:
+# Define mAP
+
+#+CAPTION: TensorFlow object detection example 2.
+#+ATTR_HTML: :width 100% :height 100%
+[[../images/2017-12-13-objdet.jpg]]
+
+The paper discusses many of the two-stage approaches, like R-CNN and
+its variants, which work in two steps:
+
+1. One model proposes a sparse set of locations in the image that
+   probably contain something. Ideally, this set contains all objects
+   in the image but filters out the majority of negative locations
+   (i.e. only background, not foreground).
+2. Another model, typically a convolutional neural network,
+   classifies each location in that sparse set either as foreground
+   belonging to some specific object class, or as background.
+
+Additionally, it discusses some existing one-stage approaches like
+[[https://pjreddie.com/darknet/yolo/][YOLO]] and [[https://arxiv.org/abs/1512.02325][SSD]]. In essence, these run only the second step - but
+instead of starting from a sparse set of locations that are probably
+something of interest, they start from a dense set of locations that
+blankets the entire image: a grid of many positions, over many
+scales, and over many aspect ratios, regardless of whether any of
+them contain an object.
+
+This is simpler and faster - but not nearly as accurate.
+
+Broadly, training these models means minimizing some kind of loss
+function based on what the model misclassifies when it is run on
+training data. It's preferable to be able to compute a loss for each
+individual instance and add all of these losses up to produce an
+overall loss.
+
+This leads to a problem in one-stage detectors: the dense set of
+locations being classified usually contains a small number of
+locations that actually hold objects (positives) and a much larger
+number of locations that are just background and are trivially
+classified as such (easy negatives). However, the loss function still
+adds all of them up - and even if the loss is relatively low for each
+easy negative, their cumulative loss can drown out the loss from the
+objects that are being misclassified.
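+
+To make this concrete, here is a quick back-of-the-envelope sketch in
+Python. The numbers are invented purely for illustration - they are
+not taken from the paper - but the proportions are the kind of thing
+a dense one-stage detector sees:
+
+#+BEGIN_SRC python
+import numpy as np
+
+# Hypothetical per-location cross-entropy losses for one image:
+# 20 hard positives that the model badly misclassifies...
+hard_positives = np.full(20, 2.3)        # roughly -log(0.1)
+# ...and 100,000 easy negatives it already classifies confidently.
+easy_negatives = np.full(100_000, 0.01)  # roughly -log(0.99)
+
+print(hard_positives.sum())  # 46.0
+print(easy_negatives.sum())  # ~1000.0
+#+END_SRC
+
+Even though each easy negative contributes almost nothing, the easy
+negatives collectively contribute over twenty times as much loss as
+the badly misclassified objects, so they dominate the total - and the
+gradient that comes out of it.
+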
+The training process is trying to minimize this total loss, and so it
+is mostly nudging the model to improve in the area least in need of
+it (its ability to classify background areas that it already
+classifies well) while neglecting the area most in need of it (its
+ability to classify the "difficult" objects that it is
+misclassifying).
+
+# TODO: What else can I say about why loss should be additive?
+# Quote DL text? ML text?
+
+This, in a nutshell, is the *class imbalance* issue that the paper
+identifies as the limiting factor for the accuracy of one-stage
+detectors.
+
+# TODO: Visualize this. Can I?
diff --git a/images/2017-12-13-objdet.jpg b/images/2017-12-13-objdet.jpg
new file mode 100644
index 0000000..1a72863
Binary files a/images/2017-12-13-objdet.jpg and b/images/2017-12-13-objdet.jpg differ