Migrate some drafts into content/posts with 'draft' flag

parent fba8a611e3
commit 129bfeb3e7
@@ -2,7 +2,10 @@
 title: Retrospect on Foresight
 author: Chris Hodapp
 date: January 8, 2018
-tags: technobabble, rambling
+tags:
+- technobabble
+- rambling
+draft: true
 ---
 
 /(Spawned from some idle thoughts around the summer of 2015.)/
@@ -48,5 +51,6 @@ wildly impractical, or a mere facade over what is already established.
 foresight.
 - [[https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/][As We May Think (Vannevar Bush)]]
+- "Do you remember a time when..." only goes so far.
 - Buckminster Fuller
 
 # Tools For Thought
@@ -1,7 +1,12 @@
-#+TITLE: Modularity & Abstraction (working title)
-#+AUTHOR: Chris Hodapp
-#+DATE: April 20, 2017
-#+TAGS: technobabble
+---
+title: Modularity & Abstraction (working title)
+author: Chris Hodapp
+date: April 20, 2017
+tags:
+- technobabble
+- rambling
+draft: true
+---
 
 # Why don't I turn this into a paper for arXiv too? It can still be
 # posted to the blog (just also make it exportable to LaTeX perhaps)
(binary image moved, 369 KiB, unchanged)
(binary image moved, 256 KiB, unchanged)
(binary image moved, 124 KiB, unchanged)
@@ -2,9 +2,14 @@
 title: Explaining RetinaNet
 author: Chris Hodapp
 date: December 13, 2017
-tags: technobabble
+tags:
+- technobabble
+draft: true
 ---
 
+# TODO: The inline equations are still broken (maybe because this is
+# in org format)
+
 # Above uses style from https://github.com/turboMaCk/turboMaCk.github.io/blob/develop/posts/2016-12-21-org-mode-in-hakyll.org
 # and https://turbomack.github.io/posts/2016-12-21-org-mode-in-hakyll.html
 # description:
@@ -29,7 +34,7 @@ taken from
 
 #+CAPTION: TensorFlow object detection example 2.
 #+ATTR_HTML: :width 100% :height 100%
-[[../images/2017-12-13-retinanet/2017-12-13-objdet.jpg]]
+[[./2017-12-13-objdet.jpg]]
 
 At the time of writing, the most accurate object-detection methods
 were based around R-CNN and its variants, and all used two-stage
@@ -109,12 +114,12 @@ approaches while surpassing the accuracy of existing two-stage ones.
 
 At least, this is what the paper claims. Their novel loss function is
 called *Focal Loss* (as the title references), and it multiplies the
-normal cross-entropy by a factor, $(1-p_t)^\gamma$, where $p_t$
+normal cross-entropy by a factor, \( (1-p_t)^\gamma \), where \( p_t \)
 approaches 1 as the model predicts a higher and higher probability of
-the correct classification, or 0 for an incorrect one, and $\gamma$ is
-a "focusing" hyperparameter (they used $\gamma=2$). Intuitively, this
+the correct classification, or 0 for an incorrect one, and \( \gamma \) is
+a "focusing" hyperparameter (they used \( \gamma=2 \)). Intuitively, this
 scaling makes sense: if a classification is already correct (as in the
-"easy negatives"), $(1-p_t)^\gamma$ tends toward 0, and so the portion
+"easy negatives"), \( (1-p_t)^\gamma \) tends toward 0, and so the portion
 of the loss multiplied by it will likewise tend toward 0.
 
 
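As an aside, the scaling described in this hunk is easy to check numerically. Below is a minimal sketch of the focal-loss factor (the function and variable names are mine, not the paper's or the post's):

```python
import math

def focal_loss(p_t, gamma=2.0):
    # Ordinary cross-entropy is -log(p_t), where p_t is the model's
    # probability for the true class. Focal loss scales it by the
    # factor (1 - p_t)^gamma.
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# An "easy" example the model already classifies confidently
# contributes almost nothing, while a hard example keeps most of
# its cross-entropy loss:
easy = focal_loss(0.99)  # factor (0.01)^2 nearly zeroes the term
hard = focal_loss(0.10)  # factor (0.9)^2 leaves most of the term
```

With `gamma = 0` the factor is 1 and the expression reduces to plain cross-entropy, which is a useful sanity check.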
@@ -152,7 +157,7 @@ diagram illustrates an image pyramid:
 
 #+CAPTION: Source: https://en.wikipedia.org/wiki/File:Image_pyramid.svg
 #+ATTR_HTML: :width 100% :height 100%
-[[../images/2017-12-13-retinanet/1024px-Image_pyramid.svg.png]]
+[[./1024px-Image_pyramid.svg.png]]
 
 Image pyramids have many uses, but the paper focuses on their use in
 taking something that works only at a certain scale of image - for
@@ -180,7 +185,7 @@ CNN:
 
 #+CAPTION: Source: https://commons.wikimedia.org/wiki/File:Typical_cnn.png
 #+ATTR_HTML: :width 100% :height 100%
-[[../images/2017-12-13-retinanet/Typical_cnn.png]]
+[[./Typical_cnn.png]]
 
 You may notice that this network has a structure that bears some
 resemblance to an image pyramid. This is because deep CNNs are
@@ -262,13 +267,13 @@ inference.
 
 In particular:
 
-- Say that the feature pyramid has $L$ levels, and that level $l+1$ is
-  half the resolution (thus double the scale) of level $l$.
-- Say that level $l$ is a 256-channel feature map of size $W \times H$
-  (i.e. it's a tensor with shape $W \times H \times 256$). Note that
-  $W$ and $H$ will be larger at lower levels, and smaller at higher
+- Say that the feature pyramid has \( L \) levels, and that level \( l+1 \) is
+  half the resolution (thus double the scale) of level \( l \).
+- Say that level \( l \) is a 256-channel feature map of size \( W \times H \)
+  (i.e. it's a tensor with shape \( W \times H \times 256 \)). Note that
+  \( W \) and \( H \) will be larger at lower levels, and smaller at higher
   levels, but in RetinaNet at least, always 256-channel samples.
-- For every point on that feature map (all $WH$ of them), we can
+- For every point on that feature map (all \( WH \) of them), we can
   identify a corresponding point in the input image. This is the
   center point of a broad region of the input image that influences
   this point in the feature map (i.e. its receptive field). Note that
@@ -279,8 +284,8 @@ In particular:
   rectangular regions associated with each point of a feature map.
   The size of the anchor depends on the scale of the feature map, or
   equivalently, what level of the feature map it came from. All this
-  means is that anchors in level $l+1$ are twice as large as the
-  anchors of level $l$.
+  means is that anchors in level \( l+1 \) are twice as large as the
+  anchors of level \( l \).
 
 The view that this should paint is that a dense collection of anchors
 covers the entire input image at different sizes - still in a very
@@ -298,7 +303,7 @@ should change the fundamentals.
   elsewhere.
 - It's not a single anchor per 3x3 window, but 9 anchors - one for
   each of three aspect ratios (1:2, 1:1, and 2:1), and each of three
-  scale factors ($1, 2^{1/3}, and 2^{2/3}$) on top of its base scale.
+  scale factors (\( 1 \), \( 2^{1/3} \), and \( 2^{2/3} \)) on top of its base scale.
   This is just to handle objects of less-square shapes and to cover
   the gap in scale in between levels of the feature pyramid. Note
   that the scale factors are evenly-spaced exponentially, such that an
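The even exponential spacing of the anchor scales in this hunk can be verified with a couple of lines of arithmetic (a sketch; the variable names are mine, not RetinaNet's):

```python
# Three scale factors per pyramid level: 2^0, 2^(1/3), 2^(2/3).
scales = [2 ** (i / 3) for i in range(3)]

# Consecutive scales differ by the same ratio, 2^(1/3) ~ 1.26, and
# multiplying the largest scale by that ratio reaches exactly 2 --
# the base scale of the next pyramid level. So across the whole
# pyramid, anchor sizes form one evenly-spaced geometric sequence.
ratio = scales[1] / scales[0]

# Combined with three aspect ratios (1:2, 1:1, 2:1), that gives the
# 9 anchors per position the post mentions.
aspect_ratios = [(1, 2), (1, 1), (2, 1)]
anchors_per_position = len(scales) * len(aspect_ratios)
```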
@@ -316,7 +321,7 @@ Every anchor associates an image region with a 3x3 window (i.e. a
 3x3x256 section - it's still 256-channel). The classification subnet
 is responsible for learning: do the features in this 3x3 window,
 produced from some input image, indicate that an object is inside this
-anchor? Or, more accurately: For each of $K$ object classes, what's
+anchor? Or, more accurately: For each of \( K \) object classes, what's
 the probability of each object (or just of it being background)?
 
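As a shape sketch of what the classification subnet described above emits for one pyramid level (the concrete sizes here are assumptions for illustration; the real numbers depend on the backbone and dataset):

```python
# Assumed example sizes: a W x H feature map, A anchors per position,
# and K object classes (e.g. K = 80 for COCO).
W, H, A, K = 32, 32, 9, 80

# One probability per (position, anchor, class): the subnet's output
# for this level can be viewed as a W x H x (A*K) tensor of scores.
output_shape = (W, H, A * K)
predictions_per_level = W * H * A * K
```

Summed over all pyramid levels, this is why the post stresses how dense the anchor coverage is.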
 ** Box Regression Subnet
(Two further file diffs suppressed because they are too large.)