Migrate some drafts into content/posts with 'draft' flag

parent fba8a611e3
commit 129bfeb3e7
@@ -2,7 +2,10 @@
 title: Retrospect on Foresight
 author: Chris Hodapp
 date: January 8, 2018
-tags: technobabble, rambling
+tags:
+- technobabble
+- rambling
+draft: true
 ---
 
 /(Spawned from some idle thoughts around the summer of 2015.)/
@@ -48,5 +51,6 @@ wildly impractical, or a mere facade over what is already established.
 foresight.
 - [[https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/][As We May Think (Vannevar Bush)]]
 - "Do you remember a time when..." only goes so far.
+- Buckminster Fuller
 
 # Tools For Thought
@@ -1,7 +1,12 @@
-#+TITLE: Modularity & Abstraction (working title)
-#+AUTHOR: Chris Hodapp
-#+DATE: April 20, 2017
-#+TAGS: technobabble
+---
+title: Modularity & Abstraction (working title)
+author: Chris Hodapp
+date: April 20, 2017
+tags:
+- technobabble
+- rambling
+draft: true
+---
 
 # Why don't I turn this into a paper for arXiv too? It can still be
 # posted to the blog (just also make it exportable to LaTeX perhaps)
(Three binary images moved unchanged: 369 KiB, 256 KiB, 124 KiB.)
@@ -2,9 +2,14 @@
 title: Explaining RetinaNet
 author: Chris Hodapp
 date: December 13, 2017
-tags: technobabble
+tags:
+- technobabble
+draft: true
 ---
 
+# TODO: The inline equations are still broken (maybe because this is
+# in org format)
+
 # Above uses style from https://github.com/turboMaCk/turboMaCk.github.io/blob/develop/posts/2016-12-21-org-mode-in-hakyll.org
 # and https://turbomack.github.io/posts/2016-12-21-org-mode-in-hakyll.html
 # description:
@@ -29,7 +34,7 @@ taken from
 
 #+CAPTION: TensorFlow object detection example 2.
 #+ATTR_HTML: :width 100% :height 100%
-[[../images/2017-12-13-retinanet/2017-12-13-objdet.jpg]]
+[[./2017-12-13-objdet.jpg]]
 
 At the time of writing, the most accurate object-detection methods
 were based around R-CNN and its variants, and all used two-stage
@@ -109,12 +114,12 @@ approaches while surpassing the accuracy of existing two-stage ones.
 
 At least, this is what the paper claims. Their novel loss function is
 called *Focal Loss* (as the title references), and it multiplies the
-normal cross-entropy by a factor, $(1-p_t)^\gamma$, where $p_t$
+normal cross-entropy by a factor, \( (1-p_t)^\gamma \), where \( p_t \)
 approaches 1 as the model predicts a higher and higher probability of
-the correct classification, or 0 for an incorrect one, and $\gamma$ is
-a "focusing" hyperparameter (they used $\gamma=2$). Intuitively, this
+the correct classification, or 0 for an incorrect one, and \( \gamma \) is
+a "focusing" hyperparameter (they used \( \gamma=2 \)). Intuitively, this
 scaling makes sense: if a classification is already correct (as in the
-"easy negatives"), $(1-p_t)^\gamma$ tends toward 0, and so the portion
+"easy negatives"), \( (1-p_t)^\gamma \) tends toward 0, and so the portion
 of the loss multiplied by it will likewise tend toward 0.
 
 
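To make the scaling in this hunk concrete, here is a minimal sketch of the focal loss it describes; only the formula \( -(1-p_t)^\gamma \log(p_t) \) and \( \gamma = 2 \) come from the post, while the example probabilities are illustrative.

#+BEGIN_SRC python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    """Focal loss for one binary prediction: -(1 - p_t)^gamma * log(p_t).

    p is the predicted probability of the positive class; y is 1 or 0.
    """
    p_t = p if y == 1 else 1.0 - p   # p_t -> 1 as the prediction improves
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

# An "easy" example that is already classified correctly contributes
# almost nothing to the loss...
print(focal_loss(0.99, 1))   # ~1e-6
# ...while a badly misclassified example still carries a large loss.
print(focal_loss(0.01, 1))   # ~4.5
#+END_SRC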
@@ -152,7 +157,7 @@ diagram illustrates an image pyramid:
 
 #+CAPTION: Source: https://en.wikipedia.org/wiki/File:Image_pyramid.svg
 #+ATTR_HTML: :width 100% :height 100%
-[[../images/2017-12-13-retinanet/1024px-Image_pyramid.svg.png]]
+[[./1024px-Image_pyramid.svg.png]]
 
 Image pyramids have many uses, but the paper focuses on their use in
 taking something that works only at a certain scale of image - for
@@ -180,7 +185,7 @@ CNN:
 
 #+CAPTION: Source: https://commons.wikimedia.org/wiki/File:Typical_cnn.png
 #+ATTR_HTML: :width 100% :height 100%
-[[../images/2017-12-13-retinanet/Typical_cnn.png]]
+[[./Typical_cnn.png]]
 
 You may notice that this network has a structure that bears some
 resemblance to an image pyramid. This is because deep CNNs are
@@ -262,13 +267,13 @@ inference.
 
 In particular:
 
-- Say that the feature pyramid has $L$ levels, and that level $l+1$ is
-  half the resolution (thus double the scale) of level $l$.
-- Say that level $l$ is a 256-channel feature map of size $W \times H$
-  (i.e. it's a tensor with shape $W \times H \times 256$). Note that
-  $W$ and $H$ will be larger at lower levels, and smaller at higher
+- Say that the feature pyramid has \( L \) levels, and that level \( l+1 \) is
+  half the resolution (thus double the scale) of level \( l \).
+- Say that level \( l \) is a 256-channel feature map of size \( W \times H \)
+  (i.e. it's a tensor with shape \( W \times H \times 256 \)). Note that
+  \( W \) and \( H \) will be larger at lower levels, and smaller at higher
   levels, but in RetinaNet at least, always 256-channel samples.
-- For every point on that feature map (all $WH$ of them), we can
+- For every point on that feature map (all \( WH \) of them), we can
   identify a corresponding point in the input image. This is the
   center point of a broad region of the input image that influences
   this point in the feature map (i.e. its receptive field). Note that
@@ -279,8 +284,8 @@ In particular:
   rectangular regions associated with each point of a feature map.
   The size of the anchor depends on the scale of the feature map, or
   equivalently, what level of the feature map it came from. All this
-  means is that anchors in level $l+1$ are twice as large as the
-  anchors of level $l$.
+  means is that anchors in level \( l+1 \) are twice as large as the
+  anchors of level \( l \).
 
 The view that this should paint is that a dense collection of anchors
 covers the entire input image at different sizes - still in a very
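As a toy illustration of the two hunks above (the 512-pixel input and 32-pixel base anchor are hypothetical values, not from the post): each step up the pyramid halves the feature-map resolution and doubles the anchor size.

#+BEGIN_SRC python
# Hypothetical numbers: a 512x512 input and a 32-pixel anchor at level 0.
input_side, anchor_side = 512, 32
for l in range(5):
    feat_side = input_side // (2 ** l)   # resolution halves per level
    anchor = anchor_side * (2 ** l)      # anchor size doubles per level
    print(f"level {l}: feature map {feat_side}x{feat_side}, "
          f"anchor {anchor}x{anchor} pixels")
#+END_SRC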
@@ -298,7 +303,7 @@ should change the fundamentals.
 elsewhere.
 - It's not a single anchor per 3x3 window, but 9 anchors - one for
   each of three aspect ratios (1:2, 1:1, and 2:1), and each of three
-  scale factors ($1, 2^{1/3}, and 2^{2/3}$) on top of its base scale.
+  scale factors (\( 1 \), \( 2^{1/3} \), and \( 2^{2/3} \)) on top of its base scale.
   This is just to handle objects of less-square shapes and to cover
   the gap in scale in between levels of the feature pyramid. Note
   that the scale factors are evenly-spaced exponentially, such that an
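A short sketch of the nine anchor shapes this hunk describes; the ratios and scale factors are from the post, while the 32-pixel base side is a hypothetical value.

#+BEGIN_SRC python
base = 32.0                                # hypothetical base anchor side
ratios = [0.5, 1.0, 2.0]                   # aspect ratios 1:2, 1:1, 2:1
scales = [2 ** (i / 3) for i in range(3)]  # 1, 2^(1/3), 2^(2/3)

anchors = []
for r in ratios:
    for s in scales:
        side = base * s
        # Hold the area fixed while skewing width vs. height by the ratio:
        w, h = side * r ** 0.5, side / r ** 0.5
        anchors.append((round(w, 1), round(h, 1)))

print(len(anchors))  # 9 anchors per 3x3 window
#+END_SRC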
@@ -316,7 +321,7 @@ Every anchor associates an image region with a 3x3 window (i.e. a
 3x3x256 section - it's still 256-channel). The classification subnet
 is responsible for learning: do the features in this 3x3 window,
 produced from some input image, indicate that an object is inside this
-anchor? Or, more accurately: For each of $K$ object classes, what's
+anchor? Or, more accurately: For each of \( K \) object classes, what's
 the probability of each object (or just of it being background)?
 
 ** Box Regression Subnet
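A minimal sketch of the classification subnet as the RetinaNet paper lays it out (assuming Keras; the layer layout follows the paper, but this is an illustration, not the authors' code, and the default K and A are assumptions): four 3x3, 256-filter conv layers, then a final 3x3 conv emitting one sigmoid score per (anchor, class) pair at every feature-map position.

#+BEGIN_SRC python
from tensorflow.keras import layers, models

def classification_subnet(K=80, A=9, channels=256):
    # One pyramid level comes in as an (H, W, 256) feature map.
    inputs = layers.Input(shape=(None, None, channels))
    x = inputs
    for _ in range(4):
        x = layers.Conv2D(channels, 3, padding="same", activation="relu")(x)
    # K*A sigmoid outputs per position: a probability for each of the
    # K object classes, for each of the A anchors at that position.
    outputs = layers.Conv2D(K * A, 3, padding="same", activation="sigmoid")(x)
    return models.Model(inputs, outputs)
#+END_SRC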
(Two file diffs suppressed because they are too large.)