Migrate some drafts into content/posts with 'draft' flag

Chris Hodapp 2020-04-30 19:00:38 -04:00
parent fba8a611e3
commit 129bfeb3e7
8 changed files with 37 additions and 5195 deletions

View File

@@ -2,7 +2,10 @@
title: Retrospect on Foresight
author: Chris Hodapp
date: January 8, 2018
tags: technobabble, rambling
tags:
- technobabble
- rambling
draft: true
---
/(Spawned from some idle thoughts around the summer of 2015.)/
@@ -48,5 +51,6 @@ wildly impractical, or a mere facade over what is already established.
foresight.
- [[https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/][As We May Think (Vannevar Bush)]]
- "Do you remember a time when..." only goes so far.
- Buckminster Fuller
# Tools For Thought

View File

@@ -1,7 +1,12 @@
#+TITLE: Modularity & Abstraction (working title)
#+AUTHOR: Chris Hodapp
#+DATE: April 20, 2017
#+TAGS: technobabble
---
title: Modularity & Abstraction (working title)
author: Chris Hodapp
date: April 20, 2017
tags:
- technobabble
- rambling
draft: true
---
# Why don't I turn this into a paper for arXiv too? It can still be
# posted to the blog (just also make it exportable to LaTeX perhaps)

View File

Before: 369 KiB

After: 369 KiB

View File

Before: 256 KiB

After: 256 KiB

View File

Before: 124 KiB

After: 124 KiB

View File

@@ -2,9 +2,14 @@
title: Explaining RetinaNet
author: Chris Hodapp
date: December 13, 2017
tags: technobabble
tags:
- technobabble
draft: true
---
# TODO: The inline equations are still broken (maybe because this is
# in org format)
# Above uses style from https://github.com/turboMaCk/turboMaCk.github.io/blob/develop/posts/2016-12-21-org-mode-in-hakyll.org
# and https://turbomack.github.io/posts/2016-12-21-org-mode-in-hakyll.html
# description:
@@ -29,7 +34,7 @@ taken from
#+CAPTION: TensorFlow object detection example 2.
#+ATTR_HTML: :width 100% :height 100%
[[../images/2017-12-13-retinanet/2017-12-13-objdet.jpg]]
[[./2017-12-13-objdet.jpg]]
At the time of writing, the most accurate object-detection methods
were based around R-CNN and its variants, and all used two-stage
@@ -109,12 +114,12 @@ approaches while surpassing the accuracy of existing two-stage ones.
At least, this is what the paper claims. Their novel loss function is
called *Focal Loss* (as the title references), and it multiplies the
normal cross-entropy by a factor, $(1-p_t)^\gamma$, where $p_t$
normal cross-entropy by a factor, \( (1-p_t)^\gamma \), where \( p_t \)
approaches 1 as the model predicts a higher and higher probability of
the correct classification, or 0 for an incorrect one, and $\gamma$ is
a "focusing" hyperparameter (they used $\gamma=2$). Intuitively, this
the correct classification, or 0 for an incorrect one, and \( \gamma \) is
a "focusing" hyperparameter (they used \( \gamma=2 \)). Intuitively, this
scaling makes sense: if a classification is already correct (as in the
"easy negatives"), $(1-p_t)^\gamma$ tends toward 0, and so the portion
"easy negatives"), \( (1-p_t)^\gamma \) tends toward 0, and so the portion
of the loss multiplied by it will likewise tend toward 0.
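
A quick numeric sketch of that scaling (my own illustration in plain Python, not code from the post or paper), using the paper's \( \gamma = 2 \):

#+BEGIN_SRC python
import math

def focal_loss(p_t, gamma=2.0):
    """Cross-entropy scaled by the focal factor (1 - p_t)**gamma,
    where p_t is the predicted probability of the true class."""
    ce = -math.log(p_t)                  # ordinary cross-entropy term
    return (1.0 - p_t) ** gamma * ce

# Easy examples (p_t near 1) are scaled down to almost nothing, while
# hard examples (p_t near 0) keep nearly all of their cross-entropy loss:
for p_t in (0.99, 0.9, 0.5, 0.1):
    print(f"p_t={p_t:.2f}  CE={-math.log(p_t):.3f}  FL={focal_loss(p_t):.3f}")
#+END_SRC
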
@@ -152,7 +157,7 @@ diagram illustrates an image pyramid:
#+CAPTION: Source: https://en.wikipedia.org/wiki/File:Image_pyramid.svg
#+ATTR_HTML: :width 100% :height 100%
[[../images/2017-12-13-retinanet/1024px-Image_pyramid.svg.png]]
[[./1024px-Image_pyramid.svg.png]]
Image pyramids have many uses, but the paper focuses on their use in
taking something that works only at a certain scale of image - for
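
A minimal sketch of the idea (mine, not the post's; it assumes Pillow, and the file name and level count are hypothetical): an image pyramid can be built by repeatedly halving the resolution.

#+BEGIN_SRC python
from PIL import Image

def image_pyramid(path, levels=4):
    """Each level halves the previous level's resolution,
    i.e. doubles its effective scale."""
    pyramid = [Image.open(path)]
    for _ in range(levels - 1):
        w, h = pyramid[-1].size
        pyramid.append(pyramid[-1].resize((max(1, w // 2), max(1, h // 2))))
    return pyramid

# Hypothetical usage:
# for level, im in enumerate(image_pyramid("input.jpg")):
#     print(level, im.size)
#+END_SRC
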
@@ -180,7 +185,7 @@ CNN:
#+CAPTION: Source: https://commons.wikimedia.org/wiki/File:Typical_cnn.png
#+ATTR_HTML: :width 100% :height 100%
[[../images/2017-12-13-retinanet/Typical_cnn.png]]
[[./Typical_cnn.png]]
You may notice that this network has a structure that bears some
resemblance to an image pyramid. This is because deep CNNs are
@@ -262,13 +267,13 @@ inference.
In particular:
- Say that the feature pyramid has $L$ levels, and that level $l+1$ is
half the resolution (thus double the scale) of level $l$.
- Say that level $l$ is a 256-channel feature map of size $W \times H$
(i.e. it's a tensor with shape $W \times H \times 256$). Note that
$W$ and $H$ will be larger at lower levels, and smaller at higher
- Say that the feature pyramid has \( L \) levels, and that level \( l+1 \) is
half the resolution (thus double the scale) of level \( l \).
- Say that level \( l \) is a 256-channel feature map of size \( W \times H \)
(i.e. it's a tensor with shape \( W \times H \times 256 \)). Note that
\( W \) and \( H \) will be larger at lower levels, and smaller at higher
levels, but in RetinaNet at least, always 256-channel samples.
- For every point on that feature map (all $WH$ of them), we can
- For every point on that feature map (all \( WH \) of them), we can
identify a corresponding point in the input image. This is the
center point of a broad region of the input image that influences
this point in the feature map (i.e. its receptive field). Note that
@@ -279,8 +284,8 @@ In particular:
rectangular regions associated with each point of a feature map.
The size of the anchor depends on the scale of the feature map, or
equivalently, what level of the feature map it came from. All this
means is that anchors in level $l+1$ are twice as large as the
anchors of level $l$.
means is that anchors in level \( l+1 \) are twice as large as the
anchors of level \( l \).
The view that this should paint is that a dense collection of anchors
covers the entire input image at different sizes - still in a very
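
To make the geometry above concrete, here is a rough sketch (my own, not from the post; the input size, pyramid levels, and base anchor size are assumed values for illustration) of how such a dense grid of anchor centers and sizes could be enumerated:

#+BEGIN_SRC python
def anchor_grid(image_size=512, levels=(3, 4, 5, 6, 7), base_size=32):
    """At pyramid level l the feature map has stride 2**l, so it is
    (image_size / 2**l) points on a side; every point maps to a center
    in the input image, and anchor size doubles from one level to the next."""
    grid = {}
    for i, l in enumerate(levels):
        stride = 2 ** l                    # pixels between neighbouring centers
        fm = image_size // stride          # feature map is fm x fm points
        size = base_size * 2 ** i          # one level up => anchors twice as large
        centers = [((x + 0.5) * stride, (y + 0.5) * stride)
                   for y in range(fm) for x in range(fm)]
        grid[l] = (centers, size)
    return grid

for l, (centers, size) in anchor_grid().items():
    print(f"level {l}: {len(centers)} anchor centers, base size {size}px")
#+END_SRC
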
@@ -298,7 +303,7 @@ should change the fundamentals.
elsewhere.
- It's not a single anchor per 3x3 window, but 9 anchors - one for
each of three aspect ratios (1:2, 1:1, and 2:1), and each of three
scale factors ($1, 2^{1/3}, and 2^{2/3}$) on top of its base scale.
scale factors (\( 1, 2^{1/3}, and 2^{2/3} \)) on top of its base scale.
This is just to handle objects of less-square shapes and to cover
the gap in scale in between levels of the feature pyramid. Note
that the scale factors are evenly-spaced exponentially, such that an
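
As a small illustration of that cross product (my own sketch; keeping the anchor's area fixed while varying the aspect ratio is an assumption, not necessarily the paper's exact convention):

#+BEGIN_SRC python
from itertools import product

def nine_anchors(base_size):
    """Width/height of the 9 anchors at one position: 3 aspect ratios
    (1:2, 1:1, 2:1) times 3 exponentially spaced scale factors."""
    ratios = (0.5, 1.0, 2.0)                   # width : height
    scales = (1.0, 2 ** (1 / 3), 2 ** (2 / 3))
    boxes = []
    for r, s in product(ratios, scales):
        area = (base_size * s) ** 2            # keep the area, vary the ratio
        w = (area * r) ** 0.5
        h = w / r
        boxes.append((round(w, 1), round(h, 1)))
    return boxes

print(nine_anchors(32))                        # 9 (width, height) pairs
#+END_SRC
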
@@ -316,7 +321,7 @@ Every anchor associates an image region with a 3x3 window (i.e. a
3x3x256 section - it's still 256-channel). The classification subnet
is responsible for learning: do the features in this 3x3 window,
produced from some input image, indicate that an object is inside this
anchor? Or, more accurately: For each of $K$ object classes, what's
anchor? Or, more accurately: For each of \( K \) object classes, what's
the probability of each object (or just of it being background)?
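
A shape-level sketch only (the sizes here are assumptions, and this is not code from the post): at one pyramid level, the classification subnet has to emit \( K \times A \) numbers per feature-map position, one probability per class per anchor.

#+BEGIN_SRC python
import numpy as np

W, H = 64, 64          # feature-map size at this level (assumed)
K, A = 80, 9           # e.g. 80 object classes, 9 anchors per position

# Output of the subnet's final convolution, one sigmoid per class per anchor;
# "background" corresponds to all K probabilities being low for that anchor.
logits = np.zeros((W, H, K * A), dtype=np.float32)
probs = 1.0 / (1.0 + np.exp(-logits))

print(probs.shape)     # (64, 64, 720)
#+END_SRC
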
** Box Regression Subnet

File diff suppressed because it is too large.