Migrate some drafts into content/posts with 'draft' flag
@@ -1,52 +0,0 @@
---
title: Retrospect on Foresight
author: Chris Hodapp
date: January 8, 2018
tags: technobabble, rambling
---

/(Spawned from some idle thoughts around the summer of 2015.)/

Why are old technological ideas that were "ahead of their time", but
which lost out to other ideas, worth studying?

We can see them as raw ideas that "modern" understanding never
refined - misguided fantasies or even just mistakes. The flip side of
this is that we can see them as ideas that are free of a nearly
inescapable modern context and all of the preconceptions and blinders
it carries.

In some of these visionaries is a valuable combination:

- they're detached from this modern context (by mere virtue of it not
  existing yet),
- they have considerable experience, imagination, and foresight,
- they devoted time and effort to work extensively on something and to
  communicate their thoughts, feelings, and analysis in a durable way.

To put it another way: they give us analysis done from a context that
is long gone. They help us think beyond our current context. They
help us answer the question, "What if we had taken a different path
back then?"

[[http://www.cs.yale.edu/homes/perlis-alan/quotes.html][Epigram #53]] from Alan Perlis offers some relevant skepticism here: "So
many good ideas are never heard from again once they embark in a
voyage on the semantic gulf." My interpretation is that we tend to
idolize ideas, old and new, because they sound somehow different,
innovative, and groundbreaking, but attempts at analysis or practical
realization of those ideas lead to a bleaker reality: perhaps the idea
is completely meaningless (the equivalent of a [[https://en.wiktionary.org/wiki/deepity][deepity]]), wildly
impractical, or a mere facade over what is already established.

* Examples

* Scratch

- Douglas Engelbart is perhaps one of the canonical examples of a
  person who was an endless source of these ideas. Ted Nelson
  arguably is another. Alan Turing is an early example widely
  regarded for his foresight.
- [[https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/][As We May Think (Vannevar Bush)]]
- "Do you remember a time when..." only goes so far.

# Tools For Thought
@@ -1,376 +0,0 @@
#+TITLE: Modularity & Abstraction (working title)
#+AUTHOR: Chris Hodapp
#+DATE: April 20, 2017
#+TAGS: technobabble

# Why don't I turn this into a paper for arXiv too? It can still be
# posted to the blog (just also make it exportable to LaTeX perhaps)

_Modularity_ and _abstraction_ feature prominently wherever computers
are involved. This is meant very broadly: it applies to designing
software, using software, integrating software, and to a lot of
hardware as well. It applies elsewhere, and almost certainly
originated elsewhere first; however, it appears especially crucial
around software.

Definitions, though, are a bit vague (including anything in this
post). My goal in this post isn't to try to (re)define them, but to
explain their essence and expand on a few theses:

- Modularity arises naturally in a wide array of places.
- Modularity and abstraction are intrinsically connected.
- Both are for the benefit of people. This usually doesn't need to be
  stated, but to echo Paul Graham and probably others: to the
  computer, it is all the same.
- More specifically, both are there to manage *complexity* by
  assigning meaningful information and boundaries which allow people
  to match a problem to what they can actually think about.

# - Whether a given modularization makes sense depends strongly on
#   meaning and relevance of *information* inside and outside of
#   modules, and broad context matters to those.

* Why?

People generally agree that "modularity" is good. The idea that
something complex can be designed and understood in terms of smaller,
simpler pieces comes naturally to anyone who has built something out
of smaller pieces or taken something apart. (This isn't to say that
reductionism is the best way to understand everything, but that's
another matter.) It runs very deep in the Unix philosophy, which ESR
gives a good overview of in [[http://www.catb.org/~esr/writings/taoup/html/ch01s06.html][The Art of Unix Programming]] - or, listen
to it from [[https://youtu.be/tc4ROCJYbm0?t%3D248][Kernighan himself]] at Bell Labs in 1982.

Tim Berners-Lee gives some practical limitations in [[https://www.w3.org/DesignIssues/Principles.html][Principles of
Design]] and in [[https://www.w3.org/DesignIssues/Modularity.html][Modularity]]: "Modular design hinges on the simplicity and
abstract nature of the interface definition between the modules. A
design in which the insides of each module need to know all about each
other is not a modular design but an arbitrary partitioning of the
bits... It is not only necessary to make sure your own system is
designed to be made of modular parts. It is also necessary to realize
that your own system, no matter how big and wonderful it seems now,
should always be designed to be a part of another larger system." Les
Hatton in [[http://www.leshatton.org/TAIC2008-29-08-2008.html][The role of empiricism in improving the reliability of
future software]] even did an interesting derivation tying the defect
density in software to how it is broken into pieces. The 1972 paper
[[https://www.cs.virginia.edu/~eos/cs651/papers/parnas72.pdf][On the Criteria to Be Used in Decomposing Systems into Modules]] cites a
1970 textbook on why modularity is important in systems programming,
but also notes that nothing is said on how to divide a system into
modules.

"Abstraction" doesn't have quite the same consensus. In software, it's
generally understood that decoupled or loosely-coupled is better than
tightly-coupled, but at the same time, "abstraction" can have the
connotation of something that gets in the way, adds overhead, and
confuses things. Dijkstra, in one of few instances of not being
snarky, allegedly said, "Being abstract is something profoundly
different from being vague. The purpose of abstraction is not to be
vague, but to create a new semantic level in which one can be
absolutely precise." Joel Spolsky, in one of few instances of me
actually caring what he said, also has a blog post from 2002 on the
[[https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/][Law of Leaky Abstractions]] ("All non-trivial abstractions, to some
degree, are leaky."). The [[https://en.wikipedia.org/wiki/Principle_of_least_privilege][principle of least privilege]] is likewise
relevant. So, abstraction too has its practical and theoretical
limitations.

* How They Relate

I bring these up together because *abstractions* are the boundaries
between *modules*, and the communication channels (APIs, languages,
interfaces, protocols) through which they talk. The boundary need not
be a standardized interface or a well-documented one, though that
helps.
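To make the boundary-between-modules idea concrete, here is a minimal Python sketch of my own (all names hypothetical, not from any library): two modules honor the same small contract, and the caller depends only on that contract.

```python
# Two "modules" honoring one abstraction: put(key, value) / get(key).
# Callers written against the abstraction work with either module.

class MemoryStore:
    """One module: keeps key/value pairs verbatim in a dict."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data[key]

class UpperCaseStore:
    """A different module honoring the same contract, with different
    internal behavior (it normalizes values to upper case)."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value.upper()
    def get(self, key):
        return self._data[key]

def greet(store):
    # Written against the abstraction, not against either module:
    store.put("greeting", "hello")
    return store.get("greeting")

print(greet(MemoryStore()))     # hello
print(greet(UpperCaseStore()))  # HELLO
```

Because =greet= never looks inside either store, swapping one module for the other requires no change to the caller - which is the whole point of the boundary.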

Available abstractions vary. They vary by, for instance:

- ...what language you choose. Consider, for instance, that a language
  like Haskell contains various abstractions done largely within the
  type system that cannot be expressed in many other languages.
  Languages like Python, Ruby, or JavaScript might have various
  abstractions meaningful only in the context of dynamic typing. Some
  languages more readily permit the creation of new abstractions, and
  this might lead to a broader range of abstractions implemented in
  libraries.
- ...the operating system and its standard library. What is a
  process? What is a thread? What is a dynamic library? What is a
  filesystem? What is a file? What is a block device? What is a
  socket? What is a virtual machine? What is a bus? What is a
  command line?
- ...the time period. How many of the abstractions named above were
  around or viable in 1970, 1980, 1990, 2000? In the opposite
  direction, when did you last use that lovely standardized protocol,
  [[https://en.wikipedia.org/wiki/Common_Gateway_Interface][CGI]], to let your web application and your web server communicate,
  use [[https://en.wikipedia.org/wiki/PHIGS][PHIGS]] to render graphics, or access a large multiuser system
  via hard-wired terminals?

As such, the possible ways to modularize things vary too. A certain
way of modularizing something may not even make sense until it has
been done other ways hundreds or thousands of times.

Other terms are related too. "Loosely-coupled" (or loose coupling)
and "tightly-coupled" refer to the sort of abstractions sitting
between modules, or whether there even are separate modules.
"Decoupling" involves changing the relationship between modules
(sometimes, creating them in the first place), typically splitting
things into two more sensible pieces that a more sensible abstraction
separates. "Factoring out" is really a form of decoupling in which
smaller parts of something are turned into a module which the
original thing then interfaces with (one canonical example is taking
some bits of code, often very similar or identical in many places,
and moving them into a single function). To say one has "abstracted
over" some details implies that a module is handling those details,
that the details shouldn't matter, and that what does matter is the
abstraction one is using.
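The canonical "factoring out" case above can be sketched in a few lines of Python (the example and its names are hypothetical, mine rather than from any particular codebase):

```python
# Before: the same rounding-and-formatting logic repeated inline in
# two places (imagine many more):
subtotal_label = "$" + "{:.2f}".format(19.994)
shipping_label = "$" + "{:.2f}".format(4.5)

# After: the duplicated bits are moved into a single function - a tiny
# module that both call sites now interface with.
def as_price(amount):
    """Format a number as a dollar amount with two decimal places."""
    return "${:.2f}".format(amount)

subtotal_label = as_price(19.994)
shipping_label = as_price(4.5)
print(subtotal_label, shipping_label)  # $19.99 $4.50
```

The payoff is the usual one for decoupling: a change to price formatting now happens in exactly one place.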

One of Rich Hickey's favorite topics is *composition*, and with good
reason (and you should check out [[http://www.infoq.com/presentations/Simple-Made-Easy/][Simple Made Easy]] regardless). This
relates as well: composing things effectively into bigger parts
requires that they support some common abstraction.
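A small sketch of that requirement, in my own Python (not Hickey's Clojure): each step shares one common abstraction - a function from one value to the next - and that shared shape is exactly what lets the steps chain into a bigger part.

```python
from functools import reduce

def compose(*fns):
    """Compose single-argument functions, applied left to right."""
    return lambda x: reduce(lambda acc, fn: fn(acc), fns, x)

# Three independent pieces that all fit the same shape (str -> value):
strip = str.strip
lower = str.lower
words = str.split

tokenize = compose(strip, lower, words)
print(tokenize("  Modularity AND Abstraction "))
# ['modularity', 'and', 'abstraction']
```

None of the three pieces knows about the others; the common abstraction does all the connecting.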

In the same area, [[https://clojurefun.wordpress.com/2012/08/17/composition-over-convention/][Composition over convention]] is a good read on how
/frameworks/ run counter to modularity: they aren't built to behave
like modules of a larger system.

# -----

It has a very pragmatic reason behind it: when something is a module
unto itself, presumably it is relying on specific abstractions, and it
is possible to freely change this module's internal details (provided
that it still respects the same abstractions), to move this module to
other contexts (anywhere that provides the same abstractions), and to
replace it with other modules (anything that respects the same
abstractions).

It also has a more abstract reason: when something is a module unto
itself, the way it is designed and implemented usually presents more
insight into the fundamentals of the problem it is solving. It
contains fewer incidental details, and more essential details.

# -------

* Information

I referred earlier to the abstractions themselves as both boundaries
and communication channels. Another common view is that abstractions
are *contracts* with a communicated and agreed purpose, and I think
this is a useful definition too: it conveys the notion that there are
multiple parties involved and that they are free to behave as needed
provided that they fulfill some obligation.

Some definitions refer directly to information, like the [[https://en.wikipedia.org/wiki/Abstraction_principle_(computer_programming)][abstraction
principle]], which aims to reduce duplication of information. This fits
with [[https://en.wikipedia.org/wiki/Don%2527t_repeat_yourself][don't repeat yourself]], so that "a modification of any single
element of a system does not require a change in other logically
unrelated elements".

# ----- FIXME
Consider the information this module deals in, in essence.

What is the most general form this information could be expressed in,
without being so general as to encompass other things that are
irrelevant, or so low-level as to needlessly constrain the possible
contexts?

(Aristotle's theory of definitions?)

* Less-Conventional Examples

One thing I've watched with some interest is when new abstractions
emerge (or, perhaps, old ones become more widespread) to solve
problems that I wasn't even aware existed.

[[https://circleci.com/blog/it-really-is-the-future/][It really is the future]] talks about a lot of more recent forms of
modularity from the land of devops, most of which were completely
unheard-of in, say, 2010. [[https://www.functionalgeekery.com/episode-75-eric-b-merritt/][Functional Geekery episode 75]] talks about
many similar things.

[[https://jupyter.org/][Jupyter Notebook]] is one of my favorites here. It provides a notebook
interface (similar to something like Maple or Mathematica) which:

- allows the notebook to use various different programming languages
  underneath,
- decouples where the notebook is used from where it is running, due
  to being implemented as a web application accessed through the
  browser,
- decouples the presentation of a stored notebook from Jupyter itself
  by using a [[https://nbformat.readthedocs.io/en/latest/][JSON-based file format]] which can be rendered without
  Jupyter (like GitHub does if you commit a .ipynb file).
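That last decoupling is easy to demonstrate: any tool that understands JSON can inspect or render a notebook without Jupyter being involved. The structure below is my minimal approximation of the nbformat layout (real notebooks carry more metadata), so treat the exact fields as illustrative:

```python
import json

# A hand-built notebook document, serialized the way a .ipynb file is:
minimal_notebook = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# A heading\n"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "outputs": [], "source": ["print(2 + 2)\n"]},
    ],
})

# A renderer needs nothing from Jupyter itself, just JSON:
nb = json.loads(minimal_notebook)
code_cells = [c for c in nb["cells"] if c["cell_type"] == "code"]
print(len(code_cells))  # 1
```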

I love notebook interfaces already because they simplify experimenting
by handling a lot of things I'd otherwise have to do manually - like
saving results and keeping them lined up with the exact code that
produced them. Jupyter adds some other use-cases I find marvelous -
for instance, I can let the interpreter run on my workstation, which
has all of the computing power, but access it across the Internet from
my laptop.

[[https://zeppelin.apache.org/][Apache Zeppelin]] does similar things with different languages; I've
just used it much less.

Another favorite of mine is [[https://nixos.org/nix/][Nix]]. One excellent article, [[http://blog.ezyang.com/2014/08/the-fundamental-problem-of-programming-language-package-management/][The
fundamental problem of programming language package management]],
never mentions Nix but explains very well the problems it sets out to
solve. To be able to subsume nearly all of the
programming-language-specific package managers into a single module is
a very lofty goal, but Nix appears to do a decent job of it (among
other things).

The [[https://www.lua.org/][Lua]] programming language is noteworthy here. It's written in
clean C with minimal dependencies, so it runs nearly anywhere that a C
or C++ compiler targets. It's purposely very easy both to *embed*
(i.e. to put inside of a program and use as an extension language,
such as for plugins or scripting) and to *extend* (i.e. to connect
with libraries to allow their functionality to be used from Lua). [[https://www.gnu.org/software/guile/][GNU
Guile]] has many of the same properties, I'm told.

We ordinarily think of object systems as something living in the
programming language. However, the object system is sometimes made a
module that sits outside of the programming language, and languages
just interact with it. [[https://en.wikipedia.org/wiki/GObject][GObject]], [[https://en.wikipedia.org/wiki/Component_Object_Model][COM]], and [[https://en.wikipedia.org/wiki/XPCOM][XPCOM]] do this, and to some
extent, so does [[https://en.wikipedia.org/wiki/Meta-object_System][Qt & MOC]] - and there are probably hundreds of others,
particularly if you count dead ones created during the object-oriented
hype of the '90s. This seems to happen in systems where the object
hierarchy is in effect "bigger" than the language.

[[https://zeromq.org/][ZeroMQ]] is another example: a set of cross-language abstractions for
communication patterns in a distributed system. It's likely not
unique, but it is one of the better-known examples and the first I
thought of, and I think their [[http://zguide.zeromq.org/page:all][guide]] is excellent.

Interestingly, the same iMatix behind ZeroMQ also created [[https://github.com/imatix/gsl][GSL]] and
explained its value in [[https://imatix-legacy.github.io/mop/introduction.html][Model-Oriented Programming]], in which
abstraction features heavily. I've not used GSL, and am skeptical of
its stated usefulness, but it looks like it is meant to help create
compile-time abstractions that likewise sit outside of any particular
programming language.

# TODO: Expand on this.

[[https://web.hypothes.is/][hypothes.is]] is a curious one that I find fascinating. They're trying
to factor out annotation and commenting from something that is handled
on a per-webpage basis and turn it into its own module, and I really
like what I've seen. However, it does not seem to have caught on
much.

The Unix tradition lives on in certain modern tools. [[https://stedolan.github.io/jq/][jq]] has proven
very useful anytime I've had to mess with JSON data. [[http://www.dest-unreach.org/socat/][socat]] and [[http://netcat.sourceforge.net/][netcat]]
have saved me numerous times. I'm sure certain people love the fact
that [[https://neovim.io/][Neovim]] is designed to be seamlessly embedded and extended with
plugins. [[https://suckless.org/philosophy][suckless]] perhaps takes it too far, but gets an honorary
mention...

# ???

# Also, TCP/IP and the entire notion of packet-switched networks.
# And the entire OSI 7-layer model.

# Also, caches - of all types. (CPU, disk...)

# One key is how the above let you *reason* about things without
# knowing their specifics.

People know that I love Emacs, but I also believe many of the
complaints about how large it is. Despite being basically its own
operating system, /within this/ it has considerable modularity. The
same applies somewhat to Blender, I suppose.

Consider [[https://research.google.com/pubs/pub43146.html][Machine Learning: The High Interest Credit Card of Technical Debt]],
a paper that anyone working around machine learning should read and
re-read regularly. Large parts of the paper are about ways in which
machine learning conflicts with proper modularity and abstraction.
(However, [[https://colah.github.io/posts/2015-09-NN-Types-FP/][Neural Networks, Types, and Functional Programming]] is still
a good post and shows some sorts of abstraction that still exist, at
least in neural networks.)

Even DOS had useful abstractions. Things like
DriveSpace/DoubleSpace/Stacker worked well enough because most
software that needed files relied on DOS's normal abstractions to
access them - so it did not matter to that software whether the
underlying filesystem was actually compressed, was actually a RAM
disk, or was on some obscure SCSI interface. Likewise, for the
silliness known as [[https://en.wikipedia.org/wiki/Expanded_memory][EMS]], applications that accessed memory through
the EMS abstraction could disregard whether it was a "real" EMS board
providing access to that memory, or an expanded memory manager
providing indirect access to some other memory or even to a hard disk
pretending to be memory.

Even more abstractly: emulators work because so much software
respected the abstraction of some specific CPU and hardware platform.

Submitted without further comment:
https://github.com/stevemao/left-pad/issues/4

* Fragments

- Abstracting over...
  - Multiple applications
  - Multiple users
  - Multiple CPUs
  - Multiple hosts

- [[Notes - Paper, 2016-11-13]]
- Tanenbaum vs. Linus war & microkernels
- TBL: "The choice of language is a common design choice. The low
  power end of the scale is typically simpler to design, implement and
  use, but the high power end of the scale has all the attraction of
  being an open-ended hook into which anything can be placed: a door
  to uses bounded only by the imagination of the programmer. Computer
  Science in the 1960s to 80s spent a lot of effort making languages
  which were as powerful as possible. Nowadays we have to appreciate
  the reasons for picking not the most powerful solution but the least
  powerful. The reason for this is that the less powerful the
  language, the more you can do with the data stored in that
  language. If you write it in a simple declarative form, anyone can
  write a program to analyze it in many ways." (Languages are a kind
  of abstraction - one that influences how a module is written, and
  what contexts it is useful in.)
- "Self" paper & structural reification?
  - I'm still not sure how this relates, but it may perhaps relate to
    how *not* to make things modular (structural reification is a sort
    of check on the scope of objects/classes)
- What by Rich Hickey?
  - Simple Made Easy?
  - The Value of Values?
- SICP: [[https://mitpress.mit.edu/sites/default/files/sicp/full-text/book/book-Z-H-19.html#%25_chap_3][Modularity, Objects, and State]]
- [[https://www.cs.utexas.edu/~wcook/Drafts/2009/essay.pdf][On Understanding Data Abstraction, Revisited]]
- http://www.catb.org/~esr/writings/taoup/html/apb.html#Baldwin-Clark -
  Carliss Baldwin and Kim Clark. Design Rules, Vol 1: The Power of
  Modularity. 2000. MIT Press. ISBN 0-262-02466-7.
- Brooks, No Silver Bullet?

- https://en.wikipedia.org/wiki/Essential_complexity

- https://twitter.com/fchollet/status/962074070513631232

- [[https://mitpress.mit.edu/sites/default/files/sicp/full-text/book/book-Z-H-9.html#%25_chap_1][From SICP chapter 1 intro]]: "The acts of the mind, wherein it exerts
  its power over simple ideas, are chiefly these three: 1. Combining
  several simple ideas into one compound one, and thus all complex
  ideas are made. 2. The second is bringing two ideas, whether simple
  or complex, together, and setting them by one another so as to take
  a view of them at once, without uniting them into one, by which it
  gets all its ideas of relations. 3. The third is separating them
  from all other ideas that accompany them in their real existence:
  this is called abstraction, and thus all its general ideas are
  made." -John Locke, An Essay Concerning Human Understanding (1690)
- One point I have ignored (maybe): you clearly separate the 'inside'
  of a module (its implementation) from the 'outside' (that is, its
  boundaries, the abstractions that it interfaces with or that it
  implements) so that the 'inside' can change more or less freely
  without having any effect on the outside.
- Abstractions as a way of reducing the work required to add
  functionality (changes can be made just in the relevant modules, and
  other modules do not need to change to conform)
- What is more key? Communication, information content, contracts,
  details?
- [[https://en.wikipedia.org/wiki/Don%2527t_repeat_yourself][Don't repeat yourself]]
- [[https://simplyphilosophy.org/study/aristotles-definitions/][Aristotle & theory of definitions]]
  - this isn't right. I need to find the quote in the Durant book
    (which will probably have an actual source) that pertains to how
    specific and how general a definition must be

- [[https://en.wikipedia.org/wiki/SOLID][SOLID]]
- [[https://en.wikipedia.org/wiki/Cross-cutting_concern][Cross-cutting concerns]] and [[https://en.wikipedia.org/wiki/Aspect-oriented_programming][Aspect-oriented programming]]
- [[https://en.wikipedia.org/wiki/Separation_of_concerns][Separation of Concerns]]
- [[https://en.wikipedia.org/wiki/Abstraction_principle_(computer_programming)][Abstraction principle]]
- [[https://en.wikipedia.org/wiki/Don%2527t_repeat_yourself][Don't repeat yourself]]
@@ -1,368 +0,0 @@

---
title: Explaining RetinaNet
author: Chris Hodapp
date: December 13, 2017
tags: technobabble
---

# Above uses style from https://github.com/turboMaCk/turboMaCk.github.io/blob/develop/posts/2016-12-21-org-mode-in-hakyll.org
# and https://turbomack.github.io/posts/2016-12-21-org-mode-in-hakyll.html
# description:
# subtitle:

A paper came out in the past few months,
[[https://arxiv.org/abs/1708.02002][Focal Loss for Dense Object Detection]], from one of
Facebook's teams. The goal of this post is to explain this paper as I
work through it, through some of its references, and through one
particular [[https://github.com/fizyr/keras-retinanet][implementation in Keras]].

* Object Detection

"Object detection" as it is used here refers to machine learning
models that can not just identify a single object in an image, but can
identify and *localize* multiple objects, like in the below photo
taken from
[[https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html][Supercharge your Computer Vision models with the TensorFlow Object Detection API]]:

# TODO:
# Define mAP

#+CAPTION: TensorFlow object detection example 2.
#+ATTR_HTML: :width 100% :height 100%
[[../images/2017-12-13-retinanet/2017-12-13-objdet.jpg]]

At the time of writing, the most accurate object-detection methods
were based around R-CNN and its variants, and all used two-stage
approaches:

1. One model proposes a sparse set of locations in the image that
   probably contain something. Ideally, this set contains all objects
   in the image, but filters out the majority of negative locations
   (i.e. only background, not foreground).
2. Another model, typically a CNN (convolutional neural network),
   classifies each location in that sparse set as either being
   foreground and some specific object class (like "kite" or "person"
   above), or as being background.

Single-stage approaches were also developed, like [[https://pjreddie.com/darknet/yolo/][YOLO]], [[https://arxiv.org/abs/1512.02325][SSD]], and
OverFeat. These simplified/approximated the two-stage approach by
replacing the first step with brute force. That is, instead of
generating a sparse set of locations that probably have something of
interest, they simply handle all locations, whether or not they likely
contain something, by blanketing the entire image in a dense sampling
of many locations, many sizes, and many aspect ratios.

This is simpler and faster - but not as accurate as the two-stage
approaches.

Methods like [[https://arxiv.org/abs/1506.01497][Faster R-CNN]] (not to be confused with Fast R-CNN... no, I
didn't come up with these names) merge the two models of two-stage
approaches into a single CNN, and exploit the possibility of sharing
computations that would otherwise be done twice. I assume that this
is included in the comparisons done in the paper, but I'm not entirely
sure.

* Training & Class Imbalance

Briefly, the process of training these models requires minimizing some
kind of loss function that is based on what the model misclassifies
when it is run on some training data. It's preferable to be able to
compute some loss over each individual instance, and add all of these
losses up to produce an overall loss. (Yes, far more can be said on
this, but the details aren't really important here.)
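As a concrete sketch of a per-instance loss that is summed into an overall loss (my own toy example, not from the paper): cross-entropy on the probability a model assigned to the correct class of each instance.

```python
import math

def cross_entropy(p_true):
    """Loss for one instance, given the probability the model
    predicted for its correct class. Approaches 0 as the prediction
    approaches certainty (p_true -> 1)."""
    return -math.log(p_true)

# Probabilities a (hypothetical) model assigned to the correct label
# of each training instance:
predictions = [0.9, 0.6, 0.99]

# The overall loss is just the sum of per-instance losses:
total_loss = sum(cross_entropy(p) for p in predictions)
```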

# TODO: What else can I say about why loss should be additive?
# Quote DL text? ML text?

This leads to a problem in one-stage detectors: the dense set of
locations being classified usually contains a small number of
locations that actually have objects (positives), and a much larger
number of locations that are just background and can be very easily
classified as such (easy negatives). However, the loss function still
adds all of them up - and even if the loss is relatively low for each
of the easy negatives, their cumulative loss can drown out the loss
from objects that are being misclassified.

That is: a large number of tiny, irrelevant losses overwhelms a
smaller number of larger, relevant losses. The paper was a bit terse
on this; it took a few re-reads to understand why "easy negatives"
were an issue, so hopefully I have this right.
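The drowning-out effect is easy to see with made-up numbers (mine, not the paper's):

```python
import math

def cross_entropy(p_true):
    return -math.log(p_true)

# 100,000 background locations the model already classifies well
# (p = 0.99 for "background")...
easy_negative_loss = 100_000 * cross_entropy(0.99)

# ...versus 10 objects it is badly misclassifying (p = 0.01 for the
# correct class).
hard_positive_loss = 10 * cross_entropy(0.01)

# Each easy negative contributes only ~0.01 loss versus ~4.6 per hard
# positive, yet in total the easy negatives dominate:
print(easy_negative_loss > hard_positive_loss)  # True
```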

The training process is trying to minimize this loss, and so it is
mostly nudging the model to improve where it least needs it (its
ability to classify background areas that it already classifies well)
and neglecting where it most needs it (its ability to classify the
"difficult" objects that it is misclassifying).

# TODO: Visualize this. Can I?

This is *class imbalance* in a nutshell, which the paper gives as the
limiting factor for the accuracy of one-stage detectors. While
existing approaches try to tackle it with methods like bootstrapping
or hard example mining, their accuracy still lags.

** Focal Loss

So, the point of all this is: a tweak to the loss function can fix
this issue, and retain the speed and simplicity of one-stage
approaches while surpassing the accuracy of existing two-stage ones.

At least, this is what the paper claims. Their novel loss function is
called *Focal Loss* (as the title references), and it multiplies the
normal cross-entropy by a factor, $(1-p_t)^\gamma$, where $p_t$
approaches 1 as the model predicts a higher and higher probability of
the correct classification (or 0 for an incorrect one), and $\gamma$
is a "focusing" hyperparameter (they used $\gamma=2$). Intuitively,
this scaling makes sense: if a classification is already correct (as
in the "easy negatives"), $(1-p_t)^\gamma$ tends toward 0, and so the
portion of the loss multiplied by it will likewise tend toward 0.
|
||||
|
||||
|
||||
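To make that concrete, here's focal loss for a single binary
object-vs-background decision in plain NumPy. This is my own sketch,
not the paper's code, and I'm leaving out the $\alpha$-balancing
weight the paper also applies:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    # p: predicted probability of the object class; y: 1 (object) or
    # 0 (background).
    p_t = np.where(y == 1, p, 1.0 - p)   # probability of the true class
    ce = -np.log(p_t)                    # ordinary cross-entropy
    return (1.0 - p_t) ** gamma * ce     # down-weight the easy, confident cases
```

An easy negative predicted at $p = 0.01$ contributes roughly $10^4$
times less than it would under plain cross-entropy, which is exactly
the rebalancing described above.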
* RetinaNet architecture

The paper gives the name *RetinaNet* to the network they created which
incorporates this focal loss in its training. While it says, "We
emphasize that our simple detector achieves top results not based on
innovations in network design but due to our novel loss," it is
important not to miss the phrase /innovations in/: they are saying
that they didn't need to invent a new network design - not that the
network design doesn't matter. Later in the paper, they say that it
is in fact crucial that RetinaNet's architecture relies on FPN
(Feature Pyramid Network) as its backbone. As far as I can tell, the
architecture's use of a variant of RPN (Region Proposal Network) is
also very important.

I go into both of these aspects below.

* Feature Pyramid Network

Another recent paper, [[https://arxiv.org/abs/1612.03144][Feature Pyramid Networks for Object Detection]],
describes the basis of this FPN in detail (and, non-coincidentally I'm
sure, it shares 4 co-authors with the paper this post explores). The
paper is fairly concise in describing FPNs; it takes only around 3
pages to explain their purpose, related work, and their entire design.
The remainder shows experimental results and specific applications of
FPNs. While it shows FPNs implemented on a particular underlying
network (ResNet, mentioned below), they were purposely made very
simple and adaptable to nearly any kind of CNN.

To begin understanding this, start with [[https://en.wikipedia.org/wiki/Pyramid_%2528image_processing%2529][image pyramids]]. The below
diagram illustrates an image pyramid:

#+CAPTION: Source: https://en.wikipedia.org/wiki/File:Image_pyramid.svg
#+ATTR_HTML: :width 100% :height 100%
[[../images/2017-12-13-retinanet/1024px-Image_pyramid.svg.png]]

Image pyramids have many uses, but the paper focuses on their use in
taking something that works only at a certain scale of image - for
instance, an image classification model that only identifies objects
that are around 50 pixels across - and adapting it to handle different
scales by applying it at every level of the image pyramid. If the
model has a little flexibility, some level of the image pyramid is
bound to have scaled the object to a size that the model can match.

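A minimal sketch of this idea in NumPy (my own toy construction -
real pyramids typically blur each level before downsampling to avoid
aliasing):

```python
import numpy as np

def image_pyramid(img, levels):
    # Build a pyramid by repeated 2x downsampling. Nearest-neighbor
    # subsampling is used here for brevity; a Gaussian blur would
    # normally precede each step.
    pyramid = [img]
    for _ in range(levels - 1):
        img = img[::2, ::2]          # halve the resolution
        pyramid.append(img)
    return pyramid
```

A fixed-scale detector run on every level then effectively sees the
object at many different sizes.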
Typically, though, detection or classification isn't done directly on
an image, but rather, the image is converted to some more useful
feature space. However, these feature spaces likewise tend to be
useful only at a specific scale. This is the rationale behind
"featurized image pyramids", or feature pyramids built upon image
pyramids, created by converting each level of an image pyramid to that
feature space.

The problem with featurized image pyramids, the paper says, is that if
you try to use them in CNNs, they drastically slow everything down,
and use so much memory as to make normal training impossible.

However, take a look at this diagram of a generic deep CNN:

#+CAPTION: Source: https://commons.wikimedia.org/wiki/File:Typical_cnn.png
#+ATTR_HTML: :width 100% :height 100%
[[../images/2017-12-13-retinanet/Typical_cnn.png]]

You may notice that this network has a structure that bears some
resemblance to an image pyramid. This is because deep CNNs are
already computing a sort of pyramid in their convolutional and
subsampling stages. In a nutshell, deep CNNs used in image
classification push an image through a cascade of feature detectors or
filters, and each successive stage contains a feature map that is
built out of features in the prior stage - thus producing a *feature
hierarchy* which already is something like a pyramid and contains
multiple different scales. (Being able to train deep CNNs to jointly
learn the filters at each stage of that feature hierarchy from the
data, rather than engineering them by hand, is what sets deep learning
apart from "shallow" machine learning.)

When you move through levels of a featurized image pyramid, only scale
should change. When you move through levels of the feature hierarchy
described here, scale changes, but so does the meaning of the
features. This is the *semantic gap* the paper references. Meaning
changes because each stage builds up more complex features by
combining simpler features of the last stage. The first stage, for
instance, commonly handles pixel-level features like points, lines, or
edges at a particular orientation. In the final stage, presumably,
the model has learned complex enough features that things like "kite"
and "person" can be identified.

The goal in the paper was to find a way to exploit this feature
hierarchy that is already being computed, and to produce something
that has similar power to a featurized image pyramid but without too
high of a cost in speed, memory, or complexity.

Everything described so far (none of which is specific to FPNs), the
paper calls the *bottom-up* pathway - the feed-forward portion of the
CNN. FPN adds to this a *top-down* pathway and some lateral
connections.

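To make those two additions concrete, here is a toy NumPy sketch of a
single merge step, following the FPN paper's description (the function
names, and using a plain channel-wise matrix multiply to stand in for
the real 1x1 convolution, are my simplifications):

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling of an (H, W, C) feature map.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def merge_level(top_down, lateral, w_lateral):
    # One top-down merge: upsample the coarser, semantically stronger
    # map; project the bottom-up map's channels (a 1x1 conv is just a
    # matrix multiply over the channel axis); add them elementwise.
    return upsample2x(top_down) + lateral @ w_lateral
```

Repeating this from the top of the bottom-up pathway downward yields a
pyramid whose every level carries high-level semantics.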
** Top-Down Pathway

** Lateral Connections

** As Applied to ResNet

# Note C=256 and such

# TODO: Link to some good explanations

For two reasons, I don't explain much about ResNet here. The first is
that residual networks, like the ResNet used here, have seen lots of
attention and already have many good explanations online. The second
is that the paper claims that the underlying network isn't what's
responsible for its results anyway.

[[https://arxiv.org/abs/1512.03385][Deep Residual Learning for Image Recognition]]
[[https://arxiv.org/abs/1603.05027][Identity Mappings in Deep Residual Networks]]

* Anchors & Region Proposals

Recall from the last section what was said about feature maps, and
that the deeper stages of the CNN happen to be good for classifying
images. While these deeper stages are lower-resolution than the input
images, and while their influence is spread out over larger areas of
the input image (that is, their [[https://en.wikipedia.org/wiki/Receptive_field#In_the_context_of_neural_networks][receptive field]] is rather large due
to each stage spreading it a little further), the features here still
maintain a spatial relationship with the input image. That is, moving
across one axis of this feature map still corresponds to moving across
the same axis of the input image.

# Just re-explain the above with the feature pyramid

RetinaNet's design draws heavily from RPNs (Region Proposal Networks),
and here I follow the explanation given in the paper [[https://arxiv.org/abs/1506.01497][Faster R-CNN:
Towards Real-Time Object Detection with Region Proposal Networks]]. I
find the explanations in terms of "proposals", of focusing the
"attention" of the neural network, or of "telling the neural network
where to look" to be needlessly confusing and misleading. I'd rather
explain very plainly how they work.

Central to RPNs are *anchors*. Anchors aren't exactly a feature of
the CNN. They're more a property that's used in its training and
inference.

In particular:

- Say that the feature pyramid has $L$ levels, and that level $l+1$ is
  half the resolution (thus double the scale) of level $l$.
- Say that level $l$ is a 256-channel feature map of size $W \times H$
  (i.e. it's a tensor with shape $W \times H \times 256$). Note that
  $W$ and $H$ will be larger at lower levels, and smaller at higher
  levels, but in RetinaNet at least, always 256-channel samples.
- For every point on that feature map (all $WH$ of them), we can
  identify a corresponding point in the input image. This is the
  center point of a broad region of the input image that influences
  this point in the feature map (i.e. its receptive field). Note that
  as we move up to higher levels in the feature pyramid, these regions
  grow larger, and neighboring points in the feature map correspond to
  larger and larger jumps across the input image.
- We can make these regions explicit by defining *anchors* - specific
  rectangular regions associated with each point of a feature map.
  The size of the anchor depends on the scale of the feature map, or
  equivalently, what level of the feature pyramid it came from. All
  this means is that anchors in level $l+1$ are twice as large as the
  anchors of level $l$.

The view that this should paint is that a dense collection of anchors
covers the entire input image at different sizes - still in a very
ordered pattern, but with lots of overlap. Remember how I mentioned
at the beginning of this post that one-stage object detectors use a
very "brute force" method?

My above explanation glossed over a couple things, but nothing that
should change the fundamentals.

- Anchors are actually associated with every 3x3 window of the feature
  map, not precisely every point, but all this really means is that
  it's "every point and its immediate neighbors" rather than "every
  point". This doesn't really matter to anchors, but matters
  elsewhere.
- It's not a single anchor per 3x3 window, but 9 anchors - one for
  each of three aspect ratios (1:2, 1:1, and 2:1), and each of three
  scale factors ($1$, $2^{1/3}$, and $2^{2/3}$) on top of its base
  scale. This is just to handle objects of less-square shapes and to
  cover the gap in scale in between levels of the feature pyramid.
  Note that the scale factors are evenly spaced exponentially, such
  that an additional step down wouldn't make sense (the largest
  anchors at the pyramid level /below/ already cover this scale), and
  nor would an additional step up (the smallest anchors at the pyramid
  level /above/ already cover it).

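The layout described above can be sketched in NumPy. Everything here
(the function name, the base size, the centering convention, the
area-preserving aspect-ratio math) is my own assumption for
illustration; only the 3 ratios x 3 scales = 9 anchors per position
comes from the paper:

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride, base_size):
    # Dense anchors for one pyramid level. `stride` maps feature-map
    # coordinates back to input-image pixels; `base_size` is this
    # level's base anchor side length.
    ratios = [0.5, 1.0, 2.0]                   # 1:2, 1:1, 2:1
    scales = [2 ** (i / 3) for i in range(3)]  # 1, 2^(1/3), 2^(2/3)
    boxes = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Center of this feature-map point in input-image coords.
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for r in ratios:
                for s in scales:
                    # Change aspect ratio while preserving area.
                    w = base_size * s * np.sqrt(1 / r)
                    h = base_size * s * np.sqrt(r)
                    boxes.append((cx - w / 2, cy - h / 2,
                                  cx + w / 2, cy + h / 2))
    return np.array(boxes)   # shape: (feat_h * feat_w * 9, 4)
```

Even this tiny example makes the "brute force" flavor obvious: a
single level already produces thousands of overlapping candidate
regions.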
Here, finally, is where actual classification and regression come in,
via the *classification subnet* and the *box regression subnet*.

** Classification Subnet

Every anchor associates an image region with a 3x3 window (i.e. a
3x3x256 section - it's still 256-channel). The classification subnet
is responsible for learning: do the features in this 3x3 window,
produced from some input image, indicate that an object is inside this
anchor? Or, more accurately: For each of $K$ object classes, what's
the probability that an object of that class is inside (versus it
being just background)?

** Box Regression Subnet

The box regression subnet takes the same input as the classification
subnet, but tries to learn the answer to a different question. It is
responsible for learning: what are the coordinates of the object
inside of this anchor (assuming there is one)? More specifically, it
tries to learn to produce 4 numbers which give offsets relative to the
anchor's bounds (thus specifying a different region). Note that this
subnet completely ignores the class of the object.

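As a sketch of what "offsets relative to the anchor's bounds" can mean
in practice, here is the box parametrization used in the R-CNN line of
papers, applied in reverse to decode a prediction (the function name
and tuple conventions are mine):

```python
import numpy as np

def apply_offsets(anchor, t):
    # anchor: (x1, y1, x2, y2); t: predicted offsets (tx, ty, tw, th).
    # tx, ty shift the center in units of the anchor's width/height;
    # tw, th rescale the width/height exponentially.
    x1, y1, x2, y2 = anchor
    wa, ha = x2 - x1, y2 - y1
    cx = x1 + wa / 2 + t[0] * wa
    cy = y1 + ha / 2 + t[1] * ha
    w = wa * np.exp(t[2])
    h = ha * np.exp(t[3])
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

All-zero offsets return the anchor unchanged; small offsets nudge and
rescale it toward the object's true bounds.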
The classification subnet already tells us whether or not a given
anchor contains an object - which already gives rough bounds on it.
The box regression subnet helps tighten these bounds.

** Other notes (?)

I've glossed over a few details here. Everything I've described above
is implemented with bog-standard convolutional networks...

# Parameter sharing? How to explain?

* Training

# Ground-truth object boxes
# Intersection-over-Union thresholds

* Inference

# Top N results

* References

# Does org-mode have a way to make a special section for references?
# I know I saw this somewhere

1. [[https://arxiv.org/abs/1708.02002][Focal Loss for Dense Object Detection]]
2. [[https://arxiv.org/abs/1612.03144][Feature Pyramid Networks for Object Detection]]
3. [[https://arxiv.org/abs/1506.01497][Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks]]
4. [[https://arxiv.org/abs/1504.08083][Fast R-CNN]]
5. [[https://arxiv.org/abs/1512.03385][Deep Residual Learning for Image Recognition]]
6. [[https://arxiv.org/abs/1603.05027][Identity Mappings in Deep Residual Networks]]
7. [[https://openreview.net/pdf?id%3DSJAr0QFxe][Demystifying ResNet]]
8. [[https://vision.cornell.edu/se3/wp-content/uploads/2016/10/nips_camera_ready_draft.pdf][Residual Networks Behave Like Ensembles of Relatively Shallow Networks]]
9. https://github.com/KaimingHe/deep-residual-networks
10. https://github.com/broadinstitute/keras-resnet (keras-retinanet uses this)
11. [[https://arxiv.org/abs/1311.2524][Rich feature hierarchies for accurate object detection and semantic segmentation]] (contains the same parametrization as in the Faster R-CNN paper)
12. http://deeplearning.csail.mit.edu/instance_ross.pdf and http://deeplearning.csail.mit.edu/