
#+TITLE: Modularity & Abstraction (working title)
#+AUTHOR: Chris Hodapp
#+DATE: April 20, 2017
#+TAGS: technobabble

# Why don't I turn this into a paper for arXiv too?  It can still be
# posted to the blog (just also make it exportable to LaTeX perhaps)

_Modularity_ and _abstraction_ feature prominently wherever computers
are involved.  This is meant very broadly: it applies to designing
software, using software, integrating software, and to a lot of
hardware as well.  The idea applies elsewhere too, and almost
certainly originated there first; however, it appears to be
particularly crucial around software.

Definitions, though, are a bit vague (including any in this post).
My goal in this post isn't to (re)define them, but to explain a bit of
their essence and expand on a few theses:

- Modularity arises naturally in a wide array of places.
- Modularity and abstraction are intrinsically connected.
- Whether a given modularization makes sense depends strongly on the
  meaning and relevance of *information* inside and outside of
  modules, and the broader context shapes both.

* Why?

People generally agree that "modularity" is good.  The idea that
something complex can be designed and understood in terms of smaller,
simpler pieces comes naturally to anyone who has built something out
of smaller pieces or taken something apart.  It runs very deep in the
Unix philosophy, which ESR gives a good overview of in [[http://www.catb.org/~esr/writings/taoup/html/ch01s06.html][The Art of Unix
Programming]] (or listen to [[https://youtu.be/tc4ROCJYbm0?t%3D248][Kernighan himself]] describe it at Bell Labs in
1982).

Tim Berners-Lee gives some practical limitations in [[https://www.w3.org/DesignIssues/Principles.html][Principles of
Design]] and in [[https://www.w3.org/DesignIssues/Modularity.html][Modularity]]: "Modular design hinges on the simplicity and
abstract nature of the interface definition between the modules. A
design in which the insides of each module need to know all about each
other is not a modular design but an arbitrary partitioning of the
bits... It is not only necessary to make sure your own system is
designed to be made of modular parts. It is also necessary to realize
that your own system, no matter how big and wonderful it seems now,
should always be designed to be a part of another larger system."  Les
Hatton in [[http://www.leshatton.org/TAIC2008-29-08-2008.html][The role of empiricism in improving the reliability of
future software]] even did an interesting derivation tying the defect
density in software to how it is broken into pieces.  David Parnas's
1972 paper [[https://www.cs.virginia.edu/~eos/cs651/papers/parnas72.pdf][On the Criteria to be Used in Decomposing Systems into Modules]]
cites a 1970 textbook on why modularity is important in systems
programming, but also notes that usually nothing is said about how to
divide a system into modules.

"Abstraction" doesn't have quite the same consensus. In software, it's
generally understood that decoupled or loosely-coupled is better than
tightly-coupled, but at the same time, "abstraction" can have the
connotation of something that gets in the way, adds overhead, and
confuses things.  Dijkstra, in one of the few instances of not being
snarky, allegedly said, "Being abstract is something profoundly
different from being vague.  The purpose of abstraction is not to be
vague, but to create a new semantic level in which one can be
absolutely precise."  Joel Spolsky, in one of the few instances of me
actually caring what he said, also has a blog post from 2002 on the
[[https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/][Law of Leaky Abstractions]].  The [[https://en.wikipedia.org/wiki/Principle_of_least_privilege][principle of least privilege]] is
relevant here as well.  So, abstraction too has its practical and
theoretical limitations.

* How They Relate

I bring these up together because *abstractions* are the boundaries
between *modules*, and the communication channels (APIs, languages,
interfaces, protocols) through which modules talk.  An abstraction
need not be a standardized interface or a well-documented boundary,
though that helps.

The abstractions available to you vary by, among other things:
- ...what language you choose.  Consider that a language like Haskell
  expresses various abstractions, largely within its type system, that
  cannot be expressed in many other languages.
  Languages like Python, Ruby, or JavaScript might have various
  abstractions meaningful only in the context of dynamic typing.  Some
  languages more readily permit the creation of new abstractions, and
  this might lead to a broader range of abstractions implemented in
  libraries.
- ...the operating system and its standard library.  What is a
  process?  What is a thread?  What is a dynamic library?  What is a
  filesystem?  What is a file?  What is a block device?  What is a
  socket?  What is a virtual machine?  What is a bus?  What is a
  command line?
- ...the time period.  How many of the abstractions named above were
  around or viable in 1970, 1980, 1990, 2000? In the opposite
  direction, when did you last use that lovely standardized protocol,
  [[https://en.wikipedia.org/wiki/Common_Gateway_Interface][CGI]], to let your web application and your web server communicate,
  use [[https://en.wikipedia.org/wiki/PHIGS][PHIGS]] to render graphics, or access a large multiuser system
  via hard-wired terminals?

As such, the possible ways to modularize things vary too.  Some ways
of modularizing may not make sense, or even seem possible, until
something has been done other ways hundreds or thousands of times.

Other terms are related too.  "Loosely-coupled" (or loose coupling)
and "tightly-coupled" refer to the sort of abstractions sitting
between modules, or whether or not there even are separate modules.
"Decoupling" involves changing the relationship between modules
(sometimes, creating them in the first place), typically splitting
things into two more sensible pieces that a more sensible abstraction
separates.  "Factoring out" is really a form of decoupling in which
smaller parts of something are turned into a module which the original
thing then interfaces with (one canonical example is taking bits of
code that are similar or identical in many places and moving them
into a single function).  To say one has "abstracted over"
some details implies that a module is handling those details, that the
details shouldn't matter, and what does matter is the abstraction one
is using.
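To make "factoring out" concrete, here is a minimal Python sketch.  The
functions (~invoice_line~, ~receipt_total~, ~format_money~) are
hypothetical, made up purely for illustration: the same
money-formatting logic first appears copied into two places, and is
then moved into one function that both callers interface with.

```python
# Before: the same money-formatting logic is copied into two places.
def invoice_line_before(item: str, cents: int) -> str:
    sign = "-" if cents < 0 else ""
    cents = abs(cents)
    return f"{item}: {sign}${cents // 100}.{cents % 100:02d}"


def receipt_total_before(cents: int) -> str:
    sign = "-" if cents < 0 else ""
    cents = abs(cents)
    return f"TOTAL {sign}${cents // 100}.{cents % 100:02d}"


# After: the shared logic is factored out into a single function,
# and both callers now use it only through its interface.
def format_money(cents: int) -> str:
    """Format an integer number of cents, e.g. '$12.34' or '-$0.05'."""
    sign = "-" if cents < 0 else ""
    cents = abs(cents)
    return f"{sign}${cents // 100}.{cents % 100:02d}"


def invoice_line(item: str, cents: int) -> str:
    return f"{item}: {format_money(cents)}"


def receipt_total(cents: int) -> str:
    return f"TOTAL {format_money(cents)}"


# The refactoring does not change behavior.
assert invoice_line("pen", 150) == invoice_line_before("pen", 150)
assert receipt_total(-5) == receipt_total_before(-5)
print(invoice_line("pen", 150))  # pen: $1.50
```

After the change, the callers have abstracted over how money is
formatted; switching to, say, thousands separators would touch only
~format_money~.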

# -----
Consider the information this module deals in, in essence.

What is the most general form this information could be expressed in,
without being so general as to encompass other things that are
irrelevant or so low-level as to needlessly constrain the possible
contexts?

(Aristotle's theory of definitions?)

# -----

In a practical sense: where someone "factors out" something that
occurs in similar or identical form in multiple places (incidentally,
"decouples" also works fine as a term), they're often creating a
module (from what was factored out) and some number of abstractions
(from the break this creates).  Consider an example:
- Some configurable functionality in a larger application is extracted
  into a system of plugins.  The details of the application are
  abstracted over (as far as the plugin cares), and the details of the
  plugin are abstracted over (as far as the application cares).  The
  API that the application and plugins use to communicate is the new
  abstraction now available.  The plugins are modules, and the
  application itself is a module of a different sort.  (Witness that
  sometimes another application will implement the same plugin API.)
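A minimal Python sketch of that plugin arrangement, assuming a made-up
API in which a plugin is anything with a ~name~ and a
~transform(text)~ method.  ~Plugin~, ~Application~, ~UpperCase~, and
~Reverse~ are all hypothetical names; ~typing.Protocol~ just writes the
boundary down so that neither side needs the other's internals.

```python
from typing import Protocol


class Plugin(Protocol):
    """The abstraction: everything the application knows about a plugin."""

    name: str

    def transform(self, text: str) -> str: ...


class Application:
    """Knows nothing about any concrete plugin, only the Plugin API."""

    def __init__(self) -> None:
        self._plugins: dict[str, Plugin] = {}

    def register(self, plugin: Plugin) -> None:
        self._plugins[plugin.name] = plugin

    def run(self, plugin_name: str, text: str) -> str:
        # Raises KeyError if no plugin with that name was registered.
        return self._plugins[plugin_name].transform(text)


# Two plugins: each knows only the Plugin API, not the Application.
class UpperCase:
    name = "upper"

    def transform(self, text: str) -> str:
        return text.upper()


class Reverse:
    name = "reverse"

    def transform(self, text: str) -> str:
        return text[::-1]


app = Application()
app.register(UpperCase())
app.register(Reverse())
print(app.run("upper", "hello"))    # HELLO
print(app.run("reverse", "hello"))  # olleh
```

Nothing in ~Application~ names a concrete plugin, and nothing in the
plugins names ~Application~, so a different host that calls
~transform~ the same way could reuse these plugins unchanged.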

Modularizing like this has a very pragmatic motivation: when something
is a module unto itself, it presumably relies on specific
abstractions, so it is possible to freely change this module's
internal details (provided that it still handles the same
abstractions), to move this module to other contexts (anything
providing the same abstractions), and to replace it with other modules
(anything handling the same abstractions).
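A small Python sketch of the "replace it with other modules" case,
assuming a hypothetical key/value abstraction (~KeyValueStore~, with
~get~ and ~set~).  The caller, ~record_visit~, depends only on that
abstraction, so an in-memory module and a JSON-file module can be
swapped freely.  None of these names come from a real library.

```python
import json
import tempfile
from pathlib import Path
from typing import Protocol


class KeyValueStore(Protocol):
    """The abstraction: the only thing callers may rely on."""

    def get(self, key: str, default: int = 0) -> int: ...

    def set(self, key: str, value: int) -> None: ...


class MemoryStore:
    """One module implementing the abstraction: keeps data in a dict."""

    def __init__(self) -> None:
        self._data: dict[str, int] = {}

    def get(self, key: str, default: int = 0) -> int:
        return self._data.get(key, default)

    def set(self, key: str, value: int) -> None:
        self._data[key] = value


class JsonFileStore:
    """Another module with different internals: persists to a JSON file."""

    def __init__(self, path: Path) -> None:
        self._path = path

    def _load(self) -> dict[str, int]:
        if not self._path.exists():
            return {}
        return json.loads(self._path.read_text())

    def get(self, key: str, default: int = 0) -> int:
        return self._load().get(key, default)

    def set(self, key: str, value: int) -> None:
        data = self._load()
        data[key] = value
        self._path.write_text(json.dumps(data))


def record_visit(store: KeyValueStore, user: str) -> int:
    """A caller that knows only the KeyValueStore abstraction."""
    count = store.get(user) + 1
    store.set(user, count)
    return count


# The same caller works unchanged with either module plugged in.
with tempfile.TemporaryDirectory() as tmp:
    for store in (MemoryStore(), JsonFileStore(Path(tmp) / "visits.json")):
        record_visit(store, "alice")
        print(type(store).__name__, record_visit(store, "alice"))
```

The internals of the two stores have nothing in common; what they share
is the abstraction, and that is all ~record_visit~ ever sees.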

It also has a more abstract motivation: when something is a module
unto itself, the way it is designed and implemented often gives more
insight into the fundamentals of the problem it is solving.  It
contains fewer incidental details and more essential ones.

# -------

* Less-Conventional Examples

One thing I've watched with some interest is when new abstractions
emerge (or, perhaps, old ones become more widespread) to solve
problems that I wasn't even aware existed.

[[https://circleci.com/blog/it-really-is-the-future/][It really is the future]] talks about a lot of more recent forms of
modularity, most of which are beyond me and were completely unheard-of
in, say, 2010.  [[https://www.functionalgeekery.com/episode-75-eric-b-merritt/][Functional Geekery episode 75]] talks about many similar
things.

[[https://jupyter.org/][Jupyter Notebook]] is one of my favorites here.  It provides a notebook
interface (similar to something like Maple or Mathematica) which:

- lets a notebook use many different programming languages underneath
  (through separate /kernels/),
- decouples where the notebook is used and where it is running, due to
  being implemented as a web application accessed through the browser,
- decouples the presentation of a stored notebook from Jupyter itself
  by using a [[https://nbformat.readthedocs.io/en/latest/][JSON-based file format]] which can be rendered without
  Jupyter (like GitHub does if you commit a .ipynb file).
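Because the notebook format is plain JSON, a few lines using nothing
but Python's standard ~json~ module can render one without Jupyter.
This is a rough sketch: it assumes the nbformat 4 layout (a top-level
~cells~ list whose entries carry ~cell_type~, ~source~, and, for code
cells, ~outputs~), prints only stream outputs, and uses a tiny
hand-written notebook rather than a real one.  ~render_notebook~ and
~EXAMPLE~ are hypothetical names.

```python
import json
from typing import Union


def _text(field: Union[str, list[str]]) -> str:
    """nbformat stores multiline text as one string or a list of lines."""
    return field if isinstance(field, str) else "".join(field)


def render_notebook(raw: str) -> str:
    """Render an nbformat-4 notebook as plain text, without Jupyter."""
    nb = json.loads(raw)
    if nb.get("nbformat") != 4:
        raise ValueError("this sketch only handles nbformat 4")
    parts = []
    for cell in nb["cells"]:
        kind = cell["cell_type"]
        parts.append(f"--- {kind} ---")
        parts.append(_text(cell["source"]))
        if kind == "code":
            # Only stream output (e.g. stdout) is shown; rich outputs
            # such as images or execute_result data are skipped.
            for out in cell.get("outputs", []):
                if out.get("output_type") == "stream":
                    parts.append(_text(out["text"]).rstrip("\n"))
    return "\n".join(parts)


# A tiny hand-written notebook, for illustration only.
EXAMPLE = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Notes\n", "Some text."]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "source": "print(1 + 1)",
         "outputs": [{"output_type": "stream", "name": "stdout",
                      "text": ["2\n"]}]},
    ],
})

print(render_notebook(EXAMPLE))
```

The point is less the renderer than the fact that it needs nothing
from Jupyter: the stored format is its own module, with its own
abstraction.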

I love notebook interfaces already because they simplify experimentation
by handling a lot of things I'd otherwise have to do manually - like
saving results and keeping them lined up with the exact code that
produced them.  Jupyter adds some other use-cases I find marvelous -
for instance, I can let the interpreter run on my much faster
workstation, but I can access it across the Internet from my much
slower laptop.

[[https://zeppelin.apache.org/][Apache Zeppelin]] does similar things with different languages; I just
use it less.

Another favorite of mine is [[https://nixos.org/nix/][Nix]].  One excellent article, [[http://blog.ezyang.com/2014/08/the-fundamental-problem-of-programming-language-package-management/][The
fundamental problem of programming language package management]],
never mentions Nix, but does a great job explaining the sorts of
problems it exists to solve.  Combining nearly all of the
programming-language-specific package managers into a single module is
a very lofty goal, but Nix appears to do a decent job of it.

The [[https://www.lua.org/][Lua]] programming language is noteworthy here.  It's written in
clean C with minimal dependencies, so it runs nearly anywhere that a C
or C++ compiler targets.  It's purposely very easy both to *embed*
(i.e. to put inside of a program and use as an extension language,
such as for plugins or scripting) and to *extend* (i.e. to connect
with libraries to allow their functionality to be used from Lua).  [[https://www.gnu.org/software/guile/][GNU
Guile]] has many of the same properties, I'm told.

We ordinarily think of object systems as something living in the
programming language.  However, the object system is sometimes made a
module that is outside of the programming language, and languages just
interact with it.  [[https://en.wikipedia.org/wiki/GObject][GObject]], [[https://en.wikipedia.org/wiki/Component_Object_Model][COM]], and [[https://en.wikipedia.org/wiki/XPCOM][XPCOM]] do this, and to some
extent, so does [[https://en.wikipedia.org/wiki/Meta-object_System][Qt & MOC]] - and there are probably hundreds of others,
particularly if you count dead ones created during the object-oriented
hype of the '90s.  This seems to happen in systems where the object
hierarchy is in effect "bigger" than the language.

ZeroMQ is also notable here as a set of cross-language abstractions
for communication patterns (I know it's likely not unique, but it is
one of the better-known ones and the first I thought of).

Interestingly, iMatix, the company behind ZeroMQ, also created [[https://github.com/imatix/gsl][GSL]] and
explained its value in [[https://imatix-legacy.github.io/mop/introduction.html][Model-Oriented Programming]], in which
abstraction features heavily.  I've not used GSL, and am skeptical of
its stated usefulness, but it looks like it is meant to help create
compile-time abstractions that likewise sit outside of any particular
programming language.

# TODO: Expand on this.

[[https://web.hypothes.is/][hypothes.is]] is a curious one that I find fascinating.  They're trying
to factor out annotation and commenting from something that is handled
on a per-webpage basis and turn it into its own module, and I really
like what I've seen.

The Unix tradition lives on in certain modern tools. [[https://stedolan.github.io/jq/][jq]] has proven
very useful anytime I've had to mess with JSON data.  [[http://www.dest-unreach.org/socat/][socat]] and [[http://netcat.sourceforge.net/][netcat]]
have saved me numerous times.  I'm sure certain people love the fact
that [[https://neovim.io/][Neovim]] is designed to be seamlessly embedded and extended with
plugins.  [[https://suckless.org/philosophy][suckless]] perhaps takes it too far, but gets an honorary
mention...

# ???

# Also, TCP/IP and the entire notion of packet-switched networks.
# And the entire OSI 7-layer model.

# Also, caches - of all types.  (CPU, disk...)

People know that I love Emacs, but I also believe many of the
complaints about how large it is.  On the one hand, it is basically
its own operating system, and /within this/ it has considerable
modularity.  On the other hand, I already have a perfectly usable
operating system underneath, and that operating system can already
make SSH tunnels without my editor needing [[https://www.gnu.org/software/tramp/][its own explicit support]] for them.

Consider [[https://research.google.com/pubs/pub43146.html][Machine Learning: The High Interest Credit Card of Technical Debt]],
a paper that anyone working around machine learning should read and
re-read regularly.  Large parts of the paper are about ways in which
machine learning conflicts with proper modularity and abstraction.
(That said, [[https://colah.github.io/posts/2015-09-NN-Types-FP/][Neural Networks, Types, and Functional Programming]] is a
good post, and it shows some kinds of abstraction that do still exist,
at least in neural networks.)

[[https://clojurefun.wordpress.com/2012/08/17/composition-over-convention/][Composition over convention]] is an important read on why /frameworks/
can also run completely counter to modularity.

Submitted without further comment:
https://github.com/stevemao/left-pad/issues/4

* Fragments

- Abstracting over...
  - Multiple applications
  - Multiple users
  - Multiple CPUs
  - Multiple hosts

- [[Notes - Paper, 2016-11-13]]
- Any Plan 9 papers? (Will have to dig deep in the archives)
  - http://plan9.bell-labs.com/sys/doc/
  - Link is now down
- Tanenbaum vs. Linus war & microkernels
- TBL: "The choice of language is a common design choice. The low
  power end of the scale is typically simpler to design, implement and
  use, but the high power end of the scale has all the attraction of
  being an open-ended hook into which anything can be placed: a door
  to uses bounded only by the imagination of the programmer.  Computer
  Science in the 1960s to 80s spent a lot of effort making languages
  which were as powerful as possible. Nowadays we have to appreciate
  the reasons for picking not the most powerful solution but the least
  powerful. The reason for this is that the less powerful the
  language, the more you can do with the data stored in that
  language. If you write it in a simple declarative from, anyone can
  write a program to analyze it in many ways."  (Languages are a kind
  of abstraction - one that influences how a module is written, and
  what contexts it is useful in.)
- "Self" paper & structural reification?
  - I'm still not sure how this relates, but it may perhaps relate to
    how *not* to make things modular (structural reification is a sort
    of check on the scope of objects/classes)
- What by Rich Hickey?
  - Simple Made Easy?
  - The Value of Values?
- SICP: [[https://mitpress.mit.edu/sites/default/files/sicp/full-text/book/book-Z-H-19.html#%25_chap_3][Modularity, Objects, and State]]
- [[https://www.cs.utexas.edu/~wcook/Drafts/2009/essay.pdf][On Understanding Data Abstraction, Revisited]]
- http://www.catb.org/~esr/writings/taoup/html/apb.html#Baldwin-Clark -
  Carliss Baldwin and Kim Clark. Design Rules, Vol 1: The Power of
  Modularity. 2000. MIT Press. ISBN 0-262-024667.
- Brooks, No Silver Bullet?

- https://en.wikipedia.org/wiki/Essential_complexity

- https://twitter.com/fchollet/status/962074070513631232

- How does this fit with /composition/? Does it?
  - The ability to sensibly compose things depends on them having some
    sort of well-defined, compatible boundary - right?
  - Note also /decomposition/ here, as in /decomposing/ something into
    parts.
- [[https://en.wikipedia.org/wiki/Cross-cutting_concern][Cross-cutting concerns]], [[https://en.wikipedia.org/wiki/Aspect-oriented_programming][aspect-oriented programming]] (as an attempt
  to take tangled things and pull them into modules)
  - [[https://en.wikipedia.org/wiki/Separation_of_concerns][Separation of Concerns]]
- Abstraction as an information channel... module as a what?
- Even in DOS days, simple abstractions mattered, such as making
  something behave like a hard drive or like a filesystem.  Things like
  DriveSpace/DoubleSpace/Stacker worked well enough because most
  applications were written to respect DOS's file access calls.
  Things like HIMEM, EMM386, and QEMM worked reasonably well because
  applications were written to respect DOS's dumpster fire of memory
  management that I am eternally lucky to never have to touch again.
- One point I have ignored (maybe): You clearly separate the 'inside'
  of a module (its implementation) from the 'outside' (that is - its
  boundaries, the abstractions that it interfaces with or that it
  implements) so that the 'inside' can change more or less freely
  without having any effect on the outside.
- Abstractions as _contracts_ with a communicated/agreed purpose
- Abstractions as a way of reducing the work required to add
  functionality (changes can be made just in the relevant modules, and
  other modules do not need to change to conform)
- What is more key?  Communication, information content, contracts,
  details?
- [[https://en.wikipedia.org/wiki/Abstraction_principle_(computer_programming)][Abstraction principle]]
  - Reduce duplication of information
  - [[https://en.wikipedia.org/wiki/Don%2527t_repeat_yourself][Don't repeat yourself]]
- [[https://simplyphilosophy.org/study/aristotles-definitions/][Aristotle & theory of definitions]]
  - this isn't right.  I need to find the quote in the Durant book
    (which will probably have an actual source) that pertains to how
    specific and how general a definition must be