Finally called my modularity post done

2020-07-17 23:21:55 -04:00
parent eaf4f0444a
commit 56a09d4593
2 changed files with 333 additions and 381 deletions
--- a/content/posts/2017-04-20-modularity.org
+++ b/content/posts/2017-04-20-modularity.org
@@ -1,381 +0,0 @@
 ---
 title: Modularity & Abstraction (working title)
 author: Chris Hodapp
 date: April 20, 2017
 tags:
 - technobabble
 - rambling
 draft: true
 ---
 # Why don't I turn this into a paper for arXiv too?  It can still be
 # posted to the blog (just also make it exportable to LaTeX perhaps)
 _Modularity_ and _abstraction_ feature prominently wherever computers
 are involved.  This is meant very broadly: it applies to designing
 software, using software, integrating software, and to a lot of
 hardware as well.  It applies elsewhere, and almost certainly
 originated elsewhere first; however, it appears especially crucial
 around software.
 Definitions, though, are a bit vague (including anything in this
 post).  My goal in this post isn't to try to (re)define them, but to
 explain their essence and expand on a few theses:
 - Modularity arises naturally in a wide array of places.
 - Modularity and abstraction are intrinsically connected.
 - Both are for the benefit of people.  This usually doesn't need
  stated, but to echo Paul Graham and probably others: to the
  computer, it is all the same.
 - More specifically, both are there to manage *complexity* by
  assigning meaningful information and boundaries which allow people
  to match a problem to what they can actually think about.
 # - Whether a given modularization makes sense depends strongly on
 #  meaning and relevance of *information* inside and outside of
 #  modules, and broad context matters to those.
 * Why?
 People generally agree that "modularity" is good.  The idea that
 something complex can be designed and understood in terms of smaller,
 simpler pieces comes naturally to anyone that has built something out
 of smaller pieces or taken something apart.  (This isn't to say that
 reductionism is the best way to understand everything, but that's
 another matter.)  It runs very deep in the Unix philosophy, which ESR
 gives a good overview of in [[http://www.catb.org/~esr/writings/taoup/html/ch01s06.html][The Art of Unix Programming]] - or, listen
 to it from [[https://youtu.be/tc4ROCJYbm0?t%3D248][Kernighan himself]] at Bell Labs in
 1982.
 Tim Berners-Lee gives some practical limitations in [[https://www.w3.org/DesignIssues/Principles.html][Principles of
 Design]] and in [[https://www.w3.org/DesignIssues/Modularity.html][Modularity]]: "Modular design hinges on the simplicity and
 abstract nature of the interface definition between the modules. A
 design in which the insides of each module need to know all about each
 other is not a modular design but an arbitrary partitioning of the
 bits... It is not only necessary to make sure your own system is
 designed to be made of modular parts. It is also necessary to realize
 that your own system, no matter how big and wonderful it seems now,
 should always be designed to be a part of another larger system."  Les
 Hatton in [[http://www.leshatton.org/TAIC2008-29-08-2008.html][The role of empiricism in improving the reliability of
 future software]] even did an interesting derivation tying the defect
 density in software to how it is broken into pieces.  The 1972 paper
 [[https://www.cs.virginia.edu/~eos/cs651/papers/parnas72.pdf][On the Criteria To Be Used in Decomposing Systems into Modules]] cites a
 1970 textbook on why modularity is important in systems programming,
 but also notes that nothing is said on how to divide a system into
 modules.
 "Abstraction" doesn't have quite the same consensus. In software, it's
 generally understood that decoupled or loosely-coupled is better than
 tightly-coupled, but at the same time, "abstraction" can have the
 connotation of something that gets in the way, adds overhead, and
 confuses things.  Dijkstra, in one of few instances of not being
 snarky, allegedly said, "Being abstract is something profoundly
 different from being vague.  The purpose of abstraction is not to be
 vague, but to create a new semantic level in which one can be
 absolutely precise."  Joel Spolsky, in one of few instances of me
 actually caring what he said, also has a blog post from 2002 on the
 [[https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/][Law of Leaky Abstractions]] ("All non-trivial abstractions, to some
 degree, are leaky.")  The [[https://en.wikipedia.org/wiki/Principle_of_least_privilege][principle of least privilege]] is likewise a
 thing. So, abstraction too has its practical and theoretical
 limitations.
 * How They Relate
 I bring these up together because: *abstractions* are the boundaries
 between *modules*, and the communication channels (APIs, languages,
 interfaces, protocols) through which they talk.  It need not
 necessarily be a standardized interface or a well-documented boundary,
 though that helps.
 Available abstractions vary. They vary by, for instance:
 - ...what language you choose.  Consider, for instance, that a language
  like Haskell contains various abstractions done largely within the
  type system that cannot be expressed in many other languages.
  Languages like Python, Ruby, or JavaScript might have various
  abstractions meaningful only in the context of dynamic typing.  Some
  languages more readily permit the creation of new abstractions, and
  this might lead to a broader range of abstractions implemented in
  libraries.
 - ...the operating system and its standard library.  What is a
  process?  What is a thread?  What is a dynamic library?  What is a
  filesystem?  What is a file?  What is a block device?  What is a
  socket?  What is a virtual machine?  What is a bus?  What is a
  commandline?
 - ...the time period.  How many of the abstractions named above were
  around or viable in 1970, 1980, 1990, 2000? In the opposite
  direction, when did you last use that lovely standardized protocol,
  [[https://en.wikipedia.org/wiki/Common_Gateway_Interface][CGI]], to let your web application and your web server communicate,
  use [[https://en.wikipedia.org/wiki/PHIGS][PHIGS]] to render graphics, or access a large multiuser system
  via hard-wired terminals?
 As such: Possible ways to modularize things vary.  It may make no
 sense that certain ways of modularization even can or should exist
 until it's been done other ways hundreds or thousands of times.
 Other terms are related too.  "Loosely-coupled" (or loose coupling)
 and "tightly-coupled" refer to the sort of abstractions sitting
 between modules, or whether or not there even are separate modules.
 "Decoupling" involves changing the relationship between modules
 (sometimes, creating them in the first place), typically splitting
 things into two more sensible pieces that a more sensible abstraction
 separates.  "Factoring out" is really a form of decoupling in which
 smaller parts of something are turned into a module which the original
 thing then interfaces with (one canonical example is taking some bits
 of code, often that are very similar or identical in many places, and
 moving them into a single function).  To say one has "abstracted over"
 some details implies that a module is handling those details, that the
 details shouldn't matter, and what does matter is the abstraction one
 is using.
 One of Rich Hickey's favorite topics is *composition*, and with good
 reason (and you should check out [[http://www.infoq.com/presentations/Simple-Made-Easy/][Simple Made Easy]] regardless).  This
 relates as well: to *compose* things together effectively into bigger
 parts requires that they support some common abstraction.
 In the same area, [[https://clojurefun.wordpress.com/2012/08/17/composition-over-convention/][Composition over convention]] is a good read on how
 /frameworks/ run counter to modularity: they aren't built to behave
 like modules of a larger system.
 # -----
 It has a very pragmatic reason behind it: When something is a module
 unto itself, presumably it is relying on specific abstractions, and it
 is possible to freely change this module's internal details (provided
 that it still respects the same abstractions), to move this module to
 other contexts (anywhere that provides the same abstractions), and to
 replace it with other modules (anything that respects the same
 abstractions).
 It also has a more abstract reason: When something is a module unto
 itself, the way it is designed and implemented usually presents more
 insight into the fundamentals of the problem it is solving. It
 contains fewer incidental details, and more essential details.
 # -------
 * Information
 I referred earlier to the abstractions themselves as both boundaries
 and communications channels.  Another common view is that abstractions
 are *contracts* with a communicated and agreed purpose, and I think
 this is a useful definition too: it conveys the notion that there are
 multiple parties involved and that they are free to behave as needed
 provided that they fulfill some obligation.
 Some definitions refer directly to information, like the [[https://en.wikipedia.org/wiki/Abstraction_principle_(computer_programming)][abstraction
 principle]], which aims to reduce duplication of information.  That fits
 with [[https://en.wikipedia.org/wiki/Don%2527t_repeat_yourself][don't repeat yourself]], whose aim is that "a modification of any single
 element of a system does not require a change in other logically
 unrelated elements".
 # ----- FIXME
 Consider the information this module deals in, in essence.
 What is the most general form this information could be expressed in,
 without being so general as to encompass other things that are
 irrelevant or so low-level as to needlessly constrain the possible
 contexts?
 (Aristotle's theory of definitions?)
 * Less-Conventional Examples
 One thing I've watched with some interest is when new abstractions
 emerge (or, perhaps, old ones become more widespread) to solve
 problems that I wasn't even aware existed.
 [[https://circleci.com/blog/it-really-is-the-future/][It really is the future]] talks about a lot of more recent forms of
 modularity from the land of devops, most of which were completely
 unheard-of in, say, 2010.  [[https://www.functionalgeekery.com/episode-75-eric-b-merritt/][Functional Geekery episode 75]] talks about
 many similar things.
 [[https://jupyter.org/][Jupyter Notebook]] is one of my favorites here.  It provides a notebook
 interface (similar to something like Maple or Mathematica) which:
 - allows the notebook to use various different programming languages
  underneath,
 - decouples where the notebook is used and where it is running, due to
  being implemented as a web application accessed through the browser,
 - decouples the presentation of a stored notebook from Jupyter itself
  by using a [[https://nbformat.readthedocs.io/en/latest/][JSON-based file format]] which can be rendered without
  Jupyter (like GitHub does if you commit a .ipynb file).
 I love notebook interfaces already because they simplify experimenting
 by handling a lot of things I'd otherwise have to do manually - like
 saving results and keeping them lined up with the exact code that
 produced them.  Jupyter adds some other use-cases I find marvelous -
 for instance, I can let the interpreter run on my workstation which
 has all of the computing power, but I can access it across the
 Internet from my laptop.
 [[https://zeppelin.apache.org/][Apache Zeppelin]] does similar things with different languages; I've
 just used it much less.
 Another favorite of mine is [[https://nixos.org/nix/][Nix]].  One excellent article, [[http://blog.ezyang.com/2014/08/the-fundamental-problem-of-programming-language-package-management/][The
 fundamental problem of programming language package management]],
 doesn't ever mention Nix but explains very well the problems it sets
 out to solve.  To be able to combine nearly all of the
 programming-language specific package managers into a single module is
 a very lofty goal, but Nix appears to do a decent job of it (among
 other things).
 The [[https://www.lua.org/][Lua]] programming language is noteworthy here.  It's written in
 clean C with minimal dependencies, so it runs nearly anywhere that a C
 or C++ compiler targets.  It's purposely very easy both to *embed*
 (i.e. to put inside of a program and use as an extension language,
 such as for plugins or scripting) and to *extend* (i.e. to connect
 with libraries to allow their functionality to be used from Lua).  [[https://www.gnu.org/software/guile/][GNU
 Guile]] has many of the same properties, I'm told.
 We ordinarily think of object systems as something living in the
 programming language.  However, the object system is sometimes made a
 module that is outside of the programming language, and languages just
 interact with it.  [[https://en.wikipedia.org/wiki/GObject][GObject]], [[https://en.wikipedia.org/wiki/Component_Object_Model][COM]], and [[https://en.wikipedia.org/wiki/XPCOM][XPCOM]] do this, and to some
 extent, so does [[https://en.wikipedia.org/wiki/Meta-object_System][Qt & MOC]] - and there are probably hundreds of others,
 particularly if you allow dead ones created during the object-oriented
 hype of the '90s.  This seems to happen in systems where the object
 hierarchy is in effect "bigger" than the language.
 [[https://zeromq.org/][ZeroMQ]] is another example: a set of cross-language abstractions for
 communication patterns in a distributed system.  I know it's likely
 not unique, but it is one of the better-known and the first I thought
 of, and I think their [[http://zguide.zeromq.org/page:all][guide]] is excellent.
 Interestingly, the same iMatix behind ZeroMQ also created [[https://github.com/imatix/gsl][GSL]] and
 explained its value in [[https://imatix-legacy.github.io/mop/introduction.html][Model-Oriented Programming]], for which
 abstraction features heavily.  I've not used GSL, and am skeptical of
 its stated usefulness, but it looks like it is meant to help create
 compile-time abstractions that likewise sit outside of any particular
 programming language.
 # TODO: Expand on this.
 [[https://web.hypothes.is/][hypothes.is]] is a curious one that I find fascinating.  They're trying
 to factor out annotation and commenting from something that is handled
 on a per-webpage basis and turn it into its own module, and I really
 like what I've seen.  However, it does not seem to have caught on
 much.
 The Unix tradition lives on in certain modern tools. [[https://stedolan.github.io/jq/][jq]] has proven
 very useful anytime I've had to mess with JSON data.  [[http://www.dest-unreach.org/socat/][socat]] and [[http://netcat.sourceforge.net/][netcat]]
 have saved me numerous times.  I'm sure certain people love the fact
 that [[https://neovim.io/][Neovim]] is designed to be seamlessly embedded and to extend with
 plugins.  [[https://suckless.org/philosophy][suckless]] perhaps takes it too far, but gets an honorary
 mention...
 # ???
 # Also, TCP/IP and the entire notion of packet-switched networks.
 # And the entire OSI 7-layer model.
 # Also, caches - of all types.  (CPU, disk...)
 # One key is how the above let you *reason* about things without
 # knowing their specifics.
 People know that I love Emacs, but I also do believe many of the
 complaints on how large it is.  Despite that it is basically its own
 operating system, /within this/ it has considerable modularity.  The
 same applies somewhat to Blender, I suppose.
 Consider [[https://research.google.com/pubs/pub43146.html][Machine Learning: The High Interest Credit Card of Technical Debt]],
 a paper that anyone working around machine learning should read and
 re-read regularly.  Large parts of the paper are about ways in which
 machine learning conflicts with proper modularity and abstraction.
 (However, [[https://colah.github.io/posts/2015-09-NN-Types-FP/][Neural Networks, Types, and Functional Programming]] is still
 a good post and shows some sorts of abstraction that still exist
 at least in neural networks.)
 Even DOS had useful abstractions.  Things like
 DriveSpace/DoubleSpace/Stacker worked well enough because most
 software that needed files relied on DOS's normal abstractions to
 access them - so it did not matter to them that the underlying
 filesystem was actually compressed, or was actually a RAM disk, or was
 on some obscure SCSI interface.  Likewise, for the silliness known as
 [[https://en.wikipedia.org/wiki/Expanded_memory][EMS]], applications that accessed memory through the EMS abstraction
 could disregard whether it was a "real" EMS board providing access to
 that memory, whether it was an expanded memory manager providing
 indirect access to some other memory or even to a hard disk pretending
 to be memory.
 Even more abstractly: emulators work because so much software
 respected the abstraction of some specific CPU and hardware platform.
 Submitted without further comment:
 https://github.com/stevemao/left-pad/issues/4
 * Fragments
 - Abstracting over...
  - Multiple applications
  - Multiple users
  - Multiple CPUs
  - Multiple hosts
 - [[Notes - Paper, 2016-11-13]]
 - Tanenbaum vs. Linus war & microkernels
 - TBL: "The choice of language is a common design choice. The low
  power end of the scale is typically simpler to design, implement and
  use, but the high power end of the scale has all the attraction of
  being an open-ended hook into which anything can be placed: a door
  to uses bounded only by the imagination of the programmer.  Computer
  Science in the 1960s to 80s spent a lot of effort making languages
  which were as powerful as possible. Nowadays we have to appreciate
  the reasons for picking not the most powerful solution but the least
  powerful. The reason for this is that the less powerful the
  language, the more you can do with the data stored in that
  language. If you write it in a simple declarative form, anyone can
  write a program to analyze it in many ways."  (Languages are a kind
  of abstraction - one that influences how a module is written, and
  what contexts it is useful in.)
 - "Self" paper & structural reification?
  - I'm still not sure how this relates, but it may perhaps relate to
    how *not* to make things modular (structural reification is a sort
    of check on the scope of objects/classes)
 - What by Rich Hickey?
  - Simple Made Easy?
  - The Value of Values?
 - SICP: [[https://mitpress.mit.edu/sites/default/files/sicp/full-text/book/book-Z-H-19.html#%25_chap_3][Modularity, Objects, and State]]
 - [[https://www.cs.utexas.edu/~wcook/Drafts/2009/essay.pdf][On Understanding Data Abstraction, Revisited]]
 - http://www.catb.org/~esr/writings/taoup/html/apb.html#Baldwin-Clark -
  Carliss Baldwin and Kim Clark. Design Rules, Vol 1: The Power of
  Modularity. 2000. MIT Press. ISBN 0-262-024667.
 - Brooks, No Silver Bullet?
 - https://en.wikipedia.org/wiki/Essential_complexity
 - https://twitter.com/fchollet/status/962074070513631232
 - [[https://mitpress.mit.edu/sites/default/files/sicp/full-text/book/book-Z-H-9.html#%25_chap_1][From SICP chapter 1 intro]]: "The acts of the mind, wherein it exerts
  its power over simple ideas, are chiefly these three: 1. Combining
  several simple ideas into one compound one, and thus all complex
  ideas are made. 2. The second is bringing two ideas, whether simple
  or complex, together, and setting them by one another so as to take
  a view of them at once, without uniting them into one, by which it
  gets all its ideas of relations. 3. The third is separating them
  from all other ideas that accompany them in their real existence:
  this is called abstraction, and thus all its general ideas are
  made." -John Locke, An Essay Concerning Human Understanding (1690)
 - One point I have ignored (maybe): You clearly separate the 'inside'
  of a module (its implementation) from the 'outside' (that is - its
  boundaries, the abstractions that it interfaces with or that it
  implements) so that the 'inside' can change more or less freely
  without having any effect on the outside.
 - Abstractions as a way of reducing the work required to add
  functionality (changes can be made just in the relevant modules, and
  other modules do not need to change to conform)
 - What is more key?  Communication, information content, contracts,
  details?
  - [[https://en.wikipedia.org/wiki/Don%2527t_repeat_yourself][Don't repeat yourself]]
 - [[https://simplyphilosophy.org/study/aristotles-definitions/][Aristotle & theory of definitions]]
  - this isn't right.  I need to find the quote in the Durant book
    (which will probably have an actual source) that pertains to how
    specific and how general a definition must be
 - [[https://en.wikipedia.org/wiki/SOLID][SOLID]]
 - [[https://en.wikipedia.org/wiki/Cross-cutting_concern][Cross-cutting concerns]] and [[https://en.wikipedia.org/wiki/Aspect-oriented_programming][Aspect-oriented programming]]
 - [[https://en.wikipedia.org/wiki/Separation_of_concerns][Separation of Concerns]]
 - [[https://en.wikipedia.org/wiki/Abstraction_principle_(computer_programming)][Abstraction principle]]
 - [[https://en.wikipedia.org/wiki/Don%2527t_repeat_yourself][Don't repeat yourself]]
--- a/content/posts/2020-07-17-modularity.org
+++ b/content/posts/2020-07-17-modularity.org
@@ -0,0 +1,333 @@
 ---
 title: "Modularity & Abstraction"
 author: Chris Hodapp
 date: "2020-07-16"
 tags:
 - technobabble
 - rambling
 ---
 # Why don't I turn this into a paper for arXiv too?  It can still be
 # posted to the blog (just also make it exportable to LaTeX perhaps)
 /(This is a sort of rambling post that I started in 2017 April.)/
 *Modularity* and *abstraction* feature prominently wherever computers
 are involved.  This is meant very broadly: it applies to designing
 software, using software, integrating software, and to a lot of
 hardware as well.  It applies elsewhere, and almost certainly
 originated elsewhere first; however, it appears especially crucial
 around software.
 Definitions, though, are a bit vague (including anything in this
 post).  My goal in this post isn't to try to (re)define them, but to
 explain their essence and expand on a few theses:
 - Modularity arises naturally in a wide array of places.
 - Modularity and abstraction are intrinsically connected.
 - Both are for the benefit of people.  This usually doesn't need to be
  stated, but to echo Paul Graham and probably others: to the
  computer, it is all the same.
 - More specifically, both are there to manage *complexity* by
  assigning meaningful information and boundaries which allow people
  to match a problem to what they can actually think about.
 # - Whether a given modularization makes sense depends strongly on
 #  meaning and relevance of *information* inside and outside of
 #  modules, and broad context matters to those.
 * What Are They?
 People generally agree that "modularity" is good.  The idea that
 something complex can be designed and understood in terms of smaller,
 simpler pieces comes naturally to anyone that has built something out
 of smaller pieces or taken something apart.  (This isn't to say that
 reductionism is the best way to understand everything, but that's
 another matter.)  It runs very deep in the Unix philosophy, which ESR
 gives a good overview of in [[http://www.catb.org/~esr/writings/taoup/html/ch01s06.html][The Art of Unix Programming]] - or, listen
 to it from [[https://youtu.be/tc4ROCJYbm0?t%3D248][Kernighan himself]] at Bell Labs in 1982.
 Tim Berners-Lee gives some practical limitations in [[https://www.w3.org/DesignIssues/Principles.html][Principles of
 Design]] and in [[https://www.w3.org/DesignIssues/Modularity.html][Modularity]]: "Modular design hinges on the simplicity and
 abstract nature of the interface definition between the modules. A
 design in which the insides of each module need to know all about each
 other is not a modular design but an arbitrary partitioning of the
 bits... It is not only necessary to make sure your own system is
 designed to be made of modular parts. It is also necessary to realize
 that your own system, no matter how big and wonderful it seems now,
 should always be designed to be a part of another larger system."  Les
 Hatton in [[http://www.leshatton.org/TAIC2008-29-08-2008.html][The role of empiricism in improving the reliability of
 future software]] even did an interesting derivation tying the defect
 density in software to how it is broken into pieces.  The 1972 paper
 [[https://www.cs.virginia.edu/~eos/cs651/papers/parnas72.pdf][On the Criteria To Be Used in Decomposing Systems into Modules]] cites a
 1970 textbook on why modularity is important in systems programming,
 but also notes that nothing is said on how to divide a system into
 modules.
 "Abstraction" doesn't have quite the same consensus. In software, it's
 generally understood that decoupled or loosely-coupled is better than
 tightly-coupled, but at the same time, "abstraction" can have the
 connotation of something that gets in the way, adds overhead, and
 confuses things.  Dijkstra, in one of few instances of not being
 snarky, allegedly said, "Being abstract is something profoundly
 different from being vague.  The purpose of abstraction is not to be
 vague, but to create a new semantic level in which one can be
 absolutely precise."  Joel Spolsky, in one of few instances of me
 actually caring what he said, also has a blog post from 2002 on the
 [[https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/][Law of Leaky Abstractions]] ("All non-trivial abstractions, to some
 degree, are leaky.")  The [[https://en.wikipedia.org/wiki/Principle_of_least_privilege][principle of least privilege]] is likewise a
 thing. So, abstraction too has its practical and theoretical
 limitations.
 * How They Relate
 I bring these up together because: *abstractions* are the boundaries
 between *modules*, and the communication channels (APIs, languages,
 interfaces, protocols) through which they talk.  It need not
 necessarily be a standardized interface or a well-documented boundary,
 though that helps.
 Available abstractions vary. They vary by, for instance:
 - ...what language you choose.  Consider, for instance, that a language
  like Haskell contains various abstractions done largely within the
  type system that cannot be expressed in many other languages.
  Languages like Python, Ruby, or JavaScript might have various
  abstractions meaningful only in the context of dynamic typing.  Some
  languages more readily permit the creation of new abstractions, and
  this might lead to a broader range of abstractions implemented in
  libraries.
 - ...the operating system and its standard library.  What is a
  process?  What is a thread?  What is a dynamic library?  What is a
  filesystem?  What is a file?  What is a block device?  What is a
  socket?  What is a virtual machine?  What is a bus?  What is a
  commandline?
 - ...all other kinds of libraries a language might use, and entire
  frameworks that cross language boundaries.  Consider something like
  Apache Spark, which deals in abstractions that may be accessed from
  various languages.
 - ...the time period.  How many of the abstractions named above were
  around or viable in 1970, 1980, 1990, 2000? In the opposite
  direction, when did you last use that lovely standardized protocol,
  [[https://en.wikipedia.org/wiki/Common_Gateway_Interface][CGI]], to let your web application and your web server communicate,
  use [[https://en.wikipedia.org/wiki/PHIGS][PHIGS]] to render graphics, or access a large multiuser system
  via hard-wired terminals?
 As such: Possible ways to modularize things vary.  It may make no
 sense that certain ways of modularization even can or should exist
 until it's been done other ways dozens or hundreds or maybe thousands
 of times.
 Other terms are related too.  "Loosely-coupled" (or loose coupling)
 and "tightly-coupled" refer to the sort of abstractions sitting
 between modules, or whether or not there even are separate modules.
 "Decoupling" involves changing the relationship between modules
 (sometimes, creating them in the first place), typically splitting
 things into two more sensible pieces that a more sensible abstraction
 separates.  "Factoring out" is really a form of decoupling in which
 smaller parts of something are turned into a module which the original
 thing then interfaces with (one canonical example is taking some bits
 of code, often that are very similar or identical in many places, and
 moving them into a single function).  To say one has "abstracted over"
 some details implies that a module is handling those details, that the
 details shouldn't matter, and what does matter is the abstraction one
 is using.
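To make "factoring out" concrete, here is a minimal sketch in Python.  Every name in it (=clamp=, =brightness=, =volume=) is made up for illustration and comes from no particular codebase; the point is only that a detail repeated in two places becomes one small module that the original code then interfaces with.

```python
# Hypothetical example: all names and ranges here are illustrative only.

# Before: the same clamping detail is written out in two places.
def brightness_before(value):
    if value < 0:
        value = 0
    elif value > 255:
        value = 255
    return value


def volume_before(value):
    if value < 0:
        value = 0
    elif value > 100:
        value = 100
    return value


# After: the shared detail is factored out into a single function...
def clamp(value, low, high):
    """Restrict value to the closed range [low, high]."""
    return max(low, min(high, value))


# ...and the original code only interfaces with it.
def brightness(value):
    return clamp(value, 0, 255)


def volume(value):
    return clamp(value, 0, 100)


print(brightness(300), volume(-5))
```

Callers of =brightness= and =volume= have now "abstracted over" how clamping is done: if =clamp= changes internally, they should not need to care.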
 One of Rich Hickey's favorite topics is *composition*, and with good
 reason (and you should check out [[http://www.infoq.com/presentations/Simple-Made-Easy/][Simple Made Easy]] regardless).  This
 relates as well: to *compose* things together effectively into bigger
 parts requires that they support some common abstraction.
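A small sketch of that idea, assuming the common abstraction is simply "an iterable of lines" (much as Unix pipes assume a stream of bytes).  The stage names and the =compose= helper are hypothetical; they are not from Hickey's talk or from any library.

```python
from functools import reduce


def compose(*stages):
    """Chain stages left to right; each one takes and returns an iterable."""
    def pipeline(items):
        return reduce(lambda acc, stage: stage(acc), stages, items)
    return pipeline


# Each stage knows nothing about the others; it only respects the shared
# abstraction of "iterable of strings in, iterable of strings out".
def strip_blank(lines):
    return (line for line in lines if line.strip())


def lowercase(lines):
    return (line.lower() for line in lines)


def numbered(lines):
    return (f"{n}: {line}" for n, line in enumerate(lines, start=1))


process = compose(strip_blank, lowercase, numbered)
result = list(process(["Hello", "", "World"]))
print(result)
```

The stages can be reordered, dropped, or joined by new ones without touching each other, but only as long as every stage keeps to the same abstraction.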
 In the same area, [[https://clojurefun.wordpress.com/2012/08/17/composition-over-convention/][Composition over convention]] is a good read on how
 /frameworks/ run counter to modularity: they aren't built to behave
 like modules of a larger system.
 The contrasting terms *interface* and *implementation* are commonly
 seen in software, with "implementation" loosely referring to what is
 inside a module, and "interface" referring to its "outside" boundaries
 and thus to the abstractions it supports.  You'll commonly hear advice
 about separating interface from implementation, and some semi-related
 things:
 - [[https://en.wikipedia.org/wiki/SOLID][SOLID]]
 - [[https://en.wikipedia.org/wiki/Cross-cutting_concern][Cross-cutting concerns]] and [[https://en.wikipedia.org/wiki/Aspect-oriented_programming][Aspect-oriented programming]]
 - [[https://en.wikipedia.org/wiki/Separation_of_concerns][Separation of Concerns]]
 - [[https://en.wikipedia.org/wiki/Information_hiding][Information hiding]] and [[https://en.wikipedia.org/wiki/Encapsulation_(computer_programming)][encapsulation]]
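As a rough illustration of interface versus implementation, here is a sketch using Python's =abc= module.  The =KeyValueStore= interface and both implementations are hypothetical, invented just for this example.

```python
from abc import ABC, abstractmethod


class KeyValueStore(ABC):
    """The interface: the only thing callers are allowed to rely on."""

    @abstractmethod
    def put(self, key, value):
        ...

    @abstractmethod
    def get(self, key):
        ...


class DictStore(KeyValueStore):
    """One implementation: backed by a dict."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class ListStore(KeyValueStore):
    """Another, deliberately naive: a list of pairs scanned linearly."""

    def __init__(self):
        self._pairs = []

    def put(self, key, value):
        self._pairs = [(k, v) for k, v in self._pairs if k != key]
        self._pairs.append((key, value))

    def get(self, key):
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)


def remember_greeting(store):
    """Written against the interface only; works with either store."""
    store.put("greeting", "hello")
    return store.get("greeting")


print(remember_greeting(DictStore()), remember_greeting(ListStore()))
```

=remember_greeting= never sees a dict or a list.  Either implementation can be swapped in, or rewritten internally, provided it keeps respecting =put= and =get=.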
 * Why?
 It has a very pragmatic reason behind it: When something is a module
 unto itself, presumably it is relying on specific abstractions, and it
 is possible to freely change this module's internal details (provided
 that it still respects the same abstractions), to move this module to
 other contexts (anywhere that provides the same abstractions), and to
 replace it with other modules (anything that respects the same
 abstractions).
 It also has a more abstract reason: When something is a module unto
 itself, the way it is designed and implemented usually presents more
 insight into the fundamentals of the problem it is solving. It
 contains fewer incidental details, and more essential details.
 That's all very practical for people. It reduces the amount of
 information that they must handle, and it permits them to *reason*
 about the behavior of systems that are unknown or even completely
 hypothetical.
 It can also be seen as serving as a *contract* which reduces the
 amount of communication and often the amount of disagreement.  I think
 this is a useful definition too: it conveys the notion that there are
 multiple parties involved, that they have already agreed on some
 specific obligations, and that they are free to behave as needed
 provided that they fulfill those obligations.
 [[https://en.wikipedia.org/wiki/Separation_of_concerns][Separation of Concerns]] gets at this same idea and expresses it in
 terms of "concerns" rather than contracts.
 I referred earlier to the abstractions themselves as both boundaries
 and communications channels, and invoking "communications" raises the
 related question of what *information* is being communicated.  (For
 whatever reason, Wikipedia defines a [[https://en.wikipedia.org/wiki/Concern_(computer_science)][concern]] in terms
 of... information.)
 Some definitions refer directly to information, like the
 [[https://en.wikipedia.org/wiki/Abstraction_principle_(computer_programming)][abstraction principle]], which aims to reduce duplication of
 information.  This fits with [[https://en.wikipedia.org/wiki/Don%27t_repeat_yourself][don't repeat yourself]], whose aim is that
 "a modification of any single element of a system does not require a
 change in other logically unrelated elements".  [[https://en.wikipedia.org/wiki/Encapsulation_(computer_programming)][Encapsulation]] likewise refers to it
 via [[https://en.wikipedia.org/wiki/Information_hiding][information hiding]].  Alan Perlis in his [[http://www.cs.yale.edu/homes/perlis-alan/quotes.html][epigrams]] had #20:
 "Wherever there is modularity there is the potential for
 misunderstanding: Hiding information implies a need to check
 communication."
 * Examples
 Network stacks, in particular via the OSI 7-layer model, are a good
 example of all of this. Higher-level protocols can work in a way that
 disregards lower-level details (most of the time - bandwidth and
 latency do sometimes leak through). Lower-level protocols can
 advance and be replaced without much concern for their higher-level
 use.
 Even the early innovation of packet switching is a great instance of
 abstracting network and routing details away from communications.
 Disk caches, and memory caches, and most other kinds of caches, work
 because they still implement the same underlying abstraction (albeit
 with some minor leakage).
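 Here is a rough sketch of that idea, under the assumption that the
 abstraction is just "a function from a block number to some bytes".
 All the names here (=BlockReader=, =make_slow_reader=, =with_cache=,
 =checksum=) are invented for the example, and the "device" is only a
 stand-in that logs each access.

```python
from typing import Callable, Dict, Iterable, List

# The abstraction: anything that maps a block number to that block's bytes.
BlockReader = Callable[[int], bytes]


def make_slow_reader(access_log: List[int]) -> BlockReader:
    """Stand-in for a slow device; records every block it is asked for."""

    def read_block(n: int) -> bytes:
        access_log.append(n)
        return bytes([n % 256]) * 4

    return read_block


def with_cache(read_block: BlockReader) -> BlockReader:
    """Wrap a reader in a cache; the result is still a BlockReader."""
    cache: Dict[int, bytes] = {}

    def cached_read(n: int) -> bytes:
        if n not in cache:
            cache[n] = read_block(n)
        return cache[n]

    return cached_read


def checksum(read_block: BlockReader, blocks: Iterable[int]) -> int:
    """Client code: it cannot tell which kind of reader it was handed."""
    return sum(sum(read_block(n)) for n in blocks) % 65536
```

 =checksum= returns the same value for the cached and uncached reader;
 only the access log (and, on real hardware, the timing) gives the
 cache away.  That is the minor leakage: if the underlying blocks
 changed, this naive cache would keep serving stale data.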
 Even DOS had useful abstractions.  Things like
 [[https://en.wikipedia.org/wiki/DriveSpace][DriveSpace/DoubleSpace]]/Stacker worked well enough because most
 software that needed files relied on DOS's normal abstractions to
 access them - so it did not matter to them that the underlying
 filesystem was actually compressed, or was actually a RAM disk, or was
 on some obscure SCSI interface.  Likewise, for the silliness known as
 [[https://en.wikipedia.org/wiki/Expanded_memory][EMS]], applications that accessed memory through the EMS abstraction
 could disregard whether it was a "real" EMS board providing access to
 that memory, or an expanded memory manager providing indirect access
 to some other memory or even to a hard disk pretending to be
 memory.
 ** Less-Conventional Examples
 One thing I've watched with some interest is when new abstractions
 emerge (or, perhaps, old ones become more widespread) to solve
 problems that I wasn't even aware existed.
 [[https://circleci.com/blog/it-really-is-the-future/][It really is the future]] talks about a lot of more recent forms of
 modularity from the land of devops, most of which were completely
 unheard-of in, say, 2010.  [[https://www.functionalgeekery.com/episode-75-eric-b-merritt/][Functional Geekery episode 75]] talks about
 many similar things.
 [[https://jupyter.org/][Jupyter Notebook]] is one of my favorites here.  It provides a notebook
 interface (similar to something like Maple or Mathematica) which:
 - allows the notebook to use various different programming languages
   underneath,
 - decouples where the notebook is used and where it is running, due to
   being implemented as a web application accessed through the browser,
 - decouples the presentation of a stored notebook from Jupyter itself
   by using a [[https://nbformat.readthedocs.io/en/latest/][JSON-based file format]] which can be rendered without
   Jupyter (like GitHub does if you commit a .ipynb file).
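 To illustrate that last point, here is a minimal sketch that renders
 a notebook as plain text using nothing but Python's =json= module.
 The tiny notebook is hand-written for the example (real ones carry
 much more metadata), =as_text= and =render= are hypothetical helper
 names, and only =stream= outputs are handled; every other output
 type is ignored.

```python
import json

# A hand-written, minimal notebook in the version 4 layout of the format.
NOTEBOOK_JSON = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Demo\n", "Some notes."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": "print(1 + 1)",
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]},
            ],
        },
    ],
})


def as_text(field) -> str:
    """Multi-line fields may be stored as one string or a list of strings."""
    return field if isinstance(field, str) else "".join(field)


def render(notebook_json: str) -> str:
    """Produce a plain-text view of a notebook, without importing Jupyter."""
    notebook = json.loads(notebook_json)
    lines = []
    for cell in notebook["cells"]:
        lines.append(f"[{cell['cell_type']}]")
        lines.append(as_text(cell["source"]))
        # Only code cells have outputs; show the stored stream text, if any.
        for output in cell.get("outputs", []):
            if output.get("output_type") == "stream":
                lines.append("-> " + as_text(output["text"]).rstrip())
    return "\n".join(lines)
```

 Nothing here runs any code from the notebook: the results shown are
 the ones already stored in the file, which is what lets a separate
 tool present it.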
 I love notebook interfaces already because they simplify experimenting
 by handling a lot of things I'd otherwise have to do manually - like
 saving results and keeping them lined up with the exact code that
 produced them.  Jupyter adds some other use-cases I find marvelous -
 for instance, I can let the interpreter run on my workstation which
 has all of the computing power, but I can access it across the
 Internet from my laptop.
 [[https://zeppelin.apache.org/][Apache Zeppelin]] does similar things with different languages; I've
 just used it much less.
 Another favorite of mine is [[https://nixos.org/nix/][Nix]] (likewise its cousin [[https://guix.gnu.org/][Guix]]).
 One excellent article,
 [[http://blog.ezyang.com/2014/08/the-fundamental-problem-of-programming-language-package-management/][The fundamental problem of programming language package management]],
 doesn't ever mention Nix but explains very well the problems it sets
 out to solve.  To be able to combine nearly all of the
 programming-language-specific package managers into a single module is
 a very lofty goal, but Nix appears to do a decent job of it (among
 other things).
 The [[https://www.lua.org/][Lua]] programming language is noteworthy here.  It's written in
 clean C with minimal dependencies, so it runs nearly anywhere that a C
 or C++ compiler targets.  It's purposely very easy both to *embed*
 (i.e. to put inside of a program and use as an extension language,
 such as for plugins or scripting) and to *extend* (i.e. to connect
 with libraries to allow their functionality to be used from Lua).  [[https://www.gnu.org/software/guile/][GNU
 Guile]] has many of the same properties, I'm told.
 We ordinarily think of object systems as something living in the
 programming language.  However, the object system is sometimes made a
 module that is outside of the programming language, and languages just
 interact with it.  [[https://en.wikipedia.org/wiki/GObject][GObject]], [[https://en.wikipedia.org/wiki/Component_Object_Model][COM]], and [[https://en.wikipedia.org/wiki/XPCOM][XPCOM]] do this, and to some
 extent, so does [[https://en.wikipedia.org/wiki/Meta-object_System][Qt & MOC]] - and there are probably hundreds of others,
 particularly if you allow dead ones created during the object-oriented
 hype of the '90s.  This seems to happen in systems where the object
 hierarchy is in effect "bigger" than the language.
 [[https://zeromq.org/][ZeroMQ]] is another example: a set of cross-language abstractions for
 communication patterns in a distributed system.  I know it's likely
 not unique, but it is one of the better-known ones and the first I
 thought of, and I think their [[http://zguide.zeromq.org/page:all][guide]] is excellent.
 Interestingly, the same iMatix behind ZeroMQ also created [[https://github.com/imatix/gsl][GSL]] and
 explained its value in [[https://imatix-legacy.github.io/mop/introduction.html][Model-Oriented Programming]], for which
 abstraction features heavily.  I've not used GSL, and am skeptical of
 its stated usefulness, but it looks like it is meant to help create
 compile-time abstractions that likewise sit outside of any particular
 programming language.
 # TODO: Expand on this.
 [[https://web.hypothes.is/][hypothes.is]] is a curious one that I find fascinating.  They're trying
 to factor out annotation and commenting from something that is handled
 on a per-webpage basis and turn it into its own module, and I really
 like what I've seen.  However, it does not seem to have caught on
 much.
 The Unix tradition lives on in certain modern tools. [[https://stedolan.github.io/jq/][jq]] has proven
 very useful anytime I've had to mess with JSON data.  [[http://www.dest-unreach.org/socat/][socat]] and [[http://netcat.sourceforge.net/][netcat]]
 have saved me numerous times.  I'm sure certain people love the fact
 that [[https://neovim.io/][Neovim]] is designed to be seamlessly embedded and to be extended with
 plugins.  [[https://suckless.org/philosophy][suckless]] perhaps takes it too far, but gets an honorary
 mention...
 People know that I love Emacs, but I also believe many of the
 complaints about how large it is.  Although it is basically its own
 operating system, /within this/ it has considerable modularity.  The
 same applies somewhat to Blender, I suppose.
 Consider [[https://research.google.com/pubs/pub43146.html][Machine Learning: The High Interest Credit Card of Technical Debt]],
 a paper that anyone working around machine learning should read and
 re-read regularly.  Large parts of the paper are about ways in which
 machine learning conflicts with proper modularity and abstraction.
 (However, [[https://colah.github.io/posts/2015-09-NN-Types-FP/][Neural Networks, Types, and Functional Programming]] is still
 a good post and shows some sorts of abstraction that still exist
 at least in neural networks.)
 Even more abstractly: emulators work because so much software
 respected the abstraction of some specific CPU and hardware platform.
 Submitted without further comment:
 https://github.com/stevemao/left-pad/issues/4