Various updates to modularity draft

Chris Hodapp 2017-12-13 21:16:04 -05:00
parent 60cc97f219
commit a14630bfda


@@ -3,63 +3,118 @@
#+DATE: April 20, 2017
#+TAGS: technobabble
_Modularity_ and _abstraction_ feature prominently wherever computers
are involved. This is meant very broadly: it applies to designing
software, using software, integrating software, and to a lot of
hardware as well. Both ideas apply elsewhere too, and almost
certainly originated elsewhere first; however, they appear to be
particularly crucial anywhere software is involved.
They're generally accepted as desirable, but a bit ill-understood at
times. It's common to find people who treat "abstraction" as
something that always stands in their way, adds overhead, and
confuses things. At the same time, it's common to find people who
treat modularity as being present anytime something is broken into
pieces. Definitions, though, are a bit vague (including anything in
this post). My goal in this post isn't to try to (re)define them, but
to explain a bit of their essence, and to expand on a few theses:
"Being abstract is something profoundly different from being vague.
The purpose of abstraction is not to be vague, but to create a new
semantic level in which one can be absolutely precise." E. W. Dijkstra
- Modularity arises naturally in a wide array of places.
- Modularity and abstraction are intrinsically connected.
- Whether a given modularization makes sense depends strongly on the
  meaning and relevance of *information* inside and outside of
  modules, and broad context matters to both.
"Modular design hinges on the simplicity and abstract nature of the
interface definition between the modules. A design in which the
insides of each module need to know all about each other is not a
modular design but an arbitrary partitioning of the bits." (Tim
Berners-Lee in [[https://www.w3.org/DesignIssues/Principles.html][Principles of Design]].)
* Why?
"Its is not only necessary to make sure your own system is designed to
be made of modular parts. It is also necessary to realize that your
own system, no matter how big and wonderful it seems now, should
always be designed to be a part of another larger system." (Same)
People generally agree that "modularity" is good. The idea that
something complex can be designed and understood in terms of smaller,
simpler pieces comes naturally to anyone who has built something out
of smaller pieces or taken something apart. It runs very deep in the
Unix philosophy, of which ESR gives a good overview in [[http://www.catb.org/~esr/writings/taoup/html/ch01s06.html][The Art of Unix
Programming]] (or listen to [[https://youtu.be/tc4ROCJYbm0?t=248][Kernighan himself]] at Bell Labs in 1982).
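To make that idea of small pieces composed through one shared
interface concrete, here is a minimal sketch of my own (not anything
from ESR or Kernighan): driving the classic ~sort | uniq -c~ pipeline
from Python, where the only thing the two programs agree on is a
stream of bytes. It assumes a Unix-like system with the usual
coreutils on the PATH.
#+BEGIN_SRC python
import subprocess

words = "pipe\nfilter\npipe\nstream\nfilter\npipe\n"

# Stage 1: sort standard input.
sort = subprocess.Popen(["sort"], stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE, text=True)
# Stage 2: count runs of identical lines, reading straight from stage 1.
uniq = subprocess.Popen(["uniq", "-c"], stdin=sort.stdout,
                        stdout=subprocess.PIPE, text=True)

sort.stdin.write(words)
sort.stdin.close()    # end-of-input, so the pipeline can drain
sort.stdout.close()   # hand the read end over to uniq entirely

output, _ = uniq.communicate()
sort.wait()
print(output)
# Expected, roughly:
#   2 filter
#   3 pipe
#   1 stream
#+END_SRC
Neither program knows anything about the other; the byte stream is
the whole interface.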
Tim Berners-Lee gives some practical limitations in [[https://www.w3.org/DesignIssues/Principles.html][Principles of
Design]] and in [[https://www.w3.org/DesignIssues/Modularity.html][Modularity]]: "Modular design hinges on the simplicity and
abstract nature of the interface definition between the modules. A
design in which the insides of each module need to know all about each
other is not a modular design but an arbitrary partitioning of the
bits... It is not only necessary to make sure your own system is
designed to be made of modular parts. It is also necessary to realize
that your own system, no matter how big and wonderful it seems now,
should always be designed to be a part of another larger system." Les
Hatton in [[http://www.leshatton.org/TAIC2008-29-08-2008.html][The role of empiricism in improving the reliability of
future software]] even did an interesting derivation tying the defect
density in software to how it is broken into pieces.
"Abstraction" doesn't have quite the same consensus. In software, it's
generally understood that decoupled or loosely-coupled is better than
tightly-coupled, but at the same time, "abstraction" can have the
connotation of something that gets in the way, adds overhead, and
confuses things. Dijkstra, in one of the few instances of not being
snarky, allegedly said, "Being abstract is something profoundly
different from being vague. The purpose of abstraction is not to be
vague, but to create a new semantic level in which one can be
absolutely precise." Joel Spolsky, in one of few instances of me
actually caring what he said, also has a blog post from 2002 on the
[[https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/][Law of Leaky Abstractions]]. The [[https://en.wikipedia.org/wiki/Principle_of_least_privilege][principle of least privilege]] is
likewise a thing. So, abstraction too has its practical and
theoretical limitations.
* How They Relate
I bring these up together because *abstractions* are the boundaries
between *modules*, and the communication channels (APIs, languages,
interfaces, protocols) through which they talk. The boundary need not
be a standardized interface or a well-documented one, though that
helps.
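As a toy illustration (my own, with hypothetical names throughout -
nothing here comes from an existing library), the ~Store~ protocol
below is the entire boundary between two "modules": one side provides
storage, the other only ever talks through the protocol, so either
side can be swapped out freely.
#+BEGIN_SRC python
from typing import Protocol

class Store(Protocol):
    """The abstraction: the only thing the two sides agree on."""
    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> str: ...

class MemoryStore:
    """One module behind the boundary: keeps everything in a dict."""
    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> str:
        return self._data[key]

def remember_greeting(store: Store, name: str) -> str:
    """The other module: talks to storage only through the interface."""
    store.put("greeting", f"hello, {name}")
    return store.get("greeting")

print(remember_greeting(MemoryStore(), "world"))  # -> hello, world
#+END_SRC
A file-backed or network-backed store could replace ~MemoryStore~
without ~remember_greeting~ changing at all; that interchangeability
is what the boundary buys.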
Available abstractions vary. They vary by, for instance:
- ...what language you choose. Consider, for instance, that a language
like Haskell contains various abstractions done largely within the
type system that cannot be expressed in many other languages.
Languages like Python, Ruby, or JavaScript might have various
abstractions meaningful only in the context of dynamic typing. Some
languages more readily permit the creation of new abstractions, and
this might lead to a broader range of abstractions implemented in
libraries.
- ...the operating system and its standard library. What is a
process? What is a thread? What is a dynamic library? What is a
filesystem? What is a file? What is a block device? What is a
socket? What is a virtual machine? What is a bus? What is a
commandline? (One small illustration of the "what is a file?"
question is sketched just after this list.)
- ...the time period. How many of the abstractions named above were
around or viable in 1970, 1980, 1990, 2000? In the opposite
direction, when did you last use that lovely standardized protocol,
[[https://en.wikipedia.org/wiki/Common_Gateway_Interface][CGI]], to let your web application and your web server communicate,
use [[https://en.wikipedia.org/wiki/PHIGS][PHIGS]] to render graphics, or access a large multiuser system
via hard-wired terminals?
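As one small illustration of the operating-system and
standard-library point above (a sketch of my own, not something from
any of the sources linked here): in Python, a disk file, an in-memory
buffer, and one end of a socket all honor the same "file-like"
abstraction, so a single function can consume any of them without
knowing which one it was handed.
#+BEGIN_SRC python
import io
import socket
import tempfile

def count_lines(stream) -> int:
    """Works on anything that can be iterated over line by line."""
    return sum(1 for _ in stream)

# 1. A real file on disk.
with tempfile.TemporaryFile("w+") as f:
    f.write("one\ntwo\nthree\n")
    f.seek(0)
    print(count_lines(f))                     # -> 3

# 2. An in-memory buffer pretending to be a file.
print(count_lines(io.StringIO("a\nb\n")))     # -> 2

# 3. One end of a socket wrapped up as a file.
left, right = socket.socketpair()
left.sendall(b"hello\nworld\n")
left.close()                                  # EOF, so iteration terminates
with right.makefile("r") as stream:
    print(count_lines(stream))                # -> 2
#+END_SRC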
As such, the possible ways to modularize things vary too. A
particular way of modularizing something may not even make sense, or
be possible, until the same thing has been done other ways hundreds
or thousands of times.
Other terms are related too. "Loosely-coupled" (or loose coupling)
and "tightly-coupled" refer to the sort of abstractions sitting
between modules, or to whether there even are separate modules.
"Decoupling" involves changing the relationship between modules
(sometimes, creating them in the first place), typically by moving
things to a more sensible abstraction. "Factoring out" is really a
form of decoupling in which smaller parts of something are turned
into a module which the original thing then interfaces with (one
canonical example, sketched below, is taking some bits of code, often
ones that are very similar or identical in many places, and moving
them into a single function). To say one has "abstracted over" some
details implies that a module is handling those details, that the
details shouldn't matter, and that what does matter is the
abstraction one is using.
# -----
Consider the information this module deals in, in essence.
What is the most general form this information could be expressed in,
without being so general as to encompass other things that are
irrelevant or so low-level as to needlessly constrain the possible
contexts?
(Aristotle's theory of definitions?)
# -----
In a practical sense: Where someone "factors out" something that
occurs in similar or identical form in multiple places (incidentally,
@@ -75,14 +130,6 @@ module (from what was factored out) and some number of abstractions
application itself is a module of a different sort. (Witness that
sometimes another application will implement the same plugin API.)
One reason behind this is more practical in nature: When something is
a module unto itself, presumably it is relying on specific
abstractions, and it is possible to move this module to other contexts
@@ -94,14 +141,68 @@ itself, the way it is designed and implemented often presents more
insight into the fundamentals of the problem it is solving. It
contains fewer incidental details, and more essential details.
* Other fluff
# -------
I was around to see what was normal for software made on Windows
3.1, Windows 95, and the like. My take is that most of these pieces
of software were sufficiently GUI-oriented that they tried to remove
most modularity from the user's perspective. Things like scripting
and automation existed almost solely as afterthoughts, since most
interaction was designed explicitly around the GUI.
* Less-Conventional Examples
One thing I've watched with some interest is when new abstractions
emerge (or, perhaps, old ones become more widespread) to solve
problems that I wasn't even aware existed.
[[https://circleci.com/blog/it-really-is-the-future/][It really is the future]] talks about a lot of more recent forms of
modularity, most of which are beyond me and were completely unheard-of
in, say, 2010. [[https://www.functionalgeekery.com/episode-75-eric-b-merritt/][Functional Geekery episode 75]] talks about many similar
things.
[[https://jupyter.org/][Jupyter Notebook]] is one of my favorites here. It provides a notebook
interface (similar to something like Maple or Mathematica) which:
- allows the notebook to use various different programming languages
underneath,
- decouples where the notebook is used and where it is running, due to
being implemented as a web application accessed through the browser,
- decouples the presentation of a stored notebook from Jupyter itself
by using a [[https://nbformat.readthedocs.io/en/latest/][JSON-based file format]] which can be rendered without
Jupyter (like GitHub does if you commit a .ipynb file; see the sketch
just after this list).
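As a small sketch of what that last point buys: any program with an
ordinary JSON parser can pull the cells out of a notebook, no Jupyter
required. The miniature notebook below is a simplified stand-in I
wrote for illustration; real .ipynb files carry more metadata, but
the ~cells~ / ~cell_type~ / ~source~ fields are the ones the nbformat
documentation describes.
#+BEGIN_SRC python
import json

# A simplified stand-in for the contents of a .ipynb file; a real file
# on disk would be read the same way with json.load(open(path)).
raw = """{
  "nbformat": 4, "nbformat_minor": 5, "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["Some prose."]},
    {"cell_type": "code", "execution_count": null, "metadata": {},
     "outputs": [], "source": ["x = 40 + 2", "print(x)"]}
  ]
}"""

notebook = json.loads(raw)
for cell in notebook["cells"]:
    if cell["cell_type"] == "code":
        # nbformat stores source as a list of line fragments (or one
        # string); print each code line we find.
        lines = cell["source"]
        if isinstance(lines, str):
            lines = [lines]
        for line in lines:
            print(line.rstrip("\n"))
#+END_SRC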
I love notebook interfaces already because they simplify experimenting
by handling a lot of things I'd otherwise have to do manually - like
saving results and keeping them lined up with the exact code that
produced them. Jupyter adds some other use-cases I find marvelous - for
instance, I can let the interpreter run on my much faster workstation,
but I can access it across the Internet from my much slower laptop.
[[https://zeppelin.apache.org/][Apache Zeppelin]] does similar things with different languages; I just
use it less.
Another favorite of mine is [[https://nixos.org/nix/][Nix]]. One excellent article, [[http://blog.ezyang.com/2014/08/the-fundamental-problem-of-programming-language-package-management/][The
fundamental problem of programming language package management]],
doesn't ever mention Nix but does a great job explaining the sorts of
problems it exists to solve. To be able to combine nearly all of the
programming-language-specific package managers into a single module is
a very lofty goal, but Nix appears to do a decent job of it.
The [[https://www.lua.org/][Lua]] programming language is noteworthy here. It's written in
clean C with minimal dependencies, so it runs nearly anyplace with a
C or C++ compiler. It's purposely very easy both to *embed* (i.e. to
put inside of a program and use as an extension language, such as for
plugins or scripting) and to *extend* (i.e. to connect with libraries
to allow their functionality to be used from Lua). [[https://www.gnu.org/software/guile/][GNU Guile]] has many
of the same properties.
[[https://web.hypothes.is/][hypothes.is]] is a curious one that I find fascinating. In effect,
they're trying to factor out annotation and commenting from something
that is handled on a per-webpage basis, and I really like what I've
seen.
The Unix tradition lives on in certain modern tools. [[https://stedolan.github.io/jq/][jq]] has proven
very useful anytime I've had to mess with JSON data. [[http://www.dest-unreach.org/socat/][socat]] and [[http://netcat.sourceforge.net/][netcat]]
have saved me numerous times. I'm sure certain people love the fact
that [[https://neovim.io/][Neovim]] is designed to be seamlessly embedded and extended with
plugins. [[https://suckless.org/philosophy][suckless]] perhaps takes it too far, but gets an honorary
mention...
# ???
People know that I love Emacs, but I also do believe many of the
complaints about how large it is. On the one hand, it is basically its
@@ -118,80 +219,39 @@ underneath, and this makes me wonder why it needs explicit support for
- Multiple CPUs
- Multiple hosts
- Nix, Guix
- [[Notes - Distributed stuff notes (from turtl)]]
- [[Notes - Paper, 2016-11-13]]
- See notes on functional geekery #75
- Jupyter
- Any Plan 9 papers? (Will have to dig deep in the archives)
- http://plan9.bell-labs.com/sys/doc/
- Tanenbaum vs. Linus war & microkernels
- Conjecture: A module is most useful when available in the most
general or most accessible context (e.g. Linux commandline tool
vs. a Wordpress plugin or an Emacs package) - the TBL quote on
least-power sort of corroborates this, but stands separate in some
other ways too.
- "most general" might not be right here.
- Other conjecture attempt: A module's power is related to how many
other modules it can communicate with, without requiring substantial
adaptation. An abstraction's power is related to the modularity it
accommodates.
- Another conjecture attempt: An abstraction's power isn't related to
how broad it is, but to how well it connects things. A needlessly
simplistic abstraction requires a lot of other adaptation to be
useful. A needlessly specific one excludes a lot of potential
modules.
- hypothes.is is a sort of module unto itself here too, trying to
remove commenting and annotation from existing, very siloed
solutions.
- TBL: "The choice of language is a common design choice. The low
power end of the scale is typically simpler to design, implement and
use, but the high power end of the scale has all the attraction of
being an open-ended hook into which anything can be placed: a door
to uses bounded only by the imagination of the programmer. Computer
Science in the 1960s to 80s spent a lot of effort making languages
which were as powerful as possible. Nowadays we have to appreciate
the reasons for picking not the most powerful solution but the least
powerful. The reason for this is that the less powerful the
language, the more you can do with the data stored in that
language. If you write it in a simple declarative form, anyone can
write a program to analyze it in many ways."
- "Self" paper & structural reification?
- I'm still not sure how this relates, but it may perhaps relate to
how *not* to make things modular (structural reification is a sort
of check on the scope of objects/classes)
- What by Rich Hickey?
- Simple Made Easy?
- The Value of Values?
- SICP: [[https://mitpress.mit.edu/sicp/full-text/sicp/book/node50.html][Modularity, Objects, and State]]
- "On Understanding Data Abstraction, Revisited"
- Frameworks Don't Compose ([composition][])
- "On the Criteria to be Used in Decomposing System into Modules" (Barnas)
- suckless, and their tools & methodology
- https://suckless.org/philosophy
- even though they can take things waaaay too far...
- Containers?
- http://www.catb.org/~esr/writings/taoup/html/apb.html#Baldwin-Clark -
Carliss Baldwin and Kim Clark. Design Rules, Vol 1: The Power of
Modularity. 2000. MIT Press. ISBN 0-262-024667.
- https://colah.github.io/posts/2015-09-NN-Types-FP/ - Was this the
one that talked about 'modularity' in deep learning?
- NodeRED might be interesting here, but first I need a clear idea of
what it factored out into a separate component.
- Lua is notable here for the effort spent in making it easy to both
embed (e.g. as a scripting or extension language) and extend
(e.g. with other C libraries). Guile may be similar.
- NeoVim is also an interesting case here as it is designed to be
embedded, though I'm not sure what this means yet.
- Find the link to Les Hatton's slides (cyclomatic complexity?) on the
empirical effects of too many / too large modules
- Examples of more 'modern' tools:
- socat
- jq (the JSON processor)
- Nix and some related tools (which take related functionality that
is present in numerous PL-specific package managers)
- Jupyter
* Link-pile:
- [[http://www.catb.org/~esr/writings/taoup/html/][The Art of Unix Programming (Eric S. Raymond)]]
- [[https://circleci.com/blog/its-the-future/][It's the Future]]
- [[https://circleci.com/blog/it-really-is-the-future/][It really is the future]]
- [[https://www.youtube.com/watch?v=tc4ROCJYbm0][AT&T Archives: The UNIX Operating System]]
- [[http://blog.ezyang.com/2014/08/the-fundamental-problem-of-programming-language-package-management/][The fundamental problem of programming language package management]]
- [[https://clojurefun.wordpress.com/2012/08/17/composition-over-convention/][Frameworks Don't Compose]]
- Brooks, No Silver Bullet?
- https://www.reddit.com/r/programming/comments/4bjss2/an_11_line_npm_package_called_leftpad_with_only/
- https://www.functionalgeekery.com/episode-75-eric-b-merritt/
- https://www.w3.org/DesignIssues/
- https://www.w3.org/DesignIssues/Modularity.html
- http://www.w3.org/DesignIssues/Principles.html
- http://www.freecode.com/articles/editorial-the-two-edged-sword
- https://en.wikipedia.org/wiki/Essential_complexity