#+TITLE: Modularity & Abstraction (working title)
#+AUTHOR: Chris Hodapp
#+DATE: April 20, 2017
#+TAGS: technobabble

_Modularity_ and _abstraction_ feature prominently wherever computers are involved. This is meant very broadly: it applies to designing software, using software, integrating software, and to a lot of hardware as well. It applies elsewhere too, and almost certainly originated there first; however, it appears to be particularly crucial around software.

Definitions, though, are a bit vague (including anything in this post). My goal in this post isn't to try to (re)define them, but to explain a bit of their essence, and expand on a few theses:

- Modularity arises naturally in a wide array of places.
- Modularity and abstraction are intrinsically connected.
- Whether a given modularization makes sense depends strongly on the meaning and relevance of *information* inside and outside of modules, and broad context matters to those.

* Why?

People generally agree that "modularity" is good. The idea that something complex can be designed and understood in terms of smaller, simpler pieces comes naturally to anyone who has built something out of smaller pieces or taken something apart. It runs very deep in the Unix philosophy, which ESR gives a good overview of in [[http://www.catb.org/~esr/writings/taoup/html/ch01s06.html][The Art of Unix Programming]] (or, listen to it from [[https://youtu.be/tc4ROCJYbm0?t%3D248][Kernighan himself]] at Bell Labs in 1982).

Tim Berners-Lee gives some practical limitations in [[https://www.w3.org/DesignIssues/Principles.html][Principles of Design]] and in [[https://www.w3.org/DesignIssues/Modularity.html][Modularity]]: "Modular design hinges on the simplicity and abstract nature of the interface definition between the modules. A design in which the insides of each module need to know all about each other is not a modular design but an arbitrary partitioning of the bits...
It is not only necessary to make sure your own system is designed to be made of modular parts. It is also necessary to realize that your own system, no matter how big and wonderful it seems now, should always be designed to be a part of another larger system."

Les Hatton in [[http://www.leshatton.org/TAIC2008-29-08-2008.html][The role of empiricism in improving the reliability of future software]] even did an interesting derivation tying the defect density in software to how it is broken into pieces.

"Abstraction" doesn't have quite the same consensus. In software, it's generally understood that decoupled or loosely-coupled is better than tightly-coupled, but at the same time, "abstraction" can have the connotation of something that gets in the way, adds overhead, and confuses things. Dijkstra, in one of the few instances of not being snarky, allegedly said, "Being abstract is something profoundly different from being vague. The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise." Joel Spolsky, in one of the few instances of me actually caring what he said, also has a blog post from 2002 on the [[https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/][Law of Leaky Abstractions]]. The [[https://en.wikipedia.org/wiki/Principle_of_least_privilege][principle of least privilege]] is likewise a thing. So, abstraction too has its practical and theoretical limitations.

* How They Relate

I bring these up together because *abstractions* are the boundaries between *modules*, and the communication channels (APIs, languages, interfaces, protocols) through which they talk. An abstraction need not be a standardized interface or a well-documented boundary, though that helps.

Available abstractions vary. They vary by, for instance:

- ...what language you choose. Consider, for instance, that a language like Haskell contains various abstractions done largely within the type system that cannot be expressed in many other languages. Languages like Python, Ruby, or JavaScript might have various abstractions meaningful only in the context of dynamic typing. Some languages more readily permit the creation of new abstractions, and this might lead to a broader range of abstractions implemented in libraries.
- ...the operating system and its standard library. What is a process? What is a thread? What is a dynamic library? What is a filesystem? What is a file? What is a block device? What is a socket? What is a virtual machine? What is a bus? What is a command line?
- ...the time period. How many of the abstractions named above were around or viable in 1970, 1980, 1990, 2000? In the opposite direction, when did you last use that lovely standardized protocol, [[https://en.wikipedia.org/wiki/Common_Gateway_Interface][CGI]], to let your web application and your web server communicate, use [[https://en.wikipedia.org/wiki/PHIGS][PHIGS]] to render graphics, or access a large multiuser system via hard-wired terminals?

As such, possible ways to modularize things vary. It may make no sense that certain ways of modularizing even can or should exist until the same thing has been done other ways hundreds or thousands of times.

Other terms are related too. "Loosely-coupled" (or loose coupling) and "tightly-coupled" refer to the sort of abstractions sitting between modules, or whether or not there even are separate modules. "Decoupling" involves changing the relationship between modules (sometimes, creating them in the first place), typically moving things to a more sensible abstraction. "Factoring out" is really a form of decoupling in which smaller parts of something are turned into a module which the original thing then interfaces with (one canonical example is taking some bits of code, often ones that are very similar or identical in many places, and moving them into a single function). To say one has "abstracted over" some details implies that a module is handling those details, that the details shouldn't matter, and that what does matter is the abstraction one is using.

# -----

Consider the information this module deals in, in essence. What is the most general form this information could be expressed in, without being so general as to encompass other things that are irrelevant or so low-level as to needlessly constrain the possible contexts?

(Aristotle's theory of definitions?)

# -----

In a practical sense: Where someone "factors out" something that occurs in similar or identical form in multiple places (incidentally, "decouples" also works fine as a term), they're often creating a module (from what was factored out) and some number of abstractions (from the break that this created).

Consider some examples:

- Some configurable functionality in a larger application is extracted out into a system of plugins. The details of the application are abstracted over (as far as the plugin cares), and the details of the plugin are abstracted over (as far as the application cares). The API that the application and plugins use to communicate is the new abstraction now available. The plugins are modules, and the application itself is a module of a different sort. (Witness that sometimes another application will implement the same plugin API.)

One reason behind this is more practical in nature: When something is a module unto itself, presumably it is relying on specific abstractions, and it is possible to move this module to other contexts (anything providing the same abstractions) or to replace it with other modules (anything using the same abstractions).

Another reason is more abstract: When something is a module unto itself, the way it is designed and implemented often presents more insight into the fundamentals of the problem it is solving. It contains fewer incidental details, and more essential details.

# -------

* Less-Conventional Examples

One thing I've watched with some interest is when new abstractions emerge (or, perhaps, old ones become more widespread) to solve problems that I wasn't even aware existed.

[[https://circleci.com/blog/it-really-is-the-future/][It really is the future]] talks about a lot of more recent forms of modularity, most of which are beyond me and were completely unheard-of in, say, 2010. [[https://www.functionalgeekery.com/episode-75-eric-b-merritt/][Functional Geekery episode 75]] talks about many similar things.

[[https://jupyter.org/][Jupyter Notebook]] is one of my favorites here. It provides a notebook interface (similar to something like Maple or Mathematica) which:

- allows the notebook to use various different programming languages underneath,
- decouples where the notebook is used and where it is running, due to being implemented as a web application accessed through the browser,
- decouples the presentation of a stored notebook from Jupyter itself by using a [[https://nbformat.readthedocs.io/en/latest/][JSON-based file format]] which can be rendered without Jupyter (like GitHub does if you commit a .ipynb file).

I love notebook interfaces already because they simplify experimenting by handling a lot of things I'd otherwise have to do manually - like saving results and keeping them lined up with the exact code that produced them. Jupyter adds some other use-cases I find marvelous - for instance, I can let the interpreter run on my much faster workstation, but I can access it across the Internet from my much slower laptop. [[https://zeppelin.apache.org/][Apache Zeppelin]] does similar things with different languages; I just use it less.
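
To illustrate the last decoupling in the Jupyter list above, here is a minimal sketch that turns a stored notebook into plain text using only Python's standard ~json~ module, with no Jupyter installed. It assumes the nbformat version 4 layout (a top-level ~cells~ list whose entries carry ~cell_type~ and ~source~). The function name ~render_notebook~ and the tiny inline notebook are made up for illustration, and cell outputs are simply ignored.

#+BEGIN_SRC python
import json


def render_notebook(raw: str) -> str:
    """Render notebook JSON (assumed nbformat 4) as plain text."""
    notebook = json.loads(raw)
    parts = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # nbformat allows "source" to be one string or a list of strings.
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            # Mark code lines so they stand apart from prose.
            source = "\n".join(">>> " + line for line in source.splitlines())
        parts.append(source)
    return "\n\n".join(parts)


# A made-up, minimal notebook standing in for a real .ipynb file.
EXAMPLE = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Demo\n", "Some notes."]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1\n", "print(x)"]},
    ],
})

if __name__ == "__main__":
    print(render_notebook(EXAMPLE))
#+END_SRC

The renderer depends only on the file format, not on the program that wrote the file, which is the same position GitHub is in when it displays a committed notebook.
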

Another favorite of mine is [[https://nixos.org/nix/][Nix]]. One excellent article, [[http://blog.ezyang.com/2014/08/the-fundamental-problem-of-programming-language-package-management/][The fundamental problem of programming language package management]], doesn't ever mention Nix but does a great job explaining the sorts of problems it exists to solve. To be able to combine nearly all of the programming-language-specific package managers into a single module is a very lofty goal, but Nix appears to do a decent job of it.

The [[https://www.lua.org/][Lua]] programming language is noteworthy here. It's written in clean C with minimal dependencies, so it runs nearly anyplace with a C or C++ compiler. It's purposely very easy both to *embed* (i.e. to put inside of a program and use as an extension language, such as for plugins or scripting) and to *extend* (i.e. to connect with libraries to allow their functionality to be used from Lua). [[https://www.gnu.org/software/guile/][GNU Guile]] has many of the same properties.

[[https://web.hypothes.is/][hypothes.is]] is a curious one that I find fascinating. In effect, they're trying to factor out annotation and commenting from something that is handled on a per-webpage basis, and I really like what I've seen.

The Unix tradition lives on in certain modern tools. [[https://stedolan.github.io/jq/][jq]] has proven very useful anytime I've had to mess with JSON data. [[http://www.dest-unreach.org/socat/][socat]] and [[http://netcat.sourceforge.net/][netcat]] have saved me numerous times. I'm sure certain people love the fact that [[https://neovim.io/][Neovim]] is designed to be seamlessly embedded and to be extended with plugins. [[https://suckless.org/philosophy][suckless]] perhaps takes it too far, but gets an honorable mention...

# ???

People know that I love Emacs, but I also do believe many of the complaints about how large it is.
On the one hand, it is basically its own operating system and within this it has considerable modularity. On the other hand, there is a perfectly usable operating system underneath, and this makes me wonder why it needs explicit support for [[https://www.gnu.org/software/tramp/][network transparency]].

* Fragments

- Abstracting over...
  - Multiple applications
  - Multiple users
  - Multiple CPUs
  - Multiple hosts
- [[Notes - Paper, 2016-11-13]]
- Any Plan 9 papers? (Will have to dig deep in the archives)
  - http://plan9.bell-labs.com/sys/doc/
- Tanenbaum vs. Linus war & microkernels
- TBL: "The choice of language is a common design choice. The low power end of the scale is typically simpler to design, implement and use, but the high power end of the scale has all the attraction of being an open-ended hook into which anything can be placed: a door to uses bounded only by the imagination of the programmer. Computer Science in the 1960s to 80s spent a lot of effort making languages which were as powerful as possible. Nowadays we have to appreciate the reasons for picking not the most powerful solution but the least powerful. The reason for this is that the less powerful the language, the more you can do with the data stored in that language. If you write it in a simple declarative form, anyone can write a program to analyze it in many ways."
- "Self" paper & structural reification?
  - I'm still not sure how this relates, but it may perhaps relate to how *not* to make things modular (structural reification is a sort of check on the scope of objects/classes)
- What by Rich Hickey?
  - Simple Made Easy?
  - The Value of Values?
- SICP: [[https://mitpress.mit.edu/sicp/full-text/sicp/book/node50.html][Modularity, Objects, and State]]
- "On Understanding Data Abstraction, Revisited"
- "On the Criteria to be Used in Decomposing Systems into Modules" (Parnas)
- http://www.catb.org/~esr/writings/taoup/html/apb.html#Baldwin-Clark
  - Carliss Baldwin and Kim Clark. Design Rules, Vol 1: The Power of Modularity. 2000. MIT Press. ISBN 0-262-024667.
- https://colah.github.io/posts/2015-09-NN-Types-FP/
  - Was this the one that talked about 'modularity' in deep learning?
- [[https://clojurefun.wordpress.com/2012/08/17/composition-over-convention/][Frameworks Don't Compose]]
- Brooks, No Silver Bullet?
- https://www.reddit.com/r/programming/comments/4bjss2/an_11_line_npm_package_called_leftpad_with_only/
- http://www.freecode.com/articles/editorial-the-two-edged-sword
- https://en.wikipedia.org/wiki/Essential_complexity
- GObject framework: an object system that sits outside of any particular language (though this is nothing particularly new)
- libgreen