began work on program structure post

2019-05-18 14:29:48 -06:00 · 2019-05-18 14:29:48 -06:00 · 961b045398
commit 961b045398
parent 2b47573674
1 changed files with 157 additions and 0 deletions
--- a/_drafts/program-structure-and-composability.md
+++ b/_drafts/program-structure-and-composability.md
@@ -0,0 +1,157 @@
+---
+title: >-
+    Program Structure and Composability
+description: >-
+    Discussing the nature of program structure, the problems presented by
+    complex structures, and a pattern which helps in solving those problems.
+---
+
+## Part 0: Intro
+
+This post is focused on a concept I call "program structure", which I will try
+to shed some light on before moving on to complex program structures, why
+they can be problematic to deal with, and finally a pattern for dealing with
+those problems.
+
+My background is as a backend engineer working on large projects that have had
+many moving parts; most had multiple services interacting, used many different
+databases in various contexts, and faced large amounts of load from millions of
+users. Most of this post will be framed from my perspective, and present
+problems in the way I have experienced them. I believe, however, that the
+concepts and problems I discuss here are applicable to many other domains, and I
+hope those with a foot in both backend systems and a second domain can help to
+translate the ideas between the two.
+
+## Part 1: Program Structure
+
+For a long time I thought about program structure in terms of the hierarchy
+present in the filesystem. In my mind, a program's structure looked like this:
+
+```
+// The directory structure of a project called gobdns.
+src/
+    config/
+    dns/
+    http/
+    ips/
+    persist/
+    repl/
+    snapshot/
+    main.go
+```
+
+What I grew to learn was that this conflation of "program structure" with
+"directory structure" is ultimately unhelpful. While I won't deny that every
+program has a directory structure (and if not, it ought to), this does not mean
+that the way the program looks in a filesystem in any way corresponds to how it
+looks in our mind's eye.
+
+The most notable way to show this is to consider a library package. Here is the
+structure of a simple web-app which uses redis (my favorite database) as a
+backend:
+
+```
+src/
+    redis/
+    http/
+    main.go
+```
+
+(Note that I use Go as my example language throughout this post, but none of the
+ideas I'll be referring to are Go-specific.)
+
+If I were to ask you, based on that directory structure, what the program does,
+in the most abstract terms, you might say something like: "The program
+establishes an http server which listens for requests, as well as a connection
+to the redis server. The program then interacts with redis in different ways,
+based on the http requests which are received on the server."
+
+And that would be a good guess. But consider another case: "The program
+establishes an http server which listens for requests, as well as connections to
+_two different_ redis servers. The program then interacts with one redis server
+or the other in different ways, based on the http requests which are received
+on the server."
+
+The directory structure could apply to either description; `redis` is just a
+library which allows for interacting with a redis server, but it doesn't specify
+_which_ server, or _how many_. And those are extremely important factors which
+are definitely reflected in our concept of the program's structure, and yet not
+in the directory structure. Even worse, thinking of structure in terms of
+directories might (and, I claim, often does) cause someone to assume that the
+program _could_ only interact with one redis server, which is obviously untrue.
+
+### Global State and Microservices
+
+The directory-centric approach to structure often leads to the use of global
+singletons to manage access to external resources like RPC servers and
+databases. In the above example the `redis` library might contain code which
+looks something like:
+
+```go
+// For the non-gophers, redisConnection is a type which has been made up for
+// this example.
+var globalConn redisConnection
+
+func Get() redisConnection {
+    if globalConn == nil {
+        globalConn = makeConnection()
+    }
+    return globalConn
+}
+```
+
+Even ignoring that the above code is not thread-safe, this pattern has some
+serious drawbacks. For starters, it does not play nicely with a
+microservices-oriented system, or any other system with good separation of
+concerns between its components.
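+
+As an aside, the thread-safety gap on its own is easy to close, which is why it
+is set aside here. A minimal sketch using `sync.Once` from the standard library,
+where `redisConnection` and `makeConnection` are still made-up placeholders and
+the address is invented:
+
+```go
+package main
+
+import (
+    "fmt"
+    "sync"
+)
+
+// redisConnection is a made-up stand-in type, as in the example above.
+type redisConnection struct{ addr string }
+
+// makeConnection is hypothetical; a real one would dial a redis server.
+func makeConnection() *redisConnection {
+    return &redisConnection{addr: "127.0.0.1:6379"}
+}
+
+var (
+    globalConn     *redisConnection
+    globalConnOnce sync.Once
+)
+
+// Get returns the process-wide connection, creating it exactly once even when
+// called from many goroutines at the same time.
+func Get() *redisConnection {
+    globalConnOnce.Do(func() { globalConn = makeConnection() })
+    return globalConn
+}
+
+func main() {
+    // Both calls hand back the same instance.
+    fmt.Println(Get() == Get())
+}
+```
+
+Even with that fixed, the structural drawbacks remain.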
+
+I have been a part of building several large products with teams of various
+sizes. In each case we had a common library which was shared amongst all
+components of the system, and which contained functionality we wanted to keep
+the same across those components. For example, configuration was generally
+done through that library, so all components could be configured in the same
+way. Similarly, an RPC framework was usually included in the common library, so
+all components could communicate in a shared language. The common library also
+generally contained domain-specific types, for example a `User` type which all
+components needed to be able to understand.
+
+Most common libraries also have parts dedicated to databases, such as the
+`redis` library example we've been using. In a medium-to-large sized system,
+with many components, there are likely to be multiple running instances of any
+database: multiple SQL databases, different caches for each, different queues
+set up for different asynchronous tasks, etc. And this is good! The ideal
+compartmentalized system has components interact with each other directly, not
+via their databases, and so each component ought to, to the extent possible,
+keep its own databases to itself, with other components not touching them.
+
+The singleton pattern breaks this separation, by forcing the configuration of
+_all_ databases through the common library. If one component in the system adds
+a database instance, all other components have access to it. While this doesn't
+necessarily mean the other components will _use_ it, keeping them from doing so
+relies on sheer discipline, which will inevitably break down once management
+decides it's crunch time.
+
+To be clear, I'm not suggesting that singletons make proper compartmentalization
+impossible; they simply add friction to it. In other words, compartmentalization
+is not the default mode of singletons.
+
+Another problem with singletons, as mentioned before, is that they don't handle
+multiple instances of the same thing very well. In order to support having
+multiple redis instances in the system, the above code would need to be modified
+to give every instance a name, and track the mapping between that name, its
+singleton, and its configuration. For large projects the number of different
+instances can be enormous, and often the list which exists in code does not stay
+fully up-to-date.
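+
+A rough sketch of what that modification might look like. The instance names,
+addresses, and the `instanceAddrs` map are all hypothetical, and a real version
+would dial each server rather than just record its address:
+
+```go
+package main
+
+import (
+    "fmt"
+    "sync"
+)
+
+// redisConnection is a made-up stand-in type, as in the earlier example.
+type redisConnection struct{ addr string }
+
+// instanceAddrs is the hand-maintained list of every redis instance in the
+// system. The names and addresses here are invented for illustration.
+var instanceAddrs = map[string]string{
+    "sessions": "10.0.0.1:6379",
+    "cache":    "10.0.0.2:6379",
+}
+
+var (
+    connsMu sync.Mutex
+    conns   = map[string]*redisConnection{}
+)
+
+// Get returns the singleton for the named instance, creating it on first use.
+// It fails for any name missing from instanceAddrs.
+func Get(name string) (*redisConnection, error) {
+    connsMu.Lock()
+    defer connsMu.Unlock()
+    if c, ok := conns[name]; ok {
+        return c, nil
+    }
+    addr, ok := instanceAddrs[name]
+    if !ok {
+        return nil, fmt.Errorf("unknown redis instance %q", name)
+    }
+    c := &redisConnection{addr: addr}
+    conns[name] = c
+    return c, nil
+}
+
+func main() {
+    c, err := Get("cache")
+    fmt.Println(c.addr, err)
+
+    // An instance nobody remembered to add to the list.
+    _, err = Get("metrics")
+    fmt.Println(err)
+}
+```
+
+Every new instance anywhere in the system means another entry in that shared
+map, and every component can ask for every name.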
+
+This might all sound petty, but I think it has a large impact. Ultimately, when
+a component is using a singleton which is housed in a common library, that
+component is borrowing the instance, rather than owning it. Put another way, the
+component's structure is partially held by the common library, and since all
+components are going to use the common library, all of their structures are
+incorporated together. The separation between components is less solidified, and
+systems become weaker.
+
+What I'm going to propose is an alternative way to think about program structure
+which still allows for all the useful aspects of a common library without
+compromising on component separation, thereby giving large teams more
+freedom to act independently of each other.