---
title: >-
    Program Structure and Composability
description: >-
    Discussing the nature of program structure, the problems presented by
    complex structures, and a pattern which helps in solving those problems.
---

## Part 0: Intro

This post is focused on a concept I call "program structure". I will first try
to shed some light on what I mean by that, then move on to complex program
structures, why they can be problematic to deal with, and finally a pattern for
dealing with those problems.

My background is as a backend engineer working on large projects that have had
many moving parts; most had multiple services interacting, used many different
databases in various contexts, and faced large amounts of load from millions of
users. Most of this post will be framed from my perspective, and will present
problems in the way I have experienced them. I believe, however, that the
concepts and problems I discuss here are applicable to many other domains, and I
hope those with a foot in both backend systems and a second domain can help to
translate the ideas between the two.

## Part 1: Program Structure

For a long time I thought about program structure in terms of the hierarchy
present in the filesystem. In my mind, a program's structure looked like this:

```
// The directory structure of a project called gobdns.
src/
    config/
    dns/
    http/
    ips/
    persist/
    repl/
    snapshot/
    main.go
```

What I grew to learn was that this conflation of "program structure" with
"directory structure" is ultimately unhelpful. While I won't deny that every
program has a directory structure (and if not, it ought to), this does not mean
that the way the program looks in a filesystem in any way corresponds to how it
looks in our mind's eye.

The most notable way to show this is to consider a library package. Here is the
structure of a simple web-app which uses Redis (my favorite database) as a
backend:

```
src/
    redis/
    http/
    main.go
```

(Note that I use Go as my example language throughout this post, but none of the
ideas I'll be referring to are Go-specific.)

If I were to ask you, based on that directory structure, what the program does,
in the most abstract terms, you might say something like: "The program
establishes an HTTP server which listens for requests, as well as a connection
to the Redis server. The program then interacts with Redis in different ways,
based on the HTTP requests which are received on the server."

And that would be a good guess. But consider another case: "The program
establishes an HTTP server which listens for requests, as well as connections to
_two different_ Redis servers. The program then interacts with one Redis server
or the other in different ways, based on the HTTP requests which are received
on the server."

The directory structure could apply to either description; `redis` is just a
library which allows for interacting with a Redis server, but it doesn't specify
_which_ server, or _how many_. Those are extremely important factors which are
definitely reflected in our concept of the program's structure, and yet not in
the directory structure. Even worse, thinking of structure in terms of
directories might (and, I claim, often does) cause someone to assume that a
program _could_ only interact with one Redis server, which is obviously untrue.

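To make the second description concrete, here is a minimal sketch of what the
wiring for it might look like. Everything in it is assumed for illustration:
`redisConnection`, `dial`, `app`, `newApp`, `connFor`, the addresses, and the
routing rule are all made up, and no real Redis client is involved. The point is
only that the same `redis/`, `http/`, `main.go` layout could hold this code just
as easily as the single-server version.

```go
package main

import "strings"

// redisConnection is a made-up stand-in for a real client connection. It only
// remembers which server it points at.
type redisConnection struct {
	addr string
}

// dial is a hypothetical constructor; a real one would open a socket.
func dial(addr string) *redisConnection {
	return &redisConnection{addr: addr}
}

// app holds both connections explicitly. Nothing in the directory layout
// reveals that there are two of them.
type app struct {
	sessions *redisConnection
	cache    *redisConnection
}

// newApp wires up the two connections from the addresses it is given.
func newApp(sessionsAddr, cacheAddr string) *app {
	return &app{
		sessions: dial(sessionsAddr),
		cache:    dial(cacheAddr),
	}
}

// connFor picks one server or the other based on the request path. The
// "/session" prefix rule is an arbitrary assumption for this example.
func (a *app) connFor(path string) *redisConnection {
	if strings.HasPrefix(path, "/session") {
		return a.sessions
	}
	return a.cache
}
```

Calling `newApp("10.0.0.1:6379", "10.0.0.2:6379")` and then `connFor` with
different paths returns one connection or the other, which is exactly the kind
of structural fact the directory tree cannot express.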
### Global State and Microservices

The directory-centric approach to structure often leads to the use of global
singletons to manage access to external resources like RPC servers and
databases. In the above example the `redis` library might contain code which
looks something like:

```go
// For the non-gophers, redisConnection is a type which has been made up for
// this example.
var globalConn redisConnection

// Get lazily creates the single shared connection on first use, and returns
// that same connection on every later call.
func Get() redisConnection {
	if globalConn == nil {
		globalConn = makeConnection()
	}
	return globalConn
}
```

Ignoring that the above code is not thread-safe, the pattern has some serious
drawbacks. For starters, it does not play nicely with a microservices-oriented
system, or any other system with good separation of concerns between its
components.

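As an aside on the thread-safety point, one way the `Get` function above could
be made safe for concurrent callers is with `sync.Once` from Go's standard
library. This is only a sketch: `redisConnection` and `makeConnection` are still
made-up stand-ins (here a pointer to a small struct, with a placeholder
address), and `globalConnOnce` is a name invented for this example. Note that
fixing the race does nothing about the structural drawbacks of the singleton.

```go
package main

import "sync"

// redisConnection is a made-up stand-in for a real client connection.
type redisConnection struct {
	addr string
}

// makeConnection is hypothetical; the address is a placeholder.
func makeConnection() *redisConnection {
	return &redisConnection{addr: "localhost:6379"}
}

var (
	globalConn     *redisConnection
	globalConnOnce sync.Once
)

// Get returns the single shared connection. sync.Once guarantees that
// makeConnection runs exactly once, even if many goroutines call Get at the
// same time.
func Get() *redisConnection {
	globalConnOnce.Do(func() {
		globalConn = makeConnection()
	})
	return globalConn
}
```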
I have been a part of building several large products with teams of various
sizes. In each case we had a common library which was shared amongst all
components of the system, and which contained functionality that we wanted to
keep the same across those components. For example, configuration was generally
done through that library, so that all components could be configured in the
same way. Similarly, an RPC framework was usually included in the common
library, so that all components could communicate in a shared language. The
common library also generally contained domain-specific types, for example a
`User` type which all components needed to be able to understand.

Most common libraries also have parts dedicated to databases, such as the
`redis` library example we've been using. In a medium-to-large sized system,
with many components, there are likely to be multiple running instances of any
database: multiple SQL databases, different caches for each, different queues
set up for different asynchronous tasks, etc. And this is good! The ideal
compartmentalized system has components interact with each other directly, not
via their databases, and so each component ought to, to the extent possible,
keep its own databases to itself, with other components not touching them.

The singleton pattern breaks this separation by forcing the configuration of
_all_ databases through the common library. If one component in the system adds
a database instance, all other components have access to it. While this doesn't
necessarily mean the components will _use_ it, keeping them from doing so can
only be accomplished through sheer discipline, which will inevitably break down
once management decides it's crunch time.

To be clear, I'm not suggesting that singletons make proper compartmentalization
impossible; they simply add friction to it. In other words, compartmentalization
is not the default mode of singletons.

Another problem with singletons, as mentioned before, is that they don't handle
multiple instances of the same thing very well. In order to support having
multiple Redis instances in the system, the above code would need to be modified
to give every instance a name, and to track the mapping between that name, its
singleton, and its configuration. For large projects the number of different
instances can be enormous, and often the list which exists in code does not stay
fully up-to-date.

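A rough sketch of what that name-tracking modification might look like follows.
All of the names here (`instanceAddrs`, `conns`, `connsMu`, the instance names
and their addresses) are assumptions made up for this example, and
`redisConnection` is again a stand-in rather than a real client. A mutex is
included so that the sketch is at least safe to call concurrently.

```go
package main

import (
	"fmt"
	"sync"
)

// redisConnection is a made-up stand-in for a real client connection.
type redisConnection struct {
	addr string
}

// instanceAddrs is the hand-maintained list of every named instance in the
// whole system. This is the list which tends to drift out of date.
var instanceAddrs = map[string]string{
	"sessions": "10.0.0.1:6379",
	"cache":    "10.0.0.2:6379",
}

var (
	connsMu sync.Mutex
	conns   = map[string]*redisConnection{}
)

// Get returns the singleton for the named instance, creating it on first use.
// Any component which imports this package can ask for any name.
func Get(name string) (*redisConnection, error) {
	connsMu.Lock()
	defer connsMu.Unlock()

	if conn, ok := conns[name]; ok {
		return conn, nil
	}
	addr, ok := instanceAddrs[name]
	if !ok {
		return nil, fmt.Errorf("unknown redis instance %q", name)
	}
	conn := &redisConnection{addr: addr}
	conns[name] = conn
	return conn, nil
}
```

Even in this small form, the common library now has to know the name and
address of every instance in the system, and every component can reach every
one of them.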
This might all sound petty, but I think it has a large impact. Ultimately, when
a component is using a singleton which is housed in a common library, that
component is borrowing the instance, rather than owning it. Put another way, the
component's structure is partially held by the common library, and since all
components are going to use the common library, all of their structures are
incorporated together. The separation between components is less solidified, and
systems become weaker.

What I'm going to propose is an alternative way to think about program structure
which still allows for all the useful aspects of a common library without
compromising on component separation, thereby giving large teams more freedom to
act independently of each other.