From 765ec5624672c0fd65f4293aa27b9923260146dc Mon Sep 17 00:00:00 2001 From: Brian Picciano Date: Sun, 19 May 2019 13:07:02 -0600 Subject: [PATCH] continued work on program structure, pretty much done with Part 1 --- .../program-structure-and-composability.md | 267 +++++++++++++----- 1 file changed, 189 insertions(+), 78 deletions(-) diff --git a/_drafts/program-structure-and-composability.md b/_drafts/program-structure-and-composability.md index 3dba4fb..96baee8 100644 --- a/_drafts/program-structure-and-composability.md +++ b/_drafts/program-structure-and-composability.md @@ -6,24 +6,39 @@ description: >- complex structures, and a pattern which helps in solving those problems. --- -## Part 0: Intro +## Part 0: Introduction This post is focused on a concept I call "program structure", which I will try -to shed some light on before moving on to discussing complex program structures, +to shed some light on before discussing complex program structures, then discussing why complex structures can be problematic to deal with, and finally discussing a pattern for dealing with those problems. My background is as a backend engineer working on large projects that have had -many moving parts; most had multiple services interacting, used many different -databases in various contexts, and faced large amounts of load from millions of -users. Most of this post will be framed from my perspective, and present -problems in the way I have experienced them. I believe, however, that the -concepts and problems I discuss here are applicable to many other domains, and I -hope those with a foot in both backend systems and a second domain can help to -translate the ideas between the two. +many moving parts; most had multiple services interacting with each other, using +many different databases in various contexts, and facing large amounts of load +from millions of users. Most of this post will be framed from my perspective, +and will present problems in the way I have experienced them. 
I believe, +however, that the concepts and problems I discuss here are applicable to many +other domains, and I hope those with a foot in both backend systems and a second +domain can help to translate the ideas between the two. + +Also note that I will be using Go as my example language, but none of the +concepts discussed here are specific to Go. To that end, I've decided to favor +readable code over "correct" code, and so have elided things that most gophers +hold near and dear, such as error checking and comments on all public types, in +order to make the code as accessible as possible to non-gophers as well. As +before, I trust someone with a foot in Go and another language can +help me translate between the two. ## Part 1: Program Structure +In this section I will discuss the difference between directory and program +structure, show how global state is antithetical to compartmentalization (and +therefore good program structure), and finally discuss a more effective way to +think about program structure. + +### Directory Structure + For a long time I thought about program structure in terms of the hierarchy present in the filesystem. In my mind, a program's structure looked like this: @@ -40,10 +55,10 @@ src/ main.go ``` -What I grew to learn was that this consolidation of "program structure" with +What I grew to learn was that this conflation of "program structure" with "directory structure" is ultimately unhelpful. While I won't deny that every program has a directory structure (and if not, it ought to), this does not mean -that the way the program looks in a filesystem in anyway corresponds to how it +that the way the program looks in a filesystem in any way corresponds to how it looks in our mind's eye. The most notable way to show this is to consider a library package. Here is the @@ -57,30 +72,39 @@ src/ main.go ``` -(Note that I use go as my example language throughout this post, but none of the -ideas I'll referring to are go specific.) 
- If I were to ask you, based on that directory strucure, what the program does, in the most abstract terms, you might say something like: "The program establishes an http server which listens for requests, as well as a connection to the redis server. The program then interacts with redis in different ways, based on the http requests which are received on the server." -And that would be a good guess. But consider another case: "The program -establishes an http server which listens for requests, as well as connections to -_two different_ redis servers. The program then interacts with one redis server -or the other in different ways, based on the http requests which are received -from the server. +And that would be a good guess. Here's a diagram which depicts the program +structure, wherein the root node, `main.go`, takes in requests from `http` and +processes them using `redis`. + +TODO diagram + +This is certainly a viable guess for how a program with that directory structure +operates, but consider another: "A component of the program called `server` +establishes an http server which listens for requests, as well as a connection +to a redis server. `server` then interacts with that redis connection in +different ways, based on the http requests which are received on the http +server. Additionally, `server` tracks statistics about these interactions and +makes them available to other components. The root component of the program +establishes a connection to a second redis server, and stores those statistics +in that redis server." + +TODO diagram The directory structure could apply to either description; `redis` is just a library which allows for interacting with a redis server, but it doesn't specify _which_ server, or _how many_. And those are extremely important factors which are definitely reflected in our concept of the program's structure, and yet not -in the directory structure. 
Even worse, thinking of structure in terms of -directories might (and, I claim, often does) cause someone to assume that -program only _could_ interact with one redis server, which is obviously untrue. +in the directory structure. **What the directory structure reflects is the +different _kinds_ of components available to use, but it does not reflect how a +program will use those components.** -### Global State and Microservices +### Global State vs. Compartmentalization The directory-centric approach to structure often leads to the use of global singletons to manage access to external resources like RPC servers and @@ -88,70 +112,157 @@ databases. In the above example the `redis` library might contain code which looks something like: ```go -// For the non-gophers, redisConnection is variable type which has been made up -// for this example. -var globalConn redisConnection +// A mapping of connection names to redis connections. +var globalConns = map[string]redisConnection{} -func Get() redisConnection { - if globalConn == nil { - globalConn = makeConnection() +func Get(name string) redisConnection { + if globalConns[name] == nil { + globalConns[name] = makeConnection(name) } - return globalConn + return globalConns[name] } ``` -Ignoring that the above code is not thread-safe, the above pattern has some -serious drawbacks. For starters, it does not play nicely with a microservices -oriented system, or any other system with good separation of concerns between -its components. +Even though this pattern would work, it breaks with our conception of the +program structure in the more complex case shown above. Rather than having the +`server` component own the redis server it uses, the root component would be the +owner of it, and `server` would be borrowing it. Compartmentalization has been +broken, and can only be held together through sheer human discipline. -I have been a part of building several large products with teams of various -sizes. 
In each case we had a common library which was shared amongst all -components of the system, and contained functionality which was desired to be -kept the same across those components. For example, configuration was generally -done through that library, so all components could be configured in the same -way. Similarly, an RPC framework is usually included in the common library, so -all components can communicate in a shared language. The common library also -generally contains domain specific types, for example a `User` type which all -components will need to be able to understand. +This is the problem with all global state. It's shareable amongst all components +of a program, and so is owned by none of them. One must look at an entire +codebase to understand how a globally held component is used, which might not +even be possible for a large codebase. And so the maintainers of these shared +components rely entirely on the discipline of their fellow coders when making +changes, usually discovering where that discipline broke down once the changes +have been pushed live. -Most common libraries also have parts dedicated to databases, such as the -`redis` library example we've been using. In a medium-to-large sized system, -with many components, there are likely to be multiple running instances of any -database: multiple SQLs, different caches for each, different queues set up for -different asynchronous tasks, etc... And this is good! The ideal -compartmentalized system has components interact with each other directly, not -via their databases, and so each component ought to, to the extent possible, -keep its own databases to itself, with other components not touching them. +Global state also makes it easier for disparate services/components to share +datastores for completely unrelated tasks. 
In the above example, rather than +creating a new redis instance for the root component's statistics storage, the +coder might have instead said "well, there's already a redis instance available, +I'll just use that." And so compartmentalization would have been broken further. +Perhaps the two instances _could_ be coalesced into the same one, for the sake +of resource efficiency, but that decision would be better made at runtime via +the configuration of the program, rather than being hardcoded. -The singleton pattern breaks this separation, by forcing the configuration of -_all_ databases through the common library. If one component in the system adds -a database instance, all other components have access to it. While this doesn't -necessarily mean the components will _use_ it, that will only be accomplished -through sheer discipline, which will inevitably break down once management -decides it's crunch time. +From the perspective of team management, global state-based patterns do nothing +except slow teams down. The person/team responsible for maintaining the central +library which holds all the shared resources (`redis`, in the above example) +becomes the bottleneck for creating new instances for new components, which in +turn encourages re-using existing instances rather than creating new ones, further +breaking compartmentalization. The person/team responsible for the central +library often ends up maintaining the shared resource as +well, rather than the team actually using it. -To be clear, I'm not suggesting that singletons make proper compartmentalization -impossible, they simply add friction to it. In other words, compartmentalization -is not the default mode of singletons. +### Program Structure -Another problem with singletons, as mentioned before, is that they don't handle -multiple instances of the same thing very well. 
In order to support having -multiple redis instances in the system, the above code would need to be modified -to give every instance a name, and track the mapping of between that name, its -singleton, and its configuration. For large projects the number of different -instances can be enormous, and often the list which exists in code does not stay -fully up-to-date. +So what does proper program structure look like? In my mind the structure of a +program is a hierarchy of components, or, in other words, a tree. The leaf nodes +of the tree are almost _always_ IO related components, e.g. database +connections, RPC server frameworks or clients, message queue consumers, etc... +The non-leaf nodes will _generally_ be components which bring together the +functionalities of their children in some useful way, though they may also have +some IO functionality of their own. + +Let's look at an even more complex structure, still only using the `redis` and +`http` component types: + +TODO diagram: +``` + root + rest-api + redis + http + redis // for stats keeping + debug + http +``` + +This structure contains the addition of the `debug` component. Clearly the +`http` and `redis` components are reusable in different contexts, but for this +example the `debug` endpoint is as well. It creates a separate http server which +can be queried to perform runtime debugging of the program, and can be tacked +onto virtually any program. The `rest-api` component is specific to this program +and therefore not reusable. Let's dive into it a bit to see how it might be +implemented: + +```go +// RestAPI is very much not thread-safe, hopefully it doesn't have to handle +// more than one request at once. 
+type RestAPI struct { + redisConn *redis.Conn + httpSrv *http.Server + + // Statistics exported for other components to see + RequestCount int + FooRequestCount int + BarRequestCount int +} + +func NewRestAPI() *RestAPI { + r := new(RestAPI) + r.redisConn = redis.NewConn("127.0.0.1:6379") + + // mux will route requests to different handlers based on their URL path. + mux := http.NewServeMux() + mux.Handle("/foo", http.HandlerFunc(r.fooHandler)) + mux.Handle("/bar", http.HandlerFunc(r.barHandler)) + r.httpSrv = http.NewServer(mux) + + // Listen for requests and serve them in the background. + go r.httpSrv.Listen(":8000") + + return r +} + +func (r *RestAPI) fooHandler(rw http.ResponseWriter, req *http.Request) { + r.redisConn.Command("INCR", "fooKey") + r.RequestCount++ + r.FooRequestCount++ +} + +func (r *RestAPI) barHandler(rw http.ResponseWriter, req *http.Request) { + r.redisConn.Command("INCR", "barKey") + r.RequestCount++ + r.BarRequestCount++ +} +``` + +As can be seen, `rest-api` coalesces `http` and `redis` into a simple REST API, +using pre-made library components. `main.go`, the root component, does much the +same: + +```go +func main() { + // Create debug server and start listening in the background + debug.NewServer() + + // Set up the RestAPI, this will automatically start listening + restAPI := NewRestAPI() + + // Create another redis connection and use it to store statistics + statsRedisConn := redis.NewConn("127.0.0.1:6380") + for { + time.Sleep(1 * time.Second) + statsRedisConn.Command("SET", "numReqs", restAPI.RequestCount) + statsRedisConn.Command("SET", "numFooReqs", restAPI.FooRequestCount) + statsRedisConn.Command("SET", "numBarReqs", restAPI.BarRequestCount) + } +} +``` + +One thing which is clearly missing in this program is proper configuration, +whether from the command line, environment variables, etc. As it stands, all +configuration parameters, such as the redis addresses and http listen addresses, +are hardcoded. 
Proper configuration actually ends up being somewhat difficult, +as the ideal case would be for each component to set up the configuration +variables of itself, without its parent needing to be aware. For example, +`redis` could set up `addr` and `pool-size` parameters. The problem is that +there are two `redis` components in the program, and their parameters would +therefore conflict with each other. An elegant solution to this problem is +discussed in the next section. + +## Part 2: Context, Configuration, and Runtime -This might all sound petty, but I think it has a large impact. Ultimately, when -a component is using a singleton which is housed in a common library, that -component is borrowing the instance, rather than owning it. Put another way, the -component's structure is partially held by the common library, and since all -components are going to use the common library, all of their structures are -incorporated together. The separation between components is less solidified, and -systems become weaker. -What I'm going to propose is an alternative way to think about program structure -which still allows for all the useful aspects of a common library, without -compromising on component separation, and therefore giving large teams more -freedom to act independently of each other.