component oriented programming post

4 years ago · dbf6ebdeee
parent 8d1de40ef7
commit dbf6ebdeee
1 changed files with 568 additions and 0 deletions
--- a/_posts/2020-11-16-component-oriented-programming.md
+++ b/_posts/2020-11-16-component-oriented-programming.md
@ -0,0 +1,568 @@
+---
+title: >-
+    Component Oriented Programming
+description: >-
+    A concise description of.
+---
+
+[A previous post in this
+blog](2019-08-02-program-structure-and-composability.html) focused on a
+framework developed to make designing component-based programs easier. In
+retrospect pattern/framework proposed was over-engineered; this post attempts to
+present the same ideas but in a more distilled form, as a simple programming
+pattern and without the unnecessary framework.
+
+Nothing in this post will be revelatory; it's surely all been said before. But
+hopefully the form it takes here will be useful to someone, as it would have
+been useful to myself when I first learned to program.
+
+## Axioms
+
+For the sake of brevity let's assume the following: within the context of
+single-process (_not_ the same as single-threaded), non-graphical programs the
+following may be said:
+
+1. A program may be thought of as a black-box with certain input and output
+   methods. It is the programmer's task to construct a program such that
+   specific inputs yield specific desired outputs.
+
+2. A program is not complete without sufficient testing to prove it's complete.
+
+3. Global state and global impure functions makes testing more difficult. This
+   can include singletons and system calls.
+
+Any of these may be argued, but that will be left for other posts. Any of these
+may be said of other types of programs as well, but that can also be left for
+other posts.
+
+## Components
+
+Properties of components include:
+
+1. *Creatable*: An instance of a component, given some defined set of
+   parameters, can be created independently of any other instance of that or any
+   other component.
+
+2. *Composable*: A component may be used as a parameter of another component's
+   instantiation. This would make it a child component of the one being
+   instantiated (i.e. the parent).
+
+3. *Abstract*: A component is an interface consisting of one or more methods.
+   Being an interface, a component may have one or more implementations, but
+   generally will have a primary implementation, which is used during a
+   program's runtime, and secondary "mock" implementations, which are only used
+   when testing other components.
+
+4. *Isolated*: A component may not use mutable global variables (i.e.
+   singletons) or impure global functions (e.g. system calls). It may only use
+   constants and variables/components given to it during instantiation.
+
+5. *Ephemeral*: A component may have a specific method used to clean up all
+   resources that it's holding (e.g. network connections, file handles,
+   language-specific lightweight threads, etc).
+
+   5a. This cleanup method should _not_ clean up any child components given as
+   instantiation parameters.
+
+   5b. This cleanup method should not return until the component's cleanup is
+   complete.
+
+Components are composed together to create programs. This is done by passing
+components as parameters to other components during instantiation. The `main`
+process of the program is responsible for instantiating and composing most, if
+not all, components in the program.
+
+A component oriented program is one which primarily, if not entirely, uses
+components for its functionality. Components generally have the quality of being
+able to interact with code written in other patterns without any toes being
+stepped on.
+
+## Example
+
+Let's start with an example: suppose a program is desired which accepts a string
+over stdin, hashes it, then writes the string to a file whose name is the hash.
+
+A naive implementation of this program in go might look like:
+
+```go
+package main
+
+import (
+	"crypto/sha1"
+	"encoding/hex"
+	"io"
+	"io/ioutil"
+	"os"
+)
+
+func hashFileWriter() error {
+	h := sha1.New()
+	r := io.TeeReader(os.Stdin, h)
+	body, _ := ioutil.ReadAll(r)
+	fileName := hex.EncodeToString(h.Sum(nil))
+
+	if err := ioutil.WriteFile(fileName, body, 0644); err != nil {
+		return err
+	}
+
+	return nil
+}
+
+func main() {
+	if err := hashFileWriter(); err != nil {
+		panic(err) // consider the error handled
+	}
+}
+```
+
+Notice that there's not a clear separation here between different components;
+`hashFileWriter` _might_ be considered a one method component, except that it
+breaks component property 4, which says that a component may not use mutable
+global variables (`os.Stdin`) or impure global functions (`ioutil.WriteFile`).
+
+Notice also that testing the program would require integration tests, and could
+not be unit tested (because there are no units, i.e. components). For a trivial
+program like this one writing unit and integration tests would be redundant, but
+for larger programs it may not be. Unit tests are important because they are
+fast to run, (usually) easy to formulate, and yield consistent results.
+
+This program could instead be written as being composed of three components:
+
+* `stdin`, a construct given by the runtime which outputs a stream of bytes.
+
+* `disk`, accepts a file name and file contents as input, writes the file
+  contents to a file of the given name, and potentially returns an error back.
+
+* `hashFileWriter`, reads a stream of bytes off a `stdin`, collects the stream
+  into a string, hashes that string to generate a file name, and uses `disk` to
+  create a corresponding file with the string as its contents. If `disk` returns
+  an error then `hashFileWriter` returns that error.
+
+Sprucing up our previous example to use these more clearly defined components
+might look like:
+
+```go
+package main
+
+import (
+	"crypto/sha1"
+	"encoding/hex"
+	"fmt"
+	"io"
+	"io/ioutil"
+	"os"
+)
+
+// Disk defines the methods of the disk component.
+type Disk interface {
+	WriteFile(fileName string, fileContents []byte) error
+}
+
+// disk is the primary implementation of Disk. It implements the methods of
+// Disk (WriteFile) by performing actual system calls.
+type disk struct{}
+
+func NewDisk() Disk { return disk{} }
+
+func (disk) WriteFile(fileName string, fileContents []byte) error {
+	return ioutil.WriteFile(fileName, fileContents, 0644)
+}
+
+func hashFileWriter(stdin io.Reader, disk Disk) error {
+	h := sha1.New()
+	r := io.TeeReader(stdin, h)
+	body, err := ioutil.ReadAll(r)
+	if err != nil {
+		return fmt.Errorf("reading input: %w", err)
+	}
+
+	fileName := hex.EncodeToString(h.Sum(nil))
+
+	if err := disk.WriteFile(fileName, body); err != nil {
+		return fmt.Errorf("writing to file %q: %w", fileName, err)
+	}
+	return nil
+}
+
+func main() {
+	if err := hashFileWriter(os.Stdin, NewDisk()); err != nil {
+		panic(err) // consider the error handled
+	}
+}
+```
+
+`hashFileWriter` no longer directly uses `os.Stdin` and `ioutil.WriteFile`, but
+instead takes in components wrapping them; `io.Reader` is a built-in interface
+which `os.Stdin` inherently implements, and `Disk` is a simple interface defined
+just for this program.
+
+At first glance this would seem to have doubled the line-count for very little
+gain. This is because we have not yet written tests.
+
+## Testing
+
+As has already been firmly established, testing is important.
+
+In the second form of the program we can test the core-functionality of the
+`hashFileWriter` component without resorting to using the actual `stdin` and
+`disk` components. Instead we use mocks of those components. A mock component
+implements the same input/outputs that the "real" component does, but in a way
+which makes testing a particular component possible without reaching outside the
+process. These are unit tests.
+
+Tests for the latest form of the program might look like this:
+
+```go
+package main
+
+import (
+	"strings"
+	"testing"
+)
+
+// mockDisk implements the Disk interface. When WriteFile is called mockDisk
+// will pretend to write the file, but instead will simply store what arguments
+// WriteFile was called with.
+type mockDisk struct {
+	fileName     string
+	fileContents []byte
+}
+
+func (d *mockDisk) WriteFile(fileName string, fileContents []byte) error {
+	d.fileName = fileName
+	d.fileContents = fileContents
+	return nil
+}
+
+func TestHashFileWriter(t *testing.T) {
+	type test struct {
+		in          string
+		expFileName string
+		// expFileContents can be inferred from in
+	}
+
+	tests := []test{
+		{
+			in:          "",
+			expFileName: "da39a3ee5e6b4b0d3255bfef95601890afd80709",
+		},
+		{
+			in:          "hello",
+			expFileName: "aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d",
+		},
+		{
+			in:          "hello\nworld", // make sure newlines don't break things
+			expFileName: "7db827c10afc1719863502cf95397731b23b8bae",
+		},
+	}
+
+	for _, test := range tests {
+		// stdin is mocked via a strings.Reader, which outputs the string it was
+		// initialized with as a stream of bytes.
+		in := strings.NewReader(test.in)
+
+		// Disk is mocked by mockDisk, go figure.
+		disk := new(mockDisk)
+
+		if err := hashFileWriter(in, disk); err != nil {
+			t.Errorf("in:%q got err:%v", test.in, err)
+		} else if string(disk.fileContents) != test.in {
+			t.Errorf("in:%q got contents:%q", test.in, disk.fileContents)
+		} else if string(disk.fileName) != test.expFileName {
+			t.Errorf("in:%q got fileName:%q", test.in, disk.fileName)
+		}
+	}
+}
+```
+
+Notice that these tests do not _completely_ cover the desired functionality of
+the program: if `disk` returns an error that error should be returned from
+`hashFileWriter`. Whether or not this must be tested as well, and indeed the
+pedantry level of tests overall, is a matter of taste. I believe these to be
+sufficient.
+
+## Configuration
+
+Practically all programs require some level of runtime configuration. This may
+take the form of command-line arguments, environment variables, configuration
+files, etc. Almost all configuration methods will require some system call, and
+so any component accessing configuration directly would likely break component
+property 4.
+
+Instead each component should take in whatever configuration parameters it needs
+during instantiation, and let `main` handle collecting all configuration from
+outside of the process and instantiating the components appropriately.
+
+Let's take our previous program, but add in two new desired behaviors: first,
+there should be a command-line parameter which allows for specifying the string
+on the command-line, rather than reading from stdin, and second, there should be
+a command-line parameter declaring which directory to write files into. The new
+implementation looks like:
+
+```
+package main
+
+import (
+	"crypto/sha1"
+	"encoding/hex"
+	"flag"
+	"fmt"
+	"io"
+	"io/ioutil"
+	"os"
+	"path/filepath"
+	"strings"
+)
+
+// Disk defines the methods of the disk component.
+type Disk interface {
+	WriteFile(fileName string, fileContents []byte) error
+}
+
+// disk is the concrete implementation of Disk. It implements the methods of
+// Disk (WriteFile) by performing actual OS calls.
+type disk struct {
+	dir string
+}
+
+func NewDisk(dir string) Disk { return disk{dir: dir} }
+
+func (d disk) WriteFile(fileName string, fileContents []byte) error {
+	fileName = filepath.Join(d.dir, fileName)
+	return ioutil.WriteFile(fileName, fileContents, 0644)
+}
+
+func hashFileWriter(in io.Reader, disk Disk) error {
+	h := sha1.New()
+	r := io.TeeReader(in, h)
+	body, err := ioutil.ReadAll(r)
+	if err != nil {
+		return fmt.Errorf("reading input: %w", err)
+	}
+
+	fileName := hex.EncodeToString(h.Sum(nil))
+
+	if err := disk.WriteFile(fileName, body); err != nil {
+		return fmt.Errorf("writing to file %q: %w", fileName, err)
+	}
+	return nil
+}
+
+func main() {
+	str := flag.String("str", "", "If set, hash and write this string instead of stdin")
+	dir := flag.String("dir", ".", "Directory which files should be written to")
+	flag.Parse()
+
+	var in io.Reader
+	if *str == "" {
+		in = os.Stdin
+	} else {
+		in = strings.NewReader(*str)
+	}
+
+	disk := NewDisk(*dir)
+
+	if err := hashFileWriter(in, disk); err != nil {
+		panic(err) // consider the error handled
+	}
+}
+```
+
+Very little has changed, and in fact `hashFileWriter` was not touched at all,
+meaning all unit tests remained valid. 
+
+## Setup/Runtime/Cleanup
+
+A program can be split into three stages: setup, runtime, and cleanup. Setup
+is the stage during which internal state is assembled in order to make runtime
+possible. Runtime is the stage during which a program's actual function is being
+performed. Cleanup is the stage during which runtime stop and internal state is
+disassembled.
+
+A graceful (i.e. reliably correct) setup is quite natural to accomplish, but
+unfortunately a graceful cleanup is not a programmer's first concern, and
+frequently is not a concern at all. However, when building reliable and correct
+programs, a graceful cleanup is as important as a graceful setup and runtime. A
+program is still running while it is being cleaned up, and it's possibly even
+acting on the outside world still. Shouldn't it behave correctly during that
+time?
+
+Achieving a graceful setup and cleanup with components is quite simple:
+
+During setup a single-threaded process (usually `main`) will construct the
+"leaf" components (those which have no child components of their own) first,
+then the components which take those leaves as parameters, then the components
+which take _those_ as parameters, and so on, until all are constructed. The
+components end up assembled into a directed acyclic graph.
+
+At this point the program will begin runtime.
+
+Once runtime is over and it is time for the program to exit it's only necessary
+to call each component's cleanup method(s) in the reverse of the order the
+components were instantiated in. A component's cleanup method should not be
+called until all of its parent components have been cleaned up.
+
+Inherent to the pattern is the fact that each component will certainly be
+cleaned up before any of its child components, since its child components must
+have been instantiated first and a component will not clean up child components
+given as parameters (as-per component property 5a).
+
+With go this pattern can be achieved easily using `defer`, but writing it out
+manually is not so hard, as in this toy example:
+
+```
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+// sleeper is a component which prints its children and sleeps when it's time to
+// cleanup.
+type sleeper struct {
+	children []*sleeper
+	toSleep  time.Duration
+
+	// The builtin time.Sleep is an impure global function, a component can't
+	// use it, so the component must be instantiated with it as a parameter.
+	sleep func(time.Duration)
+
+	// likewise os.Stdout is a global singleton, and so must also be a
+	parameter.
+	stdout io.Writer
+}
+
+func (s *sleeper) print() {
+	fmt.Fprintf(s.stdout, "I will sleep for %v\n", s.toSleep)
+	for _, child := range s.children {
+		child.print()
+	}
+}
+
+func (s *sleeper) cleanup() {
+	s.sleep(s.toSleep)
+	fmt.Fprintf(s.stdout, "I slept for %v\n", s.toSleep)
+}
+
+func main() {
+
+	// Within main we make a helper function to easily construct sleepers. for a
+	// toy like this it's not worth the effort of giving sleeper a real
+	// initialization function.
+	newSleeper := func(toSleep time.Duration, children ...*sleeper) *sleeper {
+		return &sleeper{
+			children: children,
+			toSleep:  toSleep,
+			sleep:    time.Sleep,
+			stdout:   os.Stdout,
+		}
+	}
+
+	aa := newSleeper(250 * time.Millisecond)
+	defer aa.cleanup()
+
+	ab := newSleeper(250 * time.Millisecond)
+	defer ab.cleanup()
+
+	// A's children are AA and AB
+	a := newSleeper(500*time.Millisecond, aa, ab)
+	defer a.cleanup()
+
+	b := newSleeper(750 * time.Millisecond)
+	defer b.cleanup()
+
+	// root's children are A and B
+	root := newSleeper(1*time.Second, a, b)
+	defer root.cleanup()
+
+	// All components are now instantiated and runtime begins.
+	root.print()
+    // ... and just like that, runtime ends.
+	fmt.Println("--- Alright, fun is over, time for bed ---")
+
+	// Now to clean up, cleanup methods are called in the reverse order of the
+	// component's instantiation.
+	root.cleanup()
+	b.cleanup()
+	a.cleanup()
+	ab.cleanup()
+	aa.cleanup()
+
+	// Expected output is:
+	//
+	// I will sleep for 1s
+	// I will sleep for 500ms
+	// I will sleep for 250ms
+	// I will sleep for 250ms
+	// I will sleep for 750ms
+	// --- Alright, fun is over, time for bed ---
+	// I slept for 1s
+	// I slept for 750ms
+	// I slept for 500ms
+	// I slept for 250ms
+	// I slept for 250ms
+}
+```
+
+## Criticisms
+
+In lieu of a FAQ I will attempt to premeditate criticisms of the component
+oriented pattern laid out in this post:
+
+*This seems like a lot of extra work.*
+
+Building reliable programs is a lot of work, just as building reliable-anything
+is a lot of work. Many of us work in an industry which likes to balance
+reliability (sometimes referred to by the more specious "quality") with
+maleability and deliverability, which naturally leads to skepticism of any
+suggestions which require more time spent on reliability. This is not
+necessarily a bad thing, it's just how the industry functions.
+
+All that said, a pattern need not be followed perfectly to be worthwhile, and
+the amount of extra work incurred by it can be decided based on practical
+considerations. I merely maintain that when it comes time to revisit some
+existing code, either to fix or augment it, that the job will be notably easier
+if the code _mostly_ follows this pattern.
+
+*My language makes this difficult.*
+
+I don't know of any language which makes this pattern particularly easy, so
+unfortunately we're all in the same boat to some extent (though I recognize that
+some languages, or their ecosystems, make it more difficult than others). It
+seems to me that this pattern shouldn't be unbearably difficult for anyone to
+implement in any language either, however, as the only language feature needed
+is abstract typing.
+
+It would be nice to one day see a language which explicitly supported this
+pattern by baking the component properties in as compiler checked rules.
+
+*This will result in over-abstraction.*
+
+Abstraction is a necessary tool in a programmer's toolkit, there is simply no
+way around it. The only questions are "how much?" and "where?".
+
+The use of this pattern does not effect how those questions are answered, but
+instead aims to more clearly delineate the relationships and interactions
+between the different abstracted types once they've been established using other
+methods. Over-abstraction is the fault of the programmer, not the language or
+pattern or framework.
+
+*The acronymn is CoP.*
+
+Why do you think I've just been ackwardly using "this pattern" instead of the
+acronymn for the whole post? Better names are welcome.
+
+## Conclusion
+
+The component oriented pattern helps make our code more reliable with only a
+small amount of extra effort incurred. In fact most of the pattern has to do
+establishing sensible abstractions around global functionality and remembering
+certain idioms for how those abstractions should be composed together, something
+most of us do to some extent already anyway.
+
+While beneficial in many ways, component oriented programming is merely a tool
+which can be applied in many cases. It is certain that there are cases where it
+is not the right tool for the job. I've found these cases to be
+few-and-far-between, however. It's a solid pattern that I've gotten good use out
+of, and hopefully you'll find it, or some parts of it, to be useful as well.