mediocre-blog/2020-11-16-component-oriented-programming.md at dbf6ebdeee558fee27f97b7b0f69ff25f87244e8

Brian Picciano dbf6ebdeee component oriented programming post

2020-11-18 22:13:53 -07:00


title	description
Component Oriented Programming	A concise description of.

A previous post in this blog focused on a framework developed to make designing component-based programs easier. In retrospect, the pattern/framework proposed was over-engineered; this post attempts to present the same ideas in a more distilled form, as a simple programming pattern and without the unnecessary framework.

Nothing in this post will be revelatory; it's surely all been said before. But hopefully the form it takes here will be useful to someone, as it would have been useful to myself when I first learned to program.

Axioms

For the sake of brevity, let's assume that the following may be said of single-process (not the same as single-threaded), non-graphical programs:

A program may be thought of as a black-box with certain input and output methods. It is the programmer's task to construct a program such that specific inputs yield specific desired outputs.
A program is not complete without sufficient testing to prove it's complete.
Global state and global impure functions make testing more difficult. This can include singletons and system calls.

Any of these may be argued, but that will be left for other posts. Any of these may be said of other types of programs as well, but that can also be left for other posts.

Components

Properties of components include:

1. Creatable: An instance of a component, given some defined set of parameters, can be created independently of any other instance of that or any other component.
2. Composable: A component may be used as a parameter of another component's instantiation. This would make it a child component of the one being instantiated (i.e. the parent).
3. Abstract: A component is an interface consisting of one or more methods. Being an interface, a component may have one or more implementations, but generally will have a primary implementation, which is used during a program's runtime, and secondary "mock" implementations, which are only used when testing other components.
4. Isolated: A component may not use mutable global variables (i.e. singletons) or impure global functions (e.g. system calls). It may only use constants and variables/components given to it during instantiation.
5. Ephemeral: A component may have a specific method used to clean up all resources that it's holding (e.g. network connections, file handles, language-specific lightweight threads, etc).

5a. This cleanup method should not clean up any child components given as instantiation parameters.

5b. This cleanup method should not return until the component's cleanup is complete.

Components are composed together to create programs. This is done by passing components as parameters to other components during instantiation. The main process of the program is responsible for instantiating and composing most, if not all, components in the program.

A component oriented program is one which primarily, if not entirely, uses components for its functionality. Components generally have the quality of being able to interact with code written in other patterns without any toes being stepped on.

Example

Let's start with an example: suppose a program is desired which accepts a string over stdin, hashes it, then writes the string to a file whose name is the hash.

A naive implementation of this program in Go might look like:

package main

import (
	"crypto/sha1"
	"encoding/hex"
	"io"
	"io/ioutil"
	"os"
)

func hashFileWriter() error {
	h := sha1.New()
	r := io.TeeReader(os.Stdin, h)
	body, _ := ioutil.ReadAll(r)
	fileName := hex.EncodeToString(h.Sum(nil))

	if err := ioutil.WriteFile(fileName, body, 0644); err != nil {
		return err
	}

	return nil
}

func main() {
	if err := hashFileWriter(); err != nil {
		panic(err) // consider the error handled
	}
}

Notice that there's not a clear separation here between different components; hashFileWriter might be considered a one-method component, except that it breaks component property 4, which says that a component may not use mutable global variables (os.Stdin) or impure global functions (ioutil.WriteFile).

Notice also that testing the program would require integration tests, and could not be unit tested (because there are no units, i.e. components). For a trivial program like this one writing unit and integration tests would be redundant, but for larger programs it may not be. Unit tests are important because they are fast to run, (usually) easy to formulate, and yield consistent results.

This program could instead be written as being composed of three components:

stdin, a construct given by the runtime which outputs a stream of bytes.
disk, accepts a file name and file contents as input, writes the file contents to a file of the given name, and potentially returns an error back.
hashFileWriter, reads a stream of bytes off of stdin, collects the stream into a string, hashes that string to generate a file name, and uses disk to create a corresponding file with the string as its contents. If disk returns an error then hashFileWriter returns that error.

Sprucing up our previous example to use these more clearly defined components might look like:

package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"io"
	"io/ioutil"
	"os"
)

// Disk defines the methods of the disk component.
type Disk interface {
	WriteFile(fileName string, fileContents []byte) error
}

// disk is the primary implementation of Disk. It implements the methods of
// Disk (WriteFile) by performing actual system calls.
type disk struct{}

func NewDisk() Disk { return disk{} }

func (disk) WriteFile(fileName string, fileContents []byte) error {
	return ioutil.WriteFile(fileName, fileContents, 0644)
}

func hashFileWriter(stdin io.Reader, disk Disk) error {
	h := sha1.New()
	r := io.TeeReader(stdin, h)
	body, err := ioutil.ReadAll(r)
	if err != nil {
		return fmt.Errorf("reading input: %w", err)
	}

	fileName := hex.EncodeToString(h.Sum(nil))

	if err := disk.WriteFile(fileName, body); err != nil {
		return fmt.Errorf("writing to file %q: %w", fileName, err)
	}
	return nil
}

func main() {
	if err := hashFileWriter(os.Stdin, NewDisk()); err != nil {
		panic(err) // consider the error handled
	}
}

hashFileWriter no longer directly uses os.Stdin and ioutil.WriteFile, but instead takes in components wrapping them; io.Reader is a built-in interface which os.Stdin inherently implements, and Disk is a simple interface defined just for this program.

At first glance this would seem to have doubled the line-count for very little gain. This is because we have not yet written tests.

Testing

As has already been firmly established, testing is important.

In the second form of the program we can test the core functionality of the hashFileWriter component without resorting to using the actual stdin and disk components. Instead we use mocks of those components. A mock component implements the same inputs/outputs that the "real" component does, but in a way which makes testing a particular component possible without reaching outside the process. These are unit tests.

Tests for the latest form of the program might look like this:

package main

import (
	"strings"
	"testing"
)

// mockDisk implements the Disk interface. When WriteFile is called mockDisk
// will pretend to write the file, but instead will simply store what arguments
// WriteFile was called with.
type mockDisk struct {
	fileName     string
	fileContents []byte
}

func (d *mockDisk) WriteFile(fileName string, fileContents []byte) error {
	d.fileName = fileName
	d.fileContents = fileContents
	return nil
}

func TestHashFileWriter(t *testing.T) {
	type test struct {
		in          string
		expFileName string
		// expFileContents can be inferred from in
	}

	tests := []test{
		{
			in:          "",
			expFileName: "da39a3ee5e6b4b0d3255bfef95601890afd80709",
		},
		{
			in:          "hello",
			expFileName: "aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d",
		},
		{
			in:          "hello\nworld", // make sure newlines don't break things
			expFileName: "7db827c10afc1719863502cf95397731b23b8bae",
		},
	}

	for _, test := range tests {
		// stdin is mocked via a strings.Reader, which outputs the string it was
		// initialized with as a stream of bytes.
		in := strings.NewReader(test.in)

		// Disk is mocked by mockDisk, go figure.
		disk := new(mockDisk)

		if err := hashFileWriter(in, disk); err != nil {
			t.Errorf("in:%q got err:%v", test.in, err)
		} else if string(disk.fileContents) != test.in {
			t.Errorf("in:%q got contents:%q", test.in, disk.fileContents)
		} else if disk.fileName != test.expFileName {
			t.Errorf("in:%q got fileName:%q", test.in, disk.fileName)
		}
	}
}

Notice that these tests do not completely cover the desired functionality of the program: if disk returns an error that error should be returned from hashFileWriter. Whether or not this must be tested as well, and indeed the pedantry level of tests overall, is a matter of taste. I believe these to be sufficient.

Configuration

Practically all programs require some level of runtime configuration. This may take the form of command-line arguments, environment variables, configuration files, etc. Almost all configuration methods will require some system call, and so any component accessing configuration directly would likely break component property 4.

Instead each component should take in whatever configuration parameters it needs during instantiation, and let main handle collecting all configuration from outside of the process and instantiating the components appropriately.

Let's take our previous program, but add in two new desired behaviors: first, there should be a command-line parameter which allows for specifying the string on the command-line, rather than reading from stdin, and second, there should be a command-line parameter declaring which directory to write files into. The new implementation looks like:

package main

import (
	"crypto/sha1"
	"encoding/hex"
	"flag"
	"fmt"
	"io"
	"io/ioutil"
	"os"
	"path/filepath"
	"strings"
)

// Disk defines the methods of the disk component.
type Disk interface {
	WriteFile(fileName string, fileContents []byte) error
}

// disk is the concrete implementation of Disk. It implements the methods of
// Disk (WriteFile) by performing actual OS calls.
type disk struct {
	dir string
}

func NewDisk(dir string) Disk { return disk{dir: dir} }

func (d disk) WriteFile(fileName string, fileContents []byte) error {
	fileName = filepath.Join(d.dir, fileName)
	return ioutil.WriteFile(fileName, fileContents, 0644)
}

func hashFileWriter(in io.Reader, disk Disk) error {
	h := sha1.New()
	r := io.TeeReader(in, h)
	body, err := ioutil.ReadAll(r)
	if err != nil {
		return fmt.Errorf("reading input: %w", err)
	}

	fileName := hex.EncodeToString(h.Sum(nil))

	if err := disk.WriteFile(fileName, body); err != nil {
		return fmt.Errorf("writing to file %q: %w", fileName, err)
	}
	return nil
}

func main() {
	str := flag.String("str", "", "If set, hash and write this string instead of stdin")
	dir := flag.String("dir", ".", "Directory which files should be written to")
	flag.Parse()

	var in io.Reader
	if *str == "" {
		in = os.Stdin
	} else {
		in = strings.NewReader(*str)
	}

	disk := NewDisk(*dir)

	if err := hashFileWriter(in, disk); err != nil {
		panic(err) // consider the error handled
	}
}

Very little has changed, and in fact hashFileWriter was not touched at all, meaning all unit tests remained valid.
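The same idea applies to other sources of configuration. As a sketch, suppose the directory could also come from an environment variable; the variable name HASHWRITER_DIR, the diskParams type, and the loadDiskParams function are all invented for this example. The lookup function is passed in as a parameter, so only main touches the real environment.

```go
package main

import (
	"fmt"
	"os"
)

// diskParams is a hypothetical set of parameters which would be used to
// instantiate the disk component.
type diskParams struct {
	Dir string
}

// loadDiskParams fills in diskParams using the given lookup function, which
// has the same signature as os.LookupEnv. HASHWRITER_DIR is a made-up name.
func loadDiskParams(lookup func(string) (string, bool)) diskParams {
	params := diskParams{Dir: "."} // default to the current directory
	if dir, ok := lookup("HASHWRITER_DIR"); ok && dir != "" {
		params.Dir = dir
	}
	return params
}

func main() {
	// main is the only place where the real environment is accessed.
	params := loadDiskParams(os.LookupEnv)
	fmt.Printf("files will be written to %q\n", params.Dir)
}
```

The resulting Dir would then be handed to NewDisk exactly as the flag value is in the program above, and loadDiskParams can be tested by giving it a fake lookup function.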

Setup/Runtime/Cleanup

A program can be split into three stages: setup, runtime, and cleanup. Setup is the stage during which internal state is assembled in order to make runtime possible. Runtime is the stage during which a program's actual function is being performed. Cleanup is the stage during which runtime stops and internal state is disassembled.

A graceful (i.e. reliably correct) setup is quite natural to accomplish, but unfortunately a graceful cleanup is not a programmer's first concern, and frequently is not a concern at all. However, when building reliable and correct programs, a graceful cleanup is as important as a graceful setup and runtime. A program is still running while it is being cleaned up, and it's possibly even acting on the outside world still. Shouldn't it behave correctly during that time?

Achieving a graceful setup and cleanup with components is quite simple:

During setup a single-threaded process (usually main) will construct the "leaf" components (those which have no child components of their own) first, then the components which take those leaves as parameters, then the components which take those as parameters, and so on, until all are constructed. The components end up assembled into a directed acyclic graph.

At this point the program will begin runtime.

Once runtime is over and it is time for the program to exit it's only necessary to call each component's cleanup method(s) in the reverse of the order the components were instantiated in. A component's cleanup method should not be called until all of its parent components have been cleaned up.

Inherent to the pattern is the fact that each component will certainly be cleaned up before any of its child components, since its child components must have been instantiated first and a component will not clean up child components given as parameters (as-per component property 5a).

With Go this pattern can be achieved easily using defer, but writing it out manually is not so hard, as in this toy example:

package main

import (
	"fmt"
	"io"
	"os"
	"time"
)

// sleeper is a component which prints its children and sleeps when it's time to
// cleanup.
type sleeper struct {
	children []*sleeper
	toSleep  time.Duration

	// The builtin time.Sleep is an impure global function, a component can't
	// use it, so the component must be instantiated with it as a parameter.
	sleep func(time.Duration)

	// Likewise os.Stdout is a global singleton, and so must also be a
	// parameter.
	stdout io.Writer
}

func (s *sleeper) print() {
	fmt.Fprintf(s.stdout, "I will sleep for %v\n", s.toSleep)
	for _, child := range s.children {
		child.print()
	}
}

func (s *sleeper) cleanup() {
	s.sleep(s.toSleep)
	fmt.Fprintf(s.stdout, "I slept for %v\n", s.toSleep)
}

func main() {

	// Within main we make a helper function to easily construct sleepers. for a
	// toy like this it's not worth the effort of giving sleeper a real
	// initialization function.
	newSleeper := func(toSleep time.Duration, children ...*sleeper) *sleeper {
		return &sleeper{
			children: children,
			toSleep:  toSleep,
			sleep:    time.Sleep,
			stdout:   os.Stdout,
		}
	}

	// Components are instantiated leaves first. Cleanup is written out
	// manually at the end of main, so defer is not used here; deferring as
	// well would cause each cleanup method to be called twice.
	aa := newSleeper(250 * time.Millisecond)
	ab := newSleeper(250 * time.Millisecond)

	// A's children are AA and AB
	a := newSleeper(500*time.Millisecond, aa, ab)

	b := newSleeper(750 * time.Millisecond)

	// root's children are A and B
	root := newSleeper(1*time.Second, a, b)

	// All components are now instantiated and runtime begins.
	root.print()
	// ... and just like that, runtime ends.
	fmt.Println("--- Alright, fun is over, time for bed ---")

	// Now to clean up, cleanup methods are called in the reverse order of the
	// component's instantiation.
	root.cleanup()
	b.cleanup()
	a.cleanup()
	ab.cleanup()
	aa.cleanup()

	// Expected output is:
	//
	// I will sleep for 1s
	// I will sleep for 500ms
	// I will sleep for 250ms
	// I will sleep for 250ms
	// I will sleep for 750ms
	// --- Alright, fun is over, time for bed ---
	// I slept for 1s
	// I slept for 750ms
	// I slept for 500ms
	// I slept for 250ms
	// I slept for 250ms
}

Criticisms

In lieu of a FAQ I will attempt to preempt criticisms of the component oriented pattern laid out in this post:

This seems like a lot of extra work.

Building reliable programs is a lot of work, just as building reliable-anything is a lot of work. Many of us work in an industry which likes to balance reliability (sometimes referred to by the more specious "quality") with malleability and deliverability, which naturally leads to skepticism of any suggestions which require more time spent on reliability. This is not necessarily a bad thing; it's just how the industry functions.

All that said, a pattern need not be followed perfectly to be worthwhile, and the amount of extra work incurred by it can be decided based on practical considerations. I merely maintain that when it comes time to revisit some existing code, either to fix or augment it, the job will be notably easier if the code mostly follows this pattern.

My language makes this difficult.

I don't know of any language which makes this pattern particularly easy, so unfortunately we're all in the same boat to some extent (though I recognize that some languages, or their ecosystems, make it more difficult than others). That said, it seems to me that this pattern shouldn't be unbearably difficult to implement in any language, as the only language feature needed is abstract typing.

It would be nice to one day see a language which explicitly supported this pattern by baking the component properties in as compiler checked rules.

This will result in over-abstraction.

Abstraction is a necessary tool in a programmer's toolkit; there is simply no way around it. The only questions are "how much?" and "where?".

The use of this pattern does not affect how those questions are answered, but instead aims to more clearly delineate the relationships and interactions between the different abstracted types once they've been established using other methods. Over-abstraction is the fault of the programmer, not the language or pattern or framework.

The acronym is CoP.

Why do you think I've just been awkwardly using "this pattern" instead of the acronym for the whole post? Better names are welcome.

Conclusion

The component oriented pattern helps make our code more reliable with only a small amount of extra effort incurred. In fact most of the pattern has to do with establishing sensible abstractions around global functionality and remembering certain idioms for how those abstractions should be composed together, something most of us do to some extent already anyway.

While beneficial in many ways, component oriented programming is merely a tool which can be applied in many cases. It is certain that there are cases where it is not the right tool for the job. I've found these cases to be few-and-far-between, however. It's a solid pattern that I've gotten good use out of, and hopefully you'll find it, or some parts of it, to be useful as well.
