ginger: it's alive

2021-12-31 13:37:32 -07:00 · 2021-12-31 13:37:32 -07:00 · 19f1efd748
commit 19f1efd748
parent ed4d179680
1 changed files with 422 additions and 0 deletions
--- a/static/src/_posts/2021-12-31-ginger-its-alive.md
+++ b/static/src/_posts/2021-12-31-ginger-its-alive.md
@@ -0,0 +1,422 @@
+---
+title: >-
+    Ginger: It's Alive!
+description: >-
+    The new best language for computing fibonacci numbers.
+series: ginger
+tags: tech
+---
+
+As a kind of Christmas present to myself I took a whole week off of work
+specifically to dedicate myself to working on ginger.
+
+My concrete goal was to be able to run a ginger program to compute any Nth
+Fibonacci number, a goal I chose because it would require the implementation of
+conditionals, some kind of looping or recursion, and basic addition/subtraction.
+In other words, it would require all the elements which make up a
+Turing-complete language.
+
+And you know what? I actually succeeded!
+
+The implementation can be found [here][impl]. At this point ginger is an
+interpreted language running in a golang-based VM. The dream is for it to be
+self-hosted on LLVM (and other platforms after), but as an intermediate step to
+that I decided on sticking to what I know (golang) rather than having to learn
+two things at once.
+
+In this post I'm going to describe the components of this VM at a high level,
+show a quick demo of it working, and finally talk about the roadmap going
+forward.
+
+[impl]: https://github.com/mediocregopher/ginger/tree/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2
+
+## Graph
+
+The core package of the whole project is the [`graph`][graph] package. This
+package implements a generic directed graph data structure.
+
+The generic part is worth noting; I was able to take advantage of Go's new
+generics which are currently [in beta][go118]. I'd read quite a bit on how the
+generic system would work even before the beta was announced, so I was able to
+hit the ground running and start using them without much issue.
+
+Ginger's unique graph datastructure has been discussed in previous posts in this
+series quite a bit, and this latest implementation doesn't deviate much at a
+high level. Below are the most up-to-date core datatypes and functions which are
+used to construct ginger graphs:
+
+```go
+
+// Value is any value which can be stored within a Graph. Values should be
+// considered immutable, ie once used with the graph package their internal
+// value does not change.
+type Value interface {
+	Equal(Value) bool
+	String() string
+}
+
+// OpenEdge consists of the edge value (E) and source vertex value (V) of an
+// edge in a Graph. When passed into the AddValueIn method a full edge is
+// created. An OpenEdge can also be sourced from a tuple vertex, whose value is
+// an ordered set of OpenEdges of this same type.
+type OpenEdge[E, V Value] struct { ... }
+
+// ValueOut creates an OpenEdge which, when used to construct a Graph, represents
+// an edge (with edgeVal attached to it) coming from the vertex containing val.
+func ValueOut[E, V Value](edgeVal E, val V) *OpenEdge[E, V]
+
+// TupleOut creates an OpenEdge which, when used to construct a Graph,
+// represents an edge (with edgeVal attached to it) coming from the vertex
+// comprised of the given ordered-set of input edges.
+func TupleOut[E, V Value](edgeVal E, ins ...*OpenEdge[E, V]) *OpenEdge[E, V]
+
+// Graph is an immutable container of a set of vertices. The Graph keeps track
+// of all Values which terminate an OpenEdge. E indicates the type of edge
+// values, while V indicates the type of vertex values.
+type Graph[E, V Value] struct { ... }
+
+// AddValueIn takes an OpenEdge and connects it to the Value vertex containing
+// val, returning the new Graph which reflects that connection.
+func (*Graph[E, V]) AddValueIn(val V, oe *OpenEdge[E, V]) *Graph[E, V]
+
+// ValueIns returns, if any, all OpenEdges which lead to the given Value in the
+// Graph (ie, all those added via AddValueIn).
+func (*Graph[E, V]) ValueIns(val Value) []*OpenEdge[E, V]
+
+```
+
+The current `Graph` implementation is _incredibly_ inefficient; it does a lot of
+copying, looping, and equality checks which could be optimized out one day.
+That's going to be a recurring theme of this post, as I had to perform a
+balancing act between actually reaching my goal for the week while not incurring
+too much tech debt for myself.
+
+[graph]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/graph/graph.go
+[go118]: https://go.dev/blog/go1.18beta1
+
+### MapReduce
+
+There's a final operation I implemented as part of the `graph` package:
+[MapReduce][mapreduce]. It's a difficult operation to describe, but I'm going to
+do my best in this section for those who are interested. If you don't understand
+it, or don't care, just know that `MapReduce` is a generic tool for transforming
+graphs.
+
+For a description of `MapReduce` we need to present an example graph:
+
+```
+        +<--b---
+        +       \
+X <--a--+<--c----+<--f-- A
+        +               /
+        +      +<---g---
+        +<--d--+
+               +<---h---
+                        \
+Y <---------e----------- B
+```
+
+Plus signs indicate tuples, lowercase letters are edge values, and uppercase
+letters are vertex values. The pseudo-code to construct this graph in Go
+might look like:
+
+```go
+    g := new(Graph)
+
+    fA := ValueOut("f", "A")
+
+    g = g.AddValueIn(
+        "X",
+        TupleOut(
+            "a",
+            TupleOut("b", fA),
+            TupleOut("c", fA),
+            TupleOut(
+                "d",
+                ValueOut("g", "A"),
+                ValueOut("h", "B"),
+            ),
+        ),
+    )
+
+    g = g.AddValueIn("Y", ValueOut("e", "B"))
+```
+
+As can be seen in the [code][mapreduce], `MapReduce`'s first argument is an
+`OpenEdge`, _not_ a `Graph`. Fundamentally `MapReduce` is a reduction of the
+_dependencies_ of a particular value into a new value; to reduce the
+dependencies of multiple values at the same time would be equivalent to looping
+over those values and calling `MapReduce` on each individually. Having
+`MapReduce` only deal with one edge at a time is more flexible.
+
+So let's focus on a particular `OpenEdge`, the one leading into `X` (returned by
+`TupleOut("a", etc...)`). `MapReduce` is going to descend into this `OpenEdge`
+recursively, in order to first find all value vertices (ie the leaf vertices,
+those without any children of their own).
+
+At this point `MapReduce` will use its second argument, the `mapVal` function,
+which accepts a value of one type and returns a value of another type. This
+function is called on each value from every value vertex encountered. In this
+case both `A` and `B` are reachable from `X`, so `mapVal` will be called on
+each _only once_. This is the case even though `A` is connected to multiple
+times (once with an edge value of `f`, another with an edge value of `g`).
+`mapVal` only gets called once per vertex, not per connection.
+
+With all values mapped, `MapReduce` will begin reducing. For each edge leaving
+each value vertex, the `reduceEdge` function is called. `reduceEdge` accepts as
+arguments the edge value of the edge and the _mapped value_ (not the original
+value) of the vertex, and returns a new value of the same type that `mapVal`
+returned. Like `mapVal`, `reduceEdge` will only be called once per edge. In our
+example, `<--f--A` is used twice (`b` and `c`), but `reduceEdge` will only be
+called on it once.
+
+With each value vertex edge having been reduced, `reduceEdge` is called again on
+each edge leaving the vertices _those_ edges lead into, which must be tuple
+vertices. An array of the values returned from the previous `reduceEdge` calls
+for each of the tuple's input edges is used as the value argument in the next
+call. This is done until the `OpenEdge` is fully reduced into a single value.
+
+To flesh out our example, let's imagine a `mapVal` which returns the input
+string repeated twice, and a `reduceEdge` which returns the input values joined
+with the edge value, and then wrapped with the edge value (eg `reduceEdge(a, [B,
+C]) -> aBaCa`).
+
+Calling `MapReduce` on the edge leading into `X` will then give us the following
+calls:
+
+```
+# Map the value vertices
+
+mapVal(A) -> AA
+mapVal(B) -> BB
+
+# Reduce the value vertex edges
+
+reduceEdge(f, [AA]) -> fAAf
+reduceEdge(g, [AA]) -> gAAg
+reduceEdge(h, [BB]) -> hBBh
+
+# Reduce tuple vertex edges
+
+reduceEdge(b, [fAAf]) -> bfAAfb
+reduceEdge(c, [fAAf]) -> cfAAfc
+reduceEdge(d, [gAAg, hBBh]) -> dgAAgdhBBhd
+
+reduceEdge(a, [bfAAfb, cfAAfc, dgAAgdhBBhd]) -> abfAAfbacfAAfcadgAAgdhBBhda
+```
+
+Beautiful, exactly what we wanted.
+
+`MapReduce` will prove extremely useful when it comes time for the VM to execute
+the graph. It enables the VM to evaluate only the values which are needed to
+produce an output, and to only evaluate each value once no matter how many times
+it's used. `MapReduce` also takes care of the recursive traversal of the
+`Graph`, which simplifies the VM code significantly.
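+
+For those who think better in code, the walkthrough above can be roughly
+sketched as a standalone program. This is _not_ the real `graph.MapReduce`:
+the `edge` type, the `valueOut`/`tupleOut` helpers, and `demo` are hypothetical
+simplifications which only use strings, and which cache reduced edges by
+pointer rather than by value equality. It also walks depth-first rather than
+mapping every value up front, so the calls happen in a different order than
+listed above, though the set of calls is the same.
+
+```go
+package main
+
+import (
+	"fmt"
+	"strings"
+)
+
+// edge is a hypothetical, simplified stand-in for graph.OpenEdge: it either
+// comes from a value vertex (ins is nil) or from a tuple vertex (ins is set).
+type edge struct {
+	edgeVal string
+	val     string
+	ins     []*edge
+}
+
+func valueOut(edgeVal, val string) *edge {
+	return &edge{edgeVal: edgeVal, val: val}
+}
+
+func tupleOut(edgeVal string, ins ...*edge) *edge {
+	return &edge{edgeVal: edgeVal, ins: ins}
+}
+
+// mapReduce reduces e to a single string. Results are cached so that mapVal
+// runs once per distinct value and reduceEdge runs once per distinct edge.
+func mapReduce(
+	e *edge,
+	mapVal func(string) string,
+	reduceEdge func(edgeVal string, ins []string) string,
+) string {
+	mapped := map[string]string{}
+	reduced := map[*edge]string{}
+
+	var walk func(*edge) string
+	walk = func(e *edge) string {
+		if out, ok := reduced[e]; ok {
+			return out
+		}
+		var ins []string
+		if e.ins == nil {
+			// Value vertex: map the value, unless it was mapped already.
+			m, ok := mapped[e.val]
+			if !ok {
+				m = mapVal(e.val)
+				mapped[e.val] = m
+			}
+			ins = []string{m}
+		} else {
+			// Tuple vertex: reduce every input edge first.
+			for _, in := range e.ins {
+				ins = append(ins, walk(in))
+			}
+		}
+		out := reduceEdge(e.edgeVal, ins)
+		reduced[e] = out
+		return out
+	}
+	return walk(e)
+}
+
+// demo reduces the edge leading into X from the example graph, and reports
+// how many times each callback was called.
+func demo() (out string, mapCalls, reduceCalls int) {
+	fA := valueOut("f", "A")
+	x := tupleOut("a",
+		tupleOut("b", fA),
+		tupleOut("c", fA),
+		tupleOut("d", valueOut("g", "A"), valueOut("h", "B")),
+	)
+	out = mapReduce(x,
+		func(v string) string { mapCalls++; return v + v },
+		func(edgeVal string, ins []string) string {
+			reduceCalls++
+			return edgeVal + strings.Join(ins, edgeVal) + edgeVal
+		},
+	)
+	return out, mapCalls, reduceCalls
+}
+
+func main() {
+	fmt.Println(demo()) // abfAAfbacfAAfcadgAAgdhBBhda 2 7
+}
+```
+
+The two counters show the deduplication at work: `mapVal` runs twice (for `A`
+and `B`) and `reduceEdge` runs seven times (once per edge), even though
+`<--f--A` feeds into two different tuples.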
+
+[mapreduce]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/graph/graph.go#L338
+
+## gg
+
+With a generic graph implementation out of the way, the next step was to
+define a specific implementation which could be parsed from a file and later
+used for execution in the VM.
+
+The file extension used for ginger code is `.gg`, as in "ginger graph" (of
+course). The package for decoding this file format is, therefore, also called
+`gg`.
+
+The core datatype for the `gg` package is the [`Value`][ggvalue], since the
+`graph` package takes care of essentially everything else in the realm of graph
+construction and manipulation. The type definition is:
+
+```go
+// Value represents a value which can be serialized by the gg text format.
+type Value struct {
+
+	// Only one of these fields may be set
+	Name   *string
+	Number *int64
+	Graph  *Graph
+
+	// Optional fields indicating the token which was used to construct this
+	// Value, if any.
+	LexerToken *LexerToken
+}
+
+type Graph = graph.Graph[Value, Value] // type alias for convenience
+```
+
+Note that it's currently only possible to describe three different types in a
+`gg` file, and one of them is the `Graph`! These are the only ones needed to
+implement a fibonacci function, so they're all I implemented.
+
+The lexing/parsing of `gg` files is not super interesting; you can check out the
+package code for more details. The only other thing worth noting is that, for
+now, all statements are required to end with a `;`. I had originally wanted to
+be less strict with this, and allow newlines and other tokens to indicate the
+end of statements, but it was complicating the code and I wanted to move on.
+
+Another small thing worth noting is that I decided to make each entire `.gg`
+file implicitly define a graph. So you can imagine each file's contents wrapped
+in curly braces.
+
+With the `gg` package out of the way I was able to finally parse ginger
+programs! The following is the actual, real-life implementation of the Fibonacci
+function (though at this point it didn't actually work, because the VM was still
+not implemented):
+
+```
+out = {
+
+    decr = { out = add < (in; -1;); };
+
+    n = tupEl < (in; 0;);
+    a = tupEl < (in; 1;);
+    b = tupEl < (in; 2;);
+
+    out = if < (
+        isZero < n;
+        a;
+        recur < (
+            decr < n;
+            b;
+            add < (a;b;);
+        );
+    );
+
+} < (in; 0; 1;);
+```
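+
+If the syntax is hard to follow, here is a rough Go rendering of what the
+program computes. This is only one reading of the `.gg` code above, not
+anything taken from the repo: the name `fib` is made up for illustration, the
+loop stands in for `recur`, and the input is assumed to be non-negative.
+
+```go
+package main
+
+import "fmt"
+
+// fib mirrors the shape of the gg program: n counts down to zero while
+// (a, b) slide along the sequence, starting from (0, 1). Each loop iteration
+// corresponds to one `recur < (decr < n; b; add < (a;b;););`.
+// n is assumed to be non-negative.
+func fib(n int64) int64 {
+	a, b := int64(0), int64(1)
+	for n != 0 {
+		n, a, b = n-1, b, a+b
+	}
+	return a
+}
+
+func main() {
+	fmt.Println(fib(8)) // 21
+}
+```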
+
+[ggvalue]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/gg/gg.go#L14
+
+## VM
+
+Finally, the meat of all this. If the `graph` and `gg` packages are the sturdy,
+well constructed foundations of a tall building, then the `vm` package is the
+extremely long, flimsy stick someone propped up vertically so they could say
+they built a structure of impressive height.
+
+In other words, it's very likely that the current iteration of the VM will not
+be long for this world, and so I won't waste time describing it in great detail.
+
+What I will say about it is that within the `vm` package I've defined a [new
+`Value` type][vmvalue], which extends the one defined in `gg`. The necessity of
+this was that there are types which cannot be represented syntactically in a
+`.gg` file, but which _can_ be used as values within a program being run.
+
+The first of these is the `Operation`, which is essentially a first-class
+function. The VM will automatically interpret a graph as an `Operation` when it
+is used as an edge value, as has been discussed in previous posts, but there are
+also built-in operations (like `if` and `recur`) which cannot be represented as
+datastructures, and so it was necessary to introduce a new in-memory type to
+properly represent operations.
+
+The second is the `Tuple` type. This may seem strange, as ginger graphs already
+have a concept of a tuple. But the ginger graph tuple is a _vertex type_, not a
+value type. The distinction is small, but important. Essentially the graph tuple
+is a structural element which describes how to create a tuple value, but it is
+not yet that value. So we need a new Value type to hold the tuple once it _has_
+been created during runtime.
+
+Another thing worth describing about the `vm` package, even though I think it
+might change drastically, is the [`Thunk`][thunk]:
+
+```go
+// Thunk is returned from the performance of an Operation. When called it will
+// return the result of that Operation having been called with the particular
+// arguments which were passed in.
+type Thunk func() (Value, error)
+```
+
+The term "thunk" is borrowed from Haskell, which I don't actually know so I'm
+probably using it wrong, but anyway...
+
+A thunk is essentially a value which has yet to be evaluated; the VM knows
+exactly _how_ to evaluate it, but it hasn't done so yet. The primary reason for
+their existence within ginger is to account for conditionals, ie the `if`
+operation. The VM can't evaluate all of an `if`'s arguments up front; it must
+first evaluate only the first argument (to obtain a boolean), and then based on
+that evaluate either the second or the third argument.
+
+This is where `graph.MapReduce` comes in. The VM uses `graph.MapReduce` to
+reduce each edge in a graph to a `Thunk`, where the `Thunk`'s value is based on
+the operation (the edge's value) and the inputs to the edge (which will
+themselves be `Thunk`s). Because each `Thunk` represents a potential value, not
+an actual one, the VM is able to completely parse the program to be executed
+(using `graph.MapReduce`) while allowing conditionals to still work correctly.
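+
+To make the laziness concrete, here is a minimal sketch of how an `if` could be
+built out of thunks. It is not the VM's actual code: `value`, `thunk`,
+`constant`, and `ifOp` are hypothetical simplifications (values are plain
+`int64`s, and any non-zero condition is assumed to count as true).
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+)
+
+// value and thunk are simplified, hypothetical stand-ins for the vm package's
+// Value and Thunk types.
+type value = int64
+
+type thunk func() (value, error)
+
+// constant wraps an already-known value in a thunk.
+func constant(v value) thunk {
+	return func() (value, error) { return v, nil }
+}
+
+// ifOp returns a thunk which, when called, evaluates cond first and then
+// evaluates exactly one of the two branches. The other is never called.
+func ifOp(cond, then, els thunk) thunk {
+	return func() (value, error) {
+		c, err := cond()
+		if err != nil {
+			return 0, err
+		}
+		if c != 0 {
+			return then()
+		}
+		return els()
+	}
+}
+
+func main() {
+	boom := thunk(func() (value, error) {
+		return 0, errors.New("this branch should never run")
+	})
+
+	// Building the thunk evaluates nothing; only the final call does.
+	out, err := ifOp(constant(1), constant(42), boom)()
+	fmt.Println(out, err) // 42 <nil>
+}
+```
+
+Since `boom` is only ever _referenced_, never called, its error never
+surfaces. That's the property which lets the whole graph be reduced to thunks
+before any branch is chosen.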
+
+[`EvaluateEdge`][evaledge] is where all that happens, if you're interested, but be
+warned that the code is a hot mess right now and it's probably not worth
+spending a ton of time understanding it as it will change a lot.
+
+A final thing I'll mention is that the `recur` operation is, I think, broken. Or
+probably more accurately, the entire VM is broken in a way which prevents
+`recur` from working correctly. It _does_ produce the correct output, so I
+haven't prioritized debugging it, but for any large number of iterations it
+takes a very long time to run.
+
+[vmvalue]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/vm/vm.go#L18
+[thunk]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/vm/op.go#L11
+[evaledge]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/vm/scope.go#L29
+
+## Demo
+
+Finally, to show it off! I put together a super stupid `eval` binary which takes
+two arguments: a graph to be used as an operation, and a value to be used as an
+argument to that operation. It doesn't even read the code from a file; you have
+to `cat` it in.
+
+The [README][readme] documents how to run the demo, so if you'd like to do so
+then please clone the repo and give it a shot! It should look like this when you
+do:
+
+```
+# go run ./cmd/eval/main.go "$(cat examples/fib.gg)" 8
+21
+```
+
+You can put any number you like instead of `8`, but as mentioned, `recur` is
+broken so it can take a while for larger numbers.
+
+[readme]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/README.md
+
+## Next Steps
+
+The following are all the things I'd like to address the next time I work on
+ginger:
+
+* `gg`
+
+    * Allow for newlines (and `)` and `}`) to terminate statements, not just
+      `;`.
+
+    * Allow names to have punctuation characters in them (maybe?).
+
+    * Don't read all tokens into memory prior to parsing.
+
+* `vm`
+
+    * Fix `recur`.
+
+    * Implement tail call optimization.
+
+* General
+
+    * A bit of polish on the `eval` tool.
+
+    * Expose graph creation, traversal, and transformation functions as
+      builtins.
+
+    * Create plan (if not actually implement it yet) for how code will be
+      imported from one file to another. Namespacing in general will fall into
+      this bucket.
+
+    * Create plan (if not actually implement it yet) for how users can
+      extend/replace the lexer/parser.
+
+I don't know _when_ I'll get to work on these next, but ginger will come back up
+in my rotation of projects eventually. It could be a few months. In the
+meantime I hope you're as excited about this progress as I am, and if you have
+any feedback I'd love to hear it.
+
+Thanks for reading!