ginger: it's alive

Brian Picciano 2021-12-31 13:37:32 -07:00
parent ed4d179680
commit 19f1efd748

---
title: >-
Ginger: It's Alive!
description: >-
The new best language for computing fibonacci numbers.
series: ginger
tags: tech
---
As a kind of Christmas present to myself I took a whole week off of work
specifically to dedicate myself to working on ginger.

My concrete goal was to be able to run a ginger program to compute any Nth
fibonacci number, a goal I chose because it would require the implementation of
conditionals, some kind of looping or recursion, and basic addition/subtraction.
In other words, it would require all the elements which comprise a Turing
complete language.

And you know what? I actually succeeded!

The implementation can be found [here][impl]. At this point ginger is an
interpreted language running in a golang-based VM. The dream is for it to be
self-hosted on LLVM (and other platforms after), but as an intermediate step to
that I decided on sticking to what I know (golang) rather than having to learn
two things at once.

In this post I'm going to describe the components of this VM at a high level,
show a quick demo of it working, and finally talk about the roadmap going
forward.

[impl]: https://github.com/mediocregopher/ginger/tree/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2

## Graph

The core package of the whole project is the [`graph`][graph] package. This
package implements a generic directed graph datastructure.

The generic part is worth noting; I was able to take advantage of go's new
generics which are currently [in beta][go118]. I'd read quite a bit on how the
generic system would work even before the beta was announced, so I was able to
hit the ground running and start using them without much issue.

Ginger's unique graph datastructure has been discussed in previous posts in this
series quite a bit, and this latest implementation doesn't deviate much at a
high level. Below are the most up-to-date core datatypes and functions which are
used to construct ginger graphs:

```go
// Value is any value which can be stored within a Graph. Values should be
// considered immutable, ie once used with the graph package their internal
// value does not change.
type Value interface {
	Equal(Value) bool
	String() string
}

// OpenEdge consists of the edge value (E) and source vertex value (V) of an
// edge in a Graph. When passed into the AddValueIn method a full edge is
// created. An OpenEdge can also be sourced from a tuple vertex, whose value is
// an ordered set of OpenEdges of this same type.
type OpenEdge[E, V Value] struct { ... }

// ValueOut creates an OpenEdge which, when used to construct a Graph,
// represents an edge (with edgeVal attached to it) coming from the vertex
// containing val.
func ValueOut[E, V Value](edgeVal E, val V) *OpenEdge[E, V]

// TupleOut creates an OpenEdge which, when used to construct a Graph,
// represents an edge (with edgeVal attached to it) coming from the vertex
// comprised of the given ordered-set of input edges.
func TupleOut[E, V Value](edgeVal E, ins ...*OpenEdge[E, V]) *OpenEdge[E, V]

// Graph is an immutable container of a set of vertices. The Graph keeps track
// of all Values which terminate an OpenEdge. E indicates the type of edge
// values, while V indicates the type of vertex values.
type Graph[E, V Value] struct { ... }

// AddValueIn takes an OpenEdge and connects it to the Value vertex containing
// val, returning the new Graph which reflects that connection.
func (*Graph[E, V]) AddValueIn(val V, oe *OpenEdge[E, V]) *Graph[E, V]

// ValueIns returns, if any, all OpenEdges which lead to the given Value in the
// Graph (ie, all those added via AddValueIn).
func (*Graph[E, V]) ValueIns(val Value) []*OpenEdge[E, V]
```
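
To make the "immutable container" idea concrete, here is a minimal sketch of how `AddValueIn` could return a new graph while leaving the old one untouched. It is not the actual `graph` package: it assumes a `comparable` constraint in place of the `Value` interface, it leaves out tuple vertices entirely, and the `ins` and `fromVal` fields are hypothetical names. The copy made on every call is the same kind of inefficiency the real implementation has. Running it prints `0 1`, since the original graph never sees the new edge.

```go
package main

import "fmt"

// OpenEdge is a simplified stand-in for graph.OpenEdge which only supports
// edges sourced from value vertices (no tuples).
type OpenEdge[E, V comparable] struct {
	edgeVal E
	fromVal V
}

// ValueOut creates an OpenEdge carrying edgeVal and coming from val.
func ValueOut[E, V comparable](edgeVal E, val V) *OpenEdge[E, V] {
	return &OpenEdge[E, V]{edgeVal: edgeVal, fromVal: val}
}

// Graph maps each vertex value to the OpenEdges which lead into it.
type Graph[E, V comparable] struct {
	ins map[V][]*OpenEdge[E, V]
}

// AddValueIn copies the receiver's contents into a new Graph and adds the
// edge to that copy. The receiver itself is never modified.
func (g *Graph[E, V]) AddValueIn(val V, oe *OpenEdge[E, V]) *Graph[E, V] {
	next := &Graph[E, V]{ins: make(map[V][]*OpenEdge[E, V], len(g.ins)+1)}
	for v, oes := range g.ins {
		next.ins[v] = append([]*OpenEdge[E, V](nil), oes...)
	}
	next.ins[val] = append(next.ins[val], oe)
	return next
}

// ValueIns returns all OpenEdges which lead into val, if any.
func (g *Graph[E, V]) ValueIns(val V) []*OpenEdge[E, V] {
	return g.ins[val]
}

func main() {
	g0 := new(Graph[string, string])
	g1 := g0.AddValueIn("Y", ValueOut("e", "B"))

	// g0 is unchanged; only g1 contains the new edge.
	fmt.Println(len(g0.ValueIns("Y")), len(g1.ValueIns("Y")))
}
```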
The current `Graph` implementation is _incredibly_ inefficient; it does a lot of
copying, looping, and equality checks which could be optimized out one day.
That's going to be a recurring theme of this post, as I had to perform a
balancing act between actually reaching my goal for the week while not incurring
too much tech debt for myself.

[graph]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/graph/graph.go
[go118]: https://go.dev/blog/go1.18beta1

### MapReduce

There's a final operation I implemented as part of the `graph` package:
[MapReduce][mapreduce]. It's a difficult operation to describe, but I'm going to
do my best in this section for those who are interested. If you don't understand
it, or don't care, just know that `MapReduce` is a generic tool for transforming
graphs.

For a description of `MapReduce` we need to present an example graph:

```
        +<--b---
        +       \
X <--a--+<--c----+<--f-- A
        +               /
        +      +<---g---
        +<--d--+
               +<---h---
                        \
Y <---------e----------- B
```
Plus signs indicate tuples, and lowercase letters are edge values while upper
case letters are vertex values. The pseudo-code to construct this graph in go
might look like:
```go
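g := new(Graph)

// The same f edge out of A is shared by both the b and c tuples.
fA := ValueOut("f", "A")

g = g.AddValueIn(
	"X",
	TupleOut(
		"a",
		TupleOut("b", fA),
		TupleOut("c", fA),
		TupleOut(
			"d",
			ValueOut("g", "A"),
			ValueOut("h", "B"),
		),
	),
)

// Y is connected directly to B via the e edge.
g = g.AddValueIn("Y", ValueOut("e", "B"))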
```
As can be seen in the [code][mapreduce], `MapReduce`'s first argument is an
`OpenEdge`, _not_ a `Graph`. Fundamentally `MapReduce` is a reduction of the
_dependencies_ of a particular value into a new value; to reduce the
dependencies of multiple values at the same time would be equivalent to looping
over those values and calling `MapReduce` on each individually. Having
`MapReduce` only deal with one edge at a time is more flexible.

So let's focus on a particular `OpenEdge`, the one leading into `X` (returned by
`TupleOut("a", etc...)`). `MapReduce` is going to descend into this `OpenEdge`
recursively, in order to first find all value vertices (ie the leaf vertices,
those without any children of their own).

At this point `MapReduce` will use its second argument, the `mapVal` function,
which accepts a value of one type and returns a value of another type. This
function is called on each value from every value vertex encountered. In this
case both `A` and `B` are reachable from `X`, so `mapVal` will be called on
each _only once_. This is the case even though `A` is connected to multiple
times (once with an edge value of `f`, another with an edge value of `g`).
`mapVal` only gets called once per vertex, not per connection.

With all values mapped, `MapReduce` will begin reducing. For each edge leaving
each value vertex, the `reduceEdge` function is called. `reduceEdge` accepts as
arguments the edge value of the edge and the _mapped value_ (not the original
value) of the vertex, and returns a new value of the same type that `mapVal`
returned. Like `mapVal`, `reduceEdge` will only be called once per edge. In our
example, `<--f--A` is used twice (`b` and `c`), but `reduceEdge` will only be
called on it once.

With each value vertex edge having been reduced, `reduceEdge` is called again on
each edge that _those_ edges feed into, which must be edges leaving tuple
vertices. An array of the values returned from the previous `reduceEdge` calls
for each of the tuple's input edges is used as the value argument in the next
call. This is done until the `OpenEdge` is fully reduced into a single value.

To flesh out our example, let's imagine a `mapVal` which returns the input
string repeated twice, and a `reduceEdge` which returns the input values joined
with the edge value, and then wrapped with the edge value (eg `reduceEdge(a, [B,
C]) -> aBaCa`).

Calling `MapReduce` on the edge leading into `X` will then give us the following
calls:

```
# Map the value vertices
mapVal(A) -> AA
mapVal(B) -> BB
# Reduce the value vertex edges
reduceEdge(f, [AA]) -> fAAf
reduceEdge(g, [AA]) -> gAAg
reduceEdge(h, [BB]) -> hBBh
# Reduce tuple vertex edges
reduceEdge(b, [fAAf]) -> bfAAfb
reduceEdge(c, [fAAf]) -> cfAAfc
reduceEdge(d, [gAAg, hBBh]) -> dgAAgdhBBhd
reduceEdge(a, [bfAAfb, cfAAfc, dgAAgdhBBhd]) -> abfAAfbacfAAfcadgAAgdhBBhda
```
Beautiful, exactly what we wanted.

`MapReduce` will prove extremely useful when it comes time for the VM to execute
the graph. It enables the VM to evaluate only the values which are needed to
produce an output, and to only evaluate each value once no matter how many times
it's used. `MapReduce` also takes care of the recursive traversal of the
`Graph`, which simplifies the VM code significantly.

[mapreduce]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/graph/graph.go#L338
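
The walkthrough above can be reproduced with a small, string-only sketch. This is not the code from the `graph` package: the `OpenEdge` here is a simplified stand-in, the `key` method and the two cache maps are hypothetical helpers, and the traversal resolves everything in a single depth-first pass rather than in separate map and reduce phases. The set of calls is the same as in the trace, though: `mapVal` runs twice, `reduceEdge` runs seven times, and the program prints `abfAAfbacfAAfcadgAAgdhBBhda 2 7`.

```go
package main

import (
	"fmt"
	"strings"
)

// OpenEdge is a simplified stand-in for graph.OpenEdge. An edge is sourced
// either from a value vertex (val) or from a tuple vertex (ins).
type OpenEdge struct {
	edgeVal string
	tuple   bool
	val     string
	ins     []*OpenEdge
}

func ValueOut(edgeVal, val string) *OpenEdge {
	return &OpenEdge{edgeVal: edgeVal, val: val}
}

func TupleOut(edgeVal string, ins ...*OpenEdge) *OpenEdge {
	return &OpenEdge{edgeVal: edgeVal, tuple: true, ins: ins}
}

// key returns a string which is identical for structurally equal edges, so
// that equal edges share a single cache entry.
func (oe *OpenEdge) key() string {
	if !oe.tuple {
		return oe.edgeVal + "<" + oe.val
	}
	keys := make([]string, len(oe.ins))
	for i, in := range oe.ins {
		keys[i] = in.key()
	}
	return oe.edgeVal + "<(" + strings.Join(keys, ",") + ")"
}

// MapReduce reduces root into a single string. mapVal is called at most once
// per vertex value, and reduceEdge at most once per distinct edge.
func MapReduce(
	root *OpenEdge,
	mapVal func(val string) string,
	reduceEdge func(edgeVal string, ins []string) string,
) string {
	mapped := map[string]string{}  // vertex value -> mapVal result
	reduced := map[string]string{} // edge key -> reduceEdge result

	var reduce func(*OpenEdge) string
	reduce = func(oe *OpenEdge) string {
		k := oe.key()
		if out, ok := reduced[k]; ok {
			return out
		}
		var ins []string
		if oe.tuple {
			for _, in := range oe.ins {
				ins = append(ins, reduce(in))
			}
		} else {
			m, ok := mapped[oe.val]
			if !ok {
				m = mapVal(oe.val)
				mapped[oe.val] = m
			}
			ins = []string{m}
		}
		out := reduceEdge(oe.edgeVal, ins)
		reduced[k] = out
		return out
	}
	return reduce(root)
}

// example runs MapReduce on the edge leading into X from the example graph,
// counting how often each callback is invoked.
func example() (result string, mapCalls, reduceCalls int) {
	fA := ValueOut("f", "A")
	x := TupleOut("a",
		TupleOut("b", fA),
		TupleOut("c", fA),
		TupleOut("d", ValueOut("g", "A"), ValueOut("h", "B")),
	)
	mapVal := func(val string) string { mapCalls++; return val + val }
	reduceEdge := func(edgeVal string, ins []string) string {
		reduceCalls++
		return edgeVal + strings.Join(ins, edgeVal) + edgeVal
	}
	result = MapReduce(x, mapVal, reduceEdge)
	return
}

func main() {
	fmt.Println(example())
}
```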
## gg

With a generic graph implementation out of the way, the next step was to define
a specific implementation which could be parsed from a file and later used for
execution in the VM.

The file extension used for ginger code is `.gg`, as in "ginger graph" (of
course). The package for decoding this file format is, therefore, also called
`gg`.

The core datatype for the `gg` package is the [`Value`][ggvalue], since the
`graph` package takes care of essentially everything else in the realm of graph
construction and manipulation. The type definition is:

```go
// Value represents a value which can be serialized by the gg text format.
type Value struct {

	// Only one of these fields may be set
	Name   *string
	Number *int64
	Graph  *Graph

	// Optional fields indicating the token which was used to construct this
	// Value, if any.
	LexerToken *LexerToken
}

type Graph = graph.Graph[Value, Value] // type alias for convenience
```
Note that it's currently only possible to describe three different types in a
`gg` file, and one of them is the `Graph`! These are the only ones needed to
implement a fibonacci function, so they're all I implemented.

The lexing/parsing of `gg` files is not super interesting; you can check out the
package code for more details. The only other thing worth noting is that, for
now, all statements are required to end with a `;`. I had originally wanted to
be less strict with this, and allow newlines and other tokens to indicate the
end of statements, but it was complicating the code and I wanted to move on.

Another small thing worth noting is that I decided to make each entire `.gg`
file implicitly define a graph. So you can imagine each file's contents wrapped
in curly braces.

With the `gg` package out of the way I was able to finally parse ginger
programs! The following is the actual, real-life implementation of the fibonacci
function (though at this point it didn't actually work, because the VM was still
not implemented):

```
out = {

    decr = { out = add < (in; -1;); };

    n = tupEl < (in; 0;);
    a = tupEl < (in; 1;);
    b = tupEl < (in; 2;);

    out = if < (
        isZero < n;
        a;
        recur < (
            decr < n;
            b;
            add < (a;b;);
        );
    );

} < (in; 0; 1;);
```
[ggvalue]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/gg/gg.go#L14
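
For readers who find the graph syntax hard to follow, the same computation can be written as an ordinary recursive Go function. This is only a reading aid under my own naming (`fib` and `Fib` are not part of ginger), and it assumes a non-negative input: `n` counts down to zero while `a` and `b` carry two consecutive fibonacci numbers, and the trailing `(in; 0; 1;)` in the gg program corresponds to the initial arguments.

```go
package main

import "fmt"

// fib mirrors the inner graph of the gg program. The comments show which gg
// expression each line corresponds to.
func fib(n, a, b int64) int64 {
	if n == 0 { // isZero < n
		return a // a
	}
	return fib(n-1, b, a+b) // recur < (decr < n; b; add < (a;b;);)
}

// Fib supplies the initial tuple, ie (in; 0; 1;).
func Fib(n int64) int64 {
	return fib(n, 0, 1)
}

func main() {
	fmt.Println(Fib(8)) // prints 21
}
```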
## VM

Finally, the meat of all this. If the `graph` and `gg` packages are the sturdy,
well constructed foundations of a tall building, then the `vm` package is the
extremely long, flimsy stick someone propped up vertically so they could say
they built a structure of impressive height.

In other words, it's very likely that the current iteration of the VM will not
be long for this world, and so I won't waste time describing it in super detail.

What I will say about it is that within the `vm` package I've defined a [new
`Value` type][vmvalue], which extends the one defined in `gg`. This was
necessary because there are types which cannot be represented syntactically in a
`.gg` file, but which _can_ be used as values within a program being run.

The first of these is the `Operation`, which is essentially a first-class
function. The VM will automatically interpret a graph as an `Operation` when it
is used as an edge value, as has been discussed in previous posts, but there are
also built-in operations (like `if` and `recur`) which cannot be represented as
datastructures, and so it was necessary to introduce a new in-memory type to
properly represent operations.

The second is the `Tuple` type. This may seem strange, as ginger graphs already
have a concept of a tuple. But the ginger graph tuple is a _vertex type_, not a
value type. The distinction is small, but important. Essentially the graph tuple
is a structural element which describes how to create a tuple value, but it is
not yet that value. So we need a new `Value` type to hold the tuple once it
_has_ been created during runtime.

Another thing worth describing about the `vm` package, even though I think it
might change drastically, is the [`Thunk`][thunk]:

```go
// Thunk is returned from the performance of an Operation. When called it will
// return the result of that Operation having been called with the particular
// arguments which were passed in.
type Thunk func() (Value, error)
```
The term "thunk" is borrowed from Haskell, which I don't actually know so I'm
probably using it wrong, but anyway...

A thunk is essentially a value which has yet to be evaluated; the VM knows
exactly _how_ to evaluate it, but it hasn't done so yet. The primary reason for
their existence within ginger is to account for conditionals, ie the `if`
operation. The VM can't evaluate each of an `if`'s arguments all at once; it
must only evaluate the first argument (to obtain a boolean), and then based on
that evaluate the second or third argument.

This is where `graph.MapReduce` comes in. The VM uses `graph.MapReduce` to
reduce each edge in a graph to a `Thunk`, where the `Thunk`'s value is based on
the operation (the edge's value) and the inputs to the edge (which will
themselves be `Thunk`s). Because each `Thunk` represents a potential value, not
an actual one, the VM is able to completely traverse the program to be executed
(using `graph.MapReduce`) while allowing conditionals to still work correctly.

[EvaluateEdge][evaledge] is where all that happens, if you're interested, but be
warned that the code is a hot mess right now and it's probably not worth
spending a ton of time understanding it as it will change a lot.

A final thing I'll mention is that the `recur` operation is, I think, broken. Or
probably more accurately, the entire VM is broken in a way which prevents
`recur` from working correctly. It _does_ produce the correct output, so I
haven't prioritized debugging it, but for any large number of iterations it
takes a very long time to run.

[vmvalue]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/vm/vm.go#L18
[thunk]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/vm/op.go#L11
[evaledge]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/vm/scope.go#L29
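
The way thunks let `if` skip the branch it doesn't need can be sketched in a few lines. This is an illustration rather than the VM's actual code: `Value` is narrowed to `int64`, a non-zero condition is assumed to mean "true", and `constant` and `ifOp` are hypothetical helpers. Building the thunk evaluates nothing; only calling it does, and the branch which isn't taken is never called. The program prints `42 <nil>`.

```go
package main

import (
	"errors"
	"fmt"
)

// Value is narrowed to int64 to keep the sketch short. The real vm.Value
// holds more kinds of values than this.
type Value = int64

// Thunk has the same shape as the vm package's Thunk type.
type Thunk func() (Value, error)

// constant wraps an already-known value in a Thunk.
func constant(v Value) Thunk {
	return func() (Value, error) { return v, nil }
}

// ifOp builds a Thunk for a conditional. None of the arguments are evaluated
// until the returned Thunk is called, and even then only the condition and
// the chosen branch are.
func ifOp(cond, then, els Thunk) Thunk {
	return func() (Value, error) {
		c, err := cond()
		if err != nil {
			return 0, err
		}
		if c != 0 { // assumption: non-zero means true
			return then()
		}
		return els()
	}
}

func main() {
	// This Thunk would return an error if it were ever evaluated.
	boom := Thunk(func() (Value, error) {
		return 0, errors.New("this branch should never run")
	})

	res, err := ifOp(constant(1), constant(42), boom)()
	fmt.Println(res, err)
}
```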
## Demo

Finally, to show it off! I put together a super stupid `eval` binary which takes
two arguments: a graph to be used as an operation, and a value to be used as an
argument to that operation. It doesn't even read the code from a file; you have
to `cat` it in.

The [README][readme] documents how to run the demo, so if you'd like to do so
then please clone the repo and give it a shot! It should look like this when you
do:

```
# go run ./cmd/eval/main.go "$(cat examples/fib.gg)" 8
21
```
You can put any number you like instead of `8`, but as mentioned, `recur` is
broken so it can take a while for larger numbers.

[readme]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/README.md

## Next Steps

The following are all the things I'd like to address the next time I work on
ginger:

* `gg`
    * Allow for newlines (and `)` and `}`) to terminate statements, not just
      `;`.
    * Allow names to have punctuation characters in them (maybe?).
    * Don't read all tokens into memory prior to parsing.

* `vm`
    * Fix `recur`.
    * Implement tail call optimization.

* General
    * A bit of polish on the `eval` tool.
    * Expose graph creation, traversal, and transformation functions as
      builtins.
    * Create a plan (if not actually implement it yet) for how code will be
      imported from one file to another. Namespacing in general will fall into
      this bucket.
    * Create a plan (if not actually implement it yet) for how users can
      extend/replace the lexer/parser.

I don't know _when_ I'll get to work on these next, but ginger will come back up
in my rotation of projects eventually. It could be a few months. In the meantime
I hope you're as excited about this progress as I am, and if you have any
feedback I'd love to hear it.

Thanks for reading!