ginger: it's alive

Commit 19f1efd748 (parent ed4d179680) adds one new file,
static/src/_posts/2021-12-31-ginger-its-alive.md (422 lines):

---
title: >-
    Ginger: It's Alive!
description: >-
    The new best language for computing fibonacci numbers.
series: ginger
tags: tech
---

As a kind of Christmas present to myself I took a whole week off of work
specifically to dedicate myself to working on ginger.

My concrete goal was to be able to run a ginger program to compute any Nth
Fibonacci number, a goal I chose because it would require the implementation of
conditionals, some kind of looping or recursion, and basic addition/subtraction.
In other words, it would require all the elements which make up a
Turing-complete language.

And you know what? I actually succeeded!

The implementation can be found [here][impl]. At this point ginger is an
interpreted language running in a golang-based VM. The dream is for it to be
self-hosted on LLVM (and other platforms after), but as an intermediate step
towards that I decided to stick to what I know (golang) rather than having to
learn two things at once.

In this post I'm going to describe the components of this VM at a high level,
show a quick demo of it working, and finally talk about the roadmap going
forward.

[impl]: https://github.com/mediocregopher/ginger/tree/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2

## Graph

The core package of the whole project is the [`graph`][graph] package. This
package implements a generic directed graph data structure.

The generic part is worth noting; I was able to take advantage of go's new
generics, which are currently [in beta][go118]. I'd read quite a bit on how the
generic system would work even before the beta was announced, so I was able to
hit the ground running and start using them without much issue.

Ginger's unique graph data structure has been discussed quite a bit in previous
posts in this series, and this latest implementation doesn't deviate much at a
high level. Below are the most up-to-date core datatypes and functions which are
used to construct ginger graphs:

```go
// Value is any value which can be stored within a Graph. Values should be
// considered immutable, ie once used with the graph package their internal
// value does not change.
type Value interface {
	Equal(Value) bool
	String() string
}

// OpenEdge consists of the edge value (E) and source vertex value (V) of an
// edge in a Graph. When passed into the AddValueIn method a full edge is
// created. An OpenEdge can also be sourced from a tuple vertex, whose value is
// an ordered set of OpenEdges of this same type.
type OpenEdge[E, V Value] struct { ... }

// ValueOut creates an OpenEdge which, when used to construct a Graph,
// represents an edge (with edgeVal attached to it) coming from the vertex
// containing val.
func ValueOut[E, V Value](edgeVal E, val V) *OpenEdge[E, V]

// TupleOut creates an OpenEdge which, when used to construct a Graph,
// represents an edge (with edgeVal attached to it) coming from the vertex
// comprised of the given ordered-set of input edges.
func TupleOut[E, V Value](edgeVal E, ins ...*OpenEdge[E, V]) *OpenEdge[E, V]

// Graph is an immutable container of a set of vertices. The Graph keeps track
// of all Values which terminate an OpenEdge. E indicates the type of edge
// values, while V indicates the type of vertex values.
type Graph[E, V Value] struct { ... }

// AddValueIn takes an OpenEdge and connects it to the Value vertex containing
// val, returning the new Graph which reflects that connection.
func (*Graph[E, V]) AddValueIn(val V, oe *OpenEdge[E, V]) *Graph[E, V]

// ValueIns returns, if any, all OpenEdges which lead to the given Value in the
// Graph (ie, all those added via AddValueIn).
func (*Graph[E, V]) ValueIns(val Value) []*OpenEdge[E, V]
```

The current `Graph` implementation is _incredibly_ inefficient; it does a lot of
copying, looping, and equality checks which could be optimized out one day.
That's going to be a recurring theme of this post, as I had to perform a
balancing act between actually reaching my goal for the week and not incurring
too much tech debt for myself.

[graph]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/graph/graph.go
[go118]: https://go.dev/blog/go1.18beta1

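To give a feel for where the copying, looping, and equality checks come from,
here is a toy sketch of an immutable graph. It is _not_ the real `graph`
package: the `edge` and `graph` types are hypothetical, edges are flattened to
plain strings, and tuples are left out entirely. The only point is that every
`addValueIn` call scans the existing edges and then copies all of them.

```go
package main

import "fmt"

// edge is a hypothetical flattened edge: edgeVal leads from src into dst.
type edge struct{ src, edgeVal, dst string }

// graph is an immutable set of edges; its methods never modify the receiver.
type graph struct{ edges []edge }

// addValueIn returns a new graph which also contains e. The receiver is
// returned as-is if e is already present.
func (g *graph) addValueIn(e edge) *graph {
	for _, existing := range g.edges { // one equality check per existing edge
		if existing == e {
			return g
		}
	}
	edges := make([]edge, len(g.edges), len(g.edges)+1)
	copy(edges, g.edges) // full copy on every addition
	return &graph{edges: append(edges, e)}
}

// valueIns returns every edge which leads into dst.
func (g *graph) valueIns(dst string) []edge {
	var ins []edge
	for _, e := range g.edges {
		if e.dst == dst {
			ins = append(ins, e)
		}
	}
	return ins
}

func main() {
	g0 := &graph{}
	g1 := g0.addValueIn(edge{"A", "f", "X"})
	g2 := g1.addValueIn(edge{"B", "e", "Y"})

	// Older graphs are untouched by later additions.
	fmt.Println(len(g0.edges), len(g1.edges), len(g2.edges)) // prints "0 1 2"
	fmt.Println(g2.valueIns("Y"))
}
```

Building a graph of N edges this way costs on the order of N² work, which is
fine for a fibonacci demo and not much else.
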
### MapReduce

There's a final operation I implemented as part of the `graph` package:
[MapReduce][mapreduce]. It's a difficult operation to describe, but I'm going to
do my best in this section for those who are interested. If you don't understand
it, or don't care, just know that `MapReduce` is a generic tool for transforming
graphs.

For a description of `MapReduce` we need to present an example graph:

```
        +<--b---
        +       \
X <--a--+<--c----+<--f-- A
        +               /
        +      +<---g---
        +<--d--+
               +<---h---
                        \
Y <---------e----------- B
```

Plus signs indicate tuples, and lowercase letters are edge values while
uppercase letters are vertex values. The pseudo-code to construct this graph in
go might look like:

```go
g := new(Graph)

// fA is shared by the "b" and "c" tuples below.
fA := ValueOut("f", "A")

g = g.AddValueIn(
	"X",
	TupleOut(
		"a",
		TupleOut("b", fA),
		TupleOut("c", fA),
		TupleOut(
			"d",
			ValueOut("g", "A"),
			ValueOut("h", "B"),
		),
	),
)

// Y's only input is the edge "e" coming from B.
g = g.AddValueIn("Y", ValueOut("e", "B"))
```

As can be seen in the [code][mapreduce], `MapReduce`'s first argument is an
`OpenEdge`, _not_ a `Graph`. Fundamentally `MapReduce` is a reduction of the
_dependencies_ of a particular value into a new value; to reduce the
dependencies of multiple values at the same time would be equivalent to looping
over those values and calling `MapReduce` on each individually. Having
`MapReduce` only deal with one edge at a time is more flexible.

So let's focus on a particular `OpenEdge`, the one leading into `X` (returned by
`TupleOut("a", etc...)`). `MapReduce` is going to descend into this `OpenEdge`
recursively, in order to first find all value vertices (ie the leaf vertices,
those without any children of their own).

At this point `MapReduce` will use its second argument, the `mapVal` function,
which accepts a value of one type and returns a value of another type. This
function is called on each value from every value vertex encountered. In this
case both `A` and `B` are reachable from `X`, so `mapVal` will be called on
each _only once_. This is the case even though `A` is connected to multiple
times (once with an edge value of `f`, another with an edge value of `g`).
`mapVal` only gets called once per vertex, not per connection.

With all values mapped, `MapReduce` will begin reducing. For each edge leaving
each value vertex, the `reduceEdge` function is called. `reduceEdge` accepts as
arguments the edge value of the edge and the _mapped value_ (not the original
value) of the vertex, and returns a new value of the same type that `mapVal`
returned. Like `mapVal`, `reduceEdge` will only be called once per edge. In our
example, `<--f--A` is used twice (`b` and `c`), but `reduceEdge` will only be
called on it once.

With each value vertex edge having been reduced, `reduceEdge` is called again on
each edge leaving _those_ edges, which must be tuple edges. An array of the
values returned from the previous `reduceEdge` calls for each of the tuples'
input edges is used as the value argument in the next call. This is done until
the `OpenEdge` is fully reduced into a single value.

To flesh out our example, let's imagine a `mapVal` which returns the input
string repeated twice, and a `reduceEdge` which returns the input values joined
with the edge value, and then wrapped with the edge value (eg `reduceEdge(a, [B,
C]) -> aBaCa`).

Calling `MapReduce` on the edge leading into `X` will then give us the following
calls:

```
# Map the value vertices

mapVal(A) -> AA
mapVal(B) -> BB

# Reduce the value vertex edges

reduceEdge(f, [AA]) -> fAAf
reduceEdge(g, [AA]) -> gAAg
reduceEdge(h, [BB]) -> hBBh

# Reduce tuple vertex edges

reduceEdge(b, [fAAf]) -> bfAAfb
reduceEdge(c, [fAAf]) -> cfAAfc
reduceEdge(d, [gAAg, hBBh]) -> dgAAgdhBBhd

reduceEdge(a, [bfAAfb, cfAAfc, dgAAgdhBBhd]) -> abfAAfbacfAAfcadgAAgdhBBhda
```

Beautiful, exactly what we wanted.

`MapReduce` will prove extremely useful when it comes time for the VM to execute
the graph. It enables the VM to evaluate only the values which are needed to
produce an output, and to only evaluate each value once no matter how many times
it's used. `MapReduce` also takes care of the recursive traversal of the
`Graph`, which simplifies the VM code significantly.

[mapreduce]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/graph/graph.go#L338

## gg

With a generic graph implementation out of the way, I next needed to define a
specific implementation which could be parsed from a file and later used for
execution in the VM.

The file extension used for ginger code is `.gg`, as in "ginger graph" (of
course). The package for decoding this file format is, therefore, also called
`gg`.

The core datatype for the `gg` package is the [`Value`][ggvalue], since the
`graph` package takes care of essentially everything else in the realm of graph
construction and manipulation. The type definition is:

```go
// Value represents a value which can be serialized by the gg text format.
type Value struct {
	// Only one of these fields may be set
	Name   *string
	Number *int64
	Graph  *Graph

	// Optional fields indicating the token which was used to construct this
	// Value, if any.
	LexerToken *LexerToken
}

type Graph = graph.Graph[Value, Value] // type alias for convenience
```

Note that it's currently only possible to describe three different types in a
`gg` file, and one of them is the `Graph`! These are the only ones needed to
implement a Fibonacci function, so they're all I implemented.

The lexing/parsing of `gg` files is not super interesting; you can check out the
package code for more details. The only other thing worth noting is that, for
now, all statements are required to end with a `;`. I had originally wanted to
be less strict with this, and allow newlines and other tokens to indicate the
end of statements, but it was complicating the code and I wanted to move on.

Another small thing worth noting is that I decided to make each entire `.gg`
file implicitly define a graph. So you can imagine each file's contents wrapped
in curly braces.

With the `gg` package out of the way I was able to finally parse ginger
programs! The following is the actual, real-life implementation of the Fibonacci
function (though at this point it didn't actually work, because the VM was still
not implemented):

```
out = {

    decr = { out = add < (in; -1;); };

    n = tupEl < (in; 0;);
    a = tupEl < (in; 1;);
    b = tupEl < (in; 2;);

    out = if < (
        isZero < n;
        a;
        recur < (
            decr < n;
            b;
            add < (a;b;);
        );
    );

} < (in; 0; 1;);
```

[ggvalue]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/gg/gg.go#L14

## VM

Finally, the meat of all this. If the `graph` and `gg` packages are the sturdy,
well-constructed foundations of a tall building, then the `vm` package is the
extremely long, flimsy stick someone propped up vertically so they could say
they built a structure of impressive height.

In other words, it's very likely that the current iteration of the VM will not
be long for this world, and so I won't waste time describing it in much detail.

What I will say about it is that within the `vm` package I've defined a [new
`Value` type][vmvalue], which extends the one defined in `gg`. This was
necessary because there are types which cannot be represented syntactically in
a `.gg` file, but which _can_ be used as values within a program being run.

The first of these is the `Operation`, which is essentially a first-class
function. The VM will automatically interpret a graph as an `Operation` when it
is used as an edge value, as has been discussed in previous posts, but there are
also built-in operations (like `if` and `recur`) which cannot be represented as
data structures, and so it was necessary to introduce a new in-memory type to
properly represent operations.

The second is the `Tuple` type. This may seem strange, as ginger graphs already
have a concept of a tuple. But the ginger graph tuple is a _vertex type_, not a
value type. The distinction is small, but important. Essentially the graph tuple
is a structural element which describes how to create a tuple value, but it is
not yet that value. So we need a new `Value` type to hold the tuple once it
_has_ been created during runtime.

Another thing worth describing about the `vm` package, even though I think they
might change drastically, is [`Thunk`s][thunk]:

```go
// Thunk is returned from the performance of an Operation. When called it will
// return the result of that Operation having been called with the particular
// arguments which were passed in.
type Thunk func() (Value, error)
```

The term "thunk" is borrowed from Haskell, which I don't actually know so I'm
probably using it wrong, but anyway...

A thunk is essentially a value which has yet to be evaluated; the VM knows
exactly _how_ to evaluate it, but it hasn't done so yet. The primary reason for
their existence within ginger is to account for conditionals, ie the `if`
operation. The VM can't evaluate each of an `if`'s arguments all at once; it
must only evaluate the first argument (to obtain a boolean), and then based on
that evaluate the second or third argument.

This is where `graph.MapReduce` comes in. The VM uses `graph.MapReduce` to
reduce each edge in a graph to a `Thunk`, where the `Thunk`'s value is based on
the operation (the edge's value) and the inputs to the edge (which will
themselves be `Thunk`s). Because each `Thunk` represents a potential value, not
an actual one, the VM is able to completely parse the program to be executed
(using `graph.MapReduce`) while allowing conditionals to still work correctly.

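A minimal sketch of the idea, assuming nothing about the real `vm` package
beyond the `Thunk` signature shown earlier: the `Value` interface here is a
placeholder, and `constant` and `ifOp` are hypothetical helpers I've invented
for the example. The point is that building the `if` costs nothing, and calling
it only ever evaluates the branch which was chosen.

```go
package main

import (
	"errors"
	"fmt"
)

// Value is a placeholder for the vm package's Value type.
type Value interface{}

// Thunk has the same shape as the Thunk type shown earlier.
type Thunk func() (Value, error)

// constant wraps an already-known value in a Thunk.
func constant(v Value) Thunk {
	return func() (Value, error) { return v, nil }
}

// ifOp builds a Thunk for a conditional. Nothing is evaluated until the
// returned Thunk is called, and then only cond and the chosen branch run.
func ifOp(cond, then, els Thunk) Thunk {
	return func() (Value, error) {
		c, err := cond()
		if err != nil {
			return nil, err
		}
		b, ok := c.(bool)
		if !ok {
			return nil, errors.New("if: condition is not a boolean")
		}
		if b {
			return then()
		}
		return els()
	}
}

func main() {
	evaluated := false
	expensive := Thunk(func() (Value, error) {
		evaluated = true
		return int64(42), nil
	})

	t := ifOp(constant(true), constant(int64(1)), expensive)
	v, err := t()
	fmt.Println(v, err, evaluated) // prints "1 <nil> false"
}
```

In the Fibonacci program this is what keeps `recur` from being evaluated once
`n` reaches zero.
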
[`EvaluateEdge`][evaledge] is where all that happens, if you're interested, but
be warned that the code is a hot mess right now and it's probably not worth
spending a ton of time understanding it as it will change a lot.

A final thing I'll mention is that the `recur` operation is, I think, broken. Or
probably more accurately, the entire VM is broken in a way which prevents
`recur` from working correctly. It _does_ produce the correct output, so I
haven't prioritized debugging it, but for any large number of iterations it
takes a very long time to run.

[vmvalue]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/vm/vm.go#L18
[thunk]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/vm/op.go#L11
[evaledge]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/vm/scope.go#L29

## Demo

Finally, to show it off! I put together a super stupid `eval` binary which takes
two arguments: a graph to be used as an operation, and a value to be used as an
argument to that operation. It doesn't even read the code from a file; you have
to `cat` it in.

The [README][readme] documents how to run the demo, so if you'd like to do so
then please clone the repo and give it a shot! It should look like this when you
do:

```
# go run ./cmd/eval/main.go "$(cat examples/fib.gg)" 8
21
```

You can put any number you like instead of `8`, but as mentioned, `recur` is
broken so it can take a while for larger numbers.

[readme]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/README.md

## Next Steps

The following are all the things I'd like to address the next time I work on
ginger:

* `gg`

    * Allow for newlines (and `)` and `}`) to terminate statements, not just
      `;`.

    * Allow names to have punctuation characters in them (maybe?).

    * Don't read all tokens into memory prior to parsing.

* `vm`

    * Fix `recur`.

    * Implement tail call optimization.

* General

    * A bit of polish on the `eval` tool.

    * Expose graph creation, traversal, and transformation functions as
      builtins.

    * Create plan (if not actually implement it yet) for how code will be
      imported from one file to another. Namespacing in general will fall into
      this bucket.

    * Create plan (if not actually implement it yet) for how users can
      extend/replace the lexer/parser.

I don't know _when_ I'll get to work on these next; ginger will come back up in
my rotation of projects eventually. It could be a few months. In the meantime I
hope you're as excited about this progress as I am, and if you have any feedback
I'd love to hear it.

Thanks for reading!