ginger: it's alive
parent
ed4d179680
commit
19f1efd748
422
static/src/_posts/2021-12-31-ginger-its-alive.md
Normal file
@@ -0,0 +1,422 @@
---
title: >-
    Ginger: It's Alive!
description: >-
    The new best language for computing fibonacci numbers.
series: ginger
tags: tech
---

As a kind of Christmas present to myself I took a whole week off of work
specifically to dedicate myself to working on ginger.

My concrete goal was to be able to run a ginger program to compute any Nth
fibonacci number, a goal I chose because it would require the implementation of
conditionals, some kind of looping or recursion, and basic addition/subtraction.
In other words, it would require all the elements which comprise a Turing
complete language.

And you know what? I actually succeeded!

The implementation can be found [here][impl]. At this point ginger is an
interpreted language running in a golang-based VM. The dream is for it to be
self-hosted on LLVM (and other platforms after), but as an intermediate step to
that I decided on sticking to what I know (golang) rather than having to learn
two things at once.

In this post I'm going to describe the components of this VM at a high level,
show a quick demo of it working, and finally talk about the roadmap going
forward.

[impl]: https://github.com/mediocregopher/ginger/tree/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2

## Graph

The core package of the whole project is the [`graph`][graph] package. This
package implements a generic directed graph datastructure.

The generic part is worth noting; I was able to take advantage of go's new
generics which are currently [in beta][go118]. I'd read quite a bit on how the
generic system would work even before the beta was announced, so I was able to
hit the ground running and start using them without much issue.

Ginger's unique graph datastructure has been discussed in previous posts in this
series quite a bit, and this latest implementation doesn't deviate much at a
high level. Below are the most up-to-date core datatypes and functions which are
used to construct ginger graphs:

```go
// Value is any value which can be stored within a Graph. Values should be
// considered immutable, ie once used with the graph package their internal
// value does not change.
type Value interface {
	Equal(Value) bool
	String() string
}

// OpenEdge consists of the edge value (E) and source vertex value (V) of an
// edge in a Graph. When passed into the AddValueIn method a full edge is
// created. An OpenEdge can also be sourced from a tuple vertex, whose value is
// an ordered set of OpenEdges of this same type.
type OpenEdge[E, V Value] struct { ... }

// ValueOut creates an OpenEdge which, when used to construct a Graph,
// represents an edge (with edgeVal attached to it) coming from the vertex
// containing val.
func ValueOut[E, V Value](edgeVal E, val V) *OpenEdge[E, V]

// TupleOut creates an OpenEdge which, when used to construct a Graph,
// represents an edge (with edgeVal attached to it) coming from the vertex
// comprised of the given ordered-set of input edges.
func TupleOut[E, V Value](edgeVal E, ins ...*OpenEdge[E, V]) *OpenEdge[E, V]

// Graph is an immutable container of a set of vertices. The Graph keeps track
// of all Values which terminate an OpenEdge. E indicates the type of edge
// values, while V indicates the type of vertex values.
type Graph[E, V Value] struct { ... }

// AddValueIn takes an OpenEdge and connects it to the Value vertex containing
// val, returning the new Graph which reflects that connection.
func (*Graph[E, V]) AddValueIn(val V, oe *OpenEdge[E, V]) *Graph[E, V]

// ValueIns returns, if any, all OpenEdges which lead to the given Value in the
// Graph (ie, all those added via AddValueIn).
func (*Graph[E, V]) ValueIns(val Value) []*OpenEdge[E, V]
```

The current `Graph` implementation is _incredibly_ inefficient; it does a lot
of copying, looping, and equality checks which could be optimized out one day.
That's going to be a recurring theme of this post, as I had to perform a
balancing act between actually reaching my goal for the week while not incurring
too much tech debt for myself.

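To make the cost of that immutability concrete, here is a minimal sketch of a graph whose `AddValueIn` copies everything and whose `ValueIns` loops and compares. It is not the real implementation: `str`, `edge`, `valueIn`, and `countIns` are names made up for this illustration, tuple vertices are left out entirely, and it assumes a Go toolchain with generics (1.18 or later).

```go
package main

import "fmt"

// Value matches the interface from the graph package shown above.
type Value interface {
	Equal(Value) bool
	String() string
}

// str is a hypothetical Value implementation wrapping a string.
type str string

func (s str) Equal(o Value) bool {
	os, ok := o.(str)
	return ok && s == os
}

func (s str) String() string { return string(s) }

// edge is a simplified stand-in for OpenEdge which omits tuple vertices.
type edge[E, V Value] struct {
	edgeVal E
	fromVal V
}

// valueIn records that oe leads into the vertex holding val.
type valueIn[E, V Value] struct {
	val V
	oe  edge[E, V]
}

// Graph is immutable: methods return new Graphs rather than modifying this one.
type Graph[E, V Value] struct {
	ins []valueIn[E, V]
}

// AddValueIn copies every existing connection into a fresh slice, which keeps
// older Graphs valid but costs O(n) per call.
func (g *Graph[E, V]) AddValueIn(val V, oe edge[E, V]) *Graph[E, V] {
	ins := make([]valueIn[E, V], len(g.ins), len(g.ins)+1)
	copy(ins, g.ins)
	ins = append(ins, valueIn[E, V]{val: val, oe: oe})
	return &Graph[E, V]{ins: ins}
}

// ValueIns loops over all connections, using Equal to find matches.
func (g *Graph[E, V]) ValueIns(val Value) []edge[E, V] {
	var out []edge[E, V]
	for _, in := range g.ins {
		if in.val.Equal(val) {
			out = append(out, in.oe)
		}
	}
	return out
}

// countIns builds two Graphs and reports how many edges lead into X in each.
func countIns() (before, after int) {
	g1 := new(Graph[str, str]).AddValueIn("X", edge[str, str]{"a", "A"})
	g2 := g1.AddValueIn("X", edge[str, str]{"b", "B"})
	return len(g1.ValueIns(str("X"))), len(g2.ValueIns(str("X")))
}

func main() {
	// g1 is unaffected by the second AddValueIn call.
	fmt.Println(countIns())
}
```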
[graph]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/graph/graph.go
[go118]: https://go.dev/blog/go1.18beta1

### MapReduce

There's a final operation I implemented as part of the `graph` package:
[MapReduce][mapreduce]. It's a difficult operation to describe, but I'm going to
do my best in this section for those who are interested. If you don't understand
it, or don't care, just know that `MapReduce` is a generic tool for transforming
graphs.

For a description of `MapReduce` we need to present an example graph:

```
        +<--b---
        +       \
X <--a--+<--c----+<--f-- A
        +               /
        +      +<---g---
        +<--d--+
               +<---h---
                        \
Y <---------e----------- B
```

Plus signs indicate tuples, and lowercase letters are edge values while upper
case letters are vertex values. The pseudo-code to construct this graph in go
might look like:

```go
g := new(Graph)

fA := ValueOut("f", "A")

g = g.AddValueIn(
	"X",
	TupleOut(
		"a",
		TupleOut("b", fA),
		TupleOut("c", fA),
		TupleOut(
			"d",
			ValueOut("g", "A"),
			ValueOut("h", "B"),
		),
	),
)

// The e edge runs from B into Y.
g = g.AddValueIn("Y", ValueOut("e", "B"))
```

As can be seen in the [code][mapreduce], `MapReduce`'s first argument is an
`OpenEdge`, _not_ a `Graph`. Fundamentally `MapReduce` is a reduction of the
_dependencies_ of a particular value into a new value; to reduce the
dependencies of multiple values at the same time would be equivalent to looping
over those values and calling `MapReduce` on each individually. Having
`MapReduce` only deal with one edge at a time is more flexible.

So let's focus on a particular `OpenEdge`, the one leading into `X` (returned by
`TupleOut("a", ...)`). `MapReduce` is going to descend into this `OpenEdge`
recursively, in order to first find all value vertices (ie the leaf vertices,
those without any children of their own).

At this point `MapReduce` will use its second argument, the `mapVal` function,
which accepts a value of one type and returns a value of another type. This
function is called on each value from every value vertex encountered. In this
case both `A` and `B` are reachable from `X`, so `mapVal` will be called on
each _only once_. This is the case even though `A` is connected to multiple
times (once with an edge value of `f`, another with an edge value of `g`).
`mapVal` only gets called once per vertex, not per connection.

With all values mapped, `MapReduce` will begin reducing. For each edge leaving
each value vertex, the `reduceEdge` function is called. `reduceEdge` accepts as
arguments the edge value of the edge and the _mapped value_ (not the original
value) of the vertex, and returns a new value of the same type that `mapVal`
returned. Like `mapVal`, `reduceEdge` will only be called once per edge. In our
example, `<--f--A` is used twice (`b` and `c`), but `reduceEdge` will only be
called on it once.

With each value vertex edge having been reduced, `reduceEdge` is called again on
each edge leaving _those_ edges, which must be tuple edges. An array of the
values returned from the previous `reduceEdge` calls for each of the tuples'
input edges is used as the value argument in the next call. This is done until
the `OpenEdge` is fully reduced into a single value.

To flesh out our example, let's imagine a `mapVal` which returns the input
string repeated twice, and a `reduceEdge` which returns the input values joined
with the edge value, and then wrapped with the edge value (eg
`reduceEdge(a, [B, C]) -> aBaCa`).

Calling `MapReduce` on the edge leading into `X` will then give us the following
calls:

```
# Map the value vertices

mapVal(A) -> AA
mapVal(B) -> BB

# Reduce the value vertex edges

reduceEdge(f, [AA]) -> fAAf
reduceEdge(g, [AA]) -> gAAg
reduceEdge(h, [BB]) -> hBBh

# Reduce tuple vertex edges

reduceEdge(b, [fAAf]) -> bfAAfb
reduceEdge(c, [fAAf]) -> cfAAfc
reduceEdge(d, [gAAg, hBBh]) -> dgAAgdhBBhd

reduceEdge(a, [bfAAfb, cfAAfc, dgAAgdhBBhd]) -> abfAAfbacfAAfcadgAAgdhBBhda
```

Beautiful, exactly what we wanted.

`MapReduce` will prove extremely useful when it comes time for the VM to execute
the graph. It enables the VM to evaluate only the values which are needed to
produce an output, and to only evaluate each value once no matter how many times
it's used. `MapReduce` also takes care of the recursive traversal of the
`Graph`, which simplifies the VM code significantly.

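The walkthrough above can be reproduced with a small, self-contained sketch. It is not the `graph` package's code: `openEdge`, `valueOut`, `tupleOut`, `mapReduce`, and `buildAndReduce` are simplified stand-ins which use plain strings, and the "only once" behavior is approximated by caching on pointer identity (for edges) and string equality (for vertex values) rather than through `Value.Equal`. Under those assumptions, reducing the edge into `X` calls `mapVal` twice and `reduceEdge` seven times, matching the list of calls given earlier.

```go
package main

import (
	"fmt"
	"strings"
)

// openEdge is a hypothetical, string-only stand-in for graph.OpenEdge. It is
// sourced either from a value vertex (val) or from a tuple vertex (ins).
type openEdge struct {
	edgeVal string
	val     string      // used when ins is empty
	ins     []*openEdge // non-empty when sourced from a tuple vertex
}

func valueOut(edgeVal, val string) *openEdge {
	return &openEdge{edgeVal: edgeVal, val: val}
}

func tupleOut(edgeVal string, ins ...*openEdge) *openEdge {
	return &openEdge{edgeVal: edgeVal, ins: ins}
}

// mapReduce reduces oe into a single string. mapVal runs at most once per
// distinct vertex value, reduceEdge at most once per distinct edge.
func mapReduce(
	oe *openEdge,
	mapVal func(string) string,
	reduceEdge func(edgeVal string, ins []string) string,
) string {
	mapped := map[string]string{}
	reduced := map[*openEdge]string{}

	var walk func(*openEdge) string
	walk = func(oe *openEdge) string {
		if res, ok := reduced[oe]; ok {
			return res // this edge was already reduced
		}
		var ins []string
		if len(oe.ins) == 0 {
			// Value vertex: map its value, reusing an earlier result if any.
			m, ok := mapped[oe.val]
			if !ok {
				m = mapVal(oe.val)
				mapped[oe.val] = m
			}
			ins = []string{m}
		} else {
			// Tuple vertex: reduce each input edge first.
			for _, in := range oe.ins {
				ins = append(ins, walk(in))
			}
		}
		res := reduceEdge(oe.edgeVal, ins)
		reduced[oe] = res
		return res
	}
	return walk(oe)
}

// buildAndReduce constructs the edge leading into X from the example graph
// and reduces it, counting how often each callback is invoked.
func buildAndReduce() (result string, mapCalls, reduceCalls int) {
	fA := valueOut("f", "A")
	intoX := tupleOut("a",
		tupleOut("b", fA),
		tupleOut("c", fA),
		tupleOut("d", valueOut("g", "A"), valueOut("h", "B")),
	)
	mapVal := func(v string) string {
		mapCalls++
		return v + v
	}
	reduceEdge := func(edgeVal string, ins []string) string {
		reduceCalls++
		return edgeVal + strings.Join(ins, edgeVal) + edgeVal
	}
	result = mapReduce(intoX, mapVal, reduceEdge)
	return result, mapCalls, reduceCalls
}

func main() {
	fmt.Println(buildAndReduce())
}
```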
[mapreduce]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/graph/graph.go#L338

## gg

With a generic graph implementation out of the way, I then needed to define a
specific implementation which could be parsed from a file and later used for
execution in the VM.

The file extension used for ginger code is `.gg`, as in "ginger graph" (of
course). The package name for decoding this file format is, therefore, also
called `gg`.

The core datatype for the `gg` package is the [`Value`][ggvalue], since the
`graph` package takes care of essentially everything else in the realm of graph
construction and manipulation. The type definition is:

```go
// Value represents a value which can be serialized by the gg text format.
type Value struct {

	// Only one of these fields may be set
	Name   *string
	Number *int64
	Graph  *Graph

	// Optional fields indicating the token which was used to construct this
	// Value, if any.
	LexerToken *LexerToken
}

type Graph = graph.Graph[Value, Value] // type alias for convenience
```

Note that it's currently only possible to describe three different types in a
`gg` file, and one of them is the `Graph`! These are the only ones needed to
implement a fibonacci function, so they're all I implemented.

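As a rough illustration of the "only one of these fields may be set" convention, here is a cut-down sketch of such a value type. It is an assumption-laden stand-in and not the `gg` package's code: the `Graph` and `LexerToken` fields are omitted, and `NameValue`, `NumberValue`, and the bodies of `Equal` and `String` are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// Value is a cut-down stand-in for gg.Value: exactly one field should be set.
type Value struct {
	Name   *string
	Number *int64
}

// NameValue and NumberValue are hypothetical constructors which guarantee
// that only one field is ever populated.
func NameValue(s string) Value  { return Value{Name: &s} }
func NumberValue(n int64) Value { return Value{Number: &n} }

// Equal reports whether both Values hold the same kind of thing with the same
// contents. A Value with no field set is never equal to anything.
func (v Value) Equal(o Value) bool {
	switch {
	case v.Name != nil && o.Name != nil:
		return *v.Name == *o.Name
	case v.Number != nil && o.Number != nil:
		return *v.Number == *o.Number
	default:
		return false
	}
}

// String renders whichever field is set.
func (v Value) String() string {
	switch {
	case v.Name != nil:
		return *v.Name
	case v.Number != nil:
		return strconv.FormatInt(*v.Number, 10)
	default:
		return "<empty>"
	}
}

func main() {
	fmt.Println(NameValue("in"), NumberValue(-1))
	fmt.Println(NameValue("in").Equal(NameValue("in")))
	fmt.Println(NameValue("1").Equal(NumberValue(1)))
}
```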
The lexing/parsing of `gg` files is not super interesting; you can check out the
package code for more details. The only other thing worth noting is that, for
now, all statements are required to end with a `;`. I had originally wanted to
be less strict with this, and allow newlines and other tokens to indicate the
end of statements, but it was complicating the code and I wanted to move on.

Another small thing worth noting is that I decided to make each entire `.gg`
file implicitly define a graph. So you can imagine each file's contents wrapped
in curly braces.

With the `gg` package out of the way I was able to finally parse ginger
programs! The following is the actual, real-life implementation of the fibonacci
function (though at this point it didn't actually work, because the VM was still
not implemented):

```
out = {

    decr = { out = add < (in; -1;); };

    n = tupEl < (in; 0;);
    a = tupEl < (in; 1;);
    b = tupEl < (in; 2;);

    out = if < (
        isZero < n;
        a;
        recur < (
            decr < n;
            b;
            add < (a;b;);
        );
    );

} < (in; 0; 1;);
```

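As a rough point of comparison for readers more at home in Go, the same algorithm can be sketched as a recursive function. This is an interpretation of the `.gg` program above rather than code from the repo, and the name `fib` is hypothetical: `n` counts down to zero while `a` and `b` carry two consecutive fibonacci numbers, mirroring the `(in; 0; 1;)` tuple and the `recur` call.

```go
package main

import "fmt"

// fib mirrors the structure of the ginger program: when n reaches zero the
// accumulator a is the answer, otherwise recurse with (n-1, b, a+b).
func fib(n, a, b int64) int64 {
	if n == 0 {
		return a
	}
	return fib(n-1, b, a+b)
}

func main() {
	fmt.Println(fib(8, 0, 1)) // prints 21
}
```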
[ggvalue]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/gg/gg.go#L14

## VM

Finally, the meat of all this. If the `graph` and `gg` packages are the sturdy,
well constructed foundations of a tall building, then the `vm` package is the
extremely long, flimsy stick someone propped up vertically so they could say
they built a structure of impressive height.

In other words, it's very likely that the current iteration of the VM will not
be long for this world, and so I won't waste time describing it in great detail.

What I will say about it is that within the `vm` package I've defined a [new
`Value` type][vmvalue], which extends the one defined in `gg`. The necessity of
this was that there are types which cannot be represented syntactically in a
`.gg` file, but which _can_ be used as values within a program being run.

The first of these is the `Operation`, which is essentially a first-class
function. The VM will automatically interpret a graph as an `Operation` when it
is used as an edge value, as has been discussed in previous posts, but there are
also built-in operations (like `if` and `recur`) which cannot be represented as
datastructures, and so it was necessary to introduce a new in-memory type to
properly represent operations.

The second is the `Tuple` type. This may seem strange, as ginger graphs already
have a concept of a tuple. But the ginger graph tuple is a _vertex type_, not a
value type. The distinction is small, but important. Essentially the graph tuple
is a structural element which describes how to create a tuple value, but it is
not yet that value. So we need a new Value type to hold the tuple once it _has_
been created during runtime.

Also worth describing in the `vm` package, even though I think they might
change drastically, are [`Thunk`s][thunk]:

```go
// Thunk is returned from the performance of an Operation. When called it will
// return the result of that Operation having been called with the particular
// arguments which were passed in.
type Thunk func() (Value, error)
```

The term "thunk" is borrowed from Haskell, which I don't actually know so I'm
probably using it wrong, but anyway...

A thunk is essentially a value which has yet to be evaluated; the VM knows
exactly _how_ to evaluate it, but it hasn't done so yet. The primary reason for
their existence within ginger is to account for conditionals, ie the `if`
operation. The VM can't evaluate each of an `if`'s arguments all at once; it
must evaluate only the first argument (to obtain a boolean), and then based on
that evaluate the second or third argument.

This is where `graph.MapReduce` comes in. The VM uses `graph.MapReduce` to
reduce each edge in a graph to a `Thunk`, where the `Thunk`'s value is based on
the operation (the edge's value) and the inputs to the edge (which will
themselves be `Thunk`s). Because each `Thunk` represents a potential value, not
an actual one, the VM is able to completely parse the program to be executed
(using `graph.MapReduce`) while allowing conditionals to still work correctly.

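A minimal sketch of how thunks let `if` skip the branch it doesn't need might look like the following. Only the `Thunk` signature comes from the `vm` package; `Value` is reduced to an empty interface here, and `ifThunk` and `evalDemo` are hypothetical names invented for the example, so this should not be read as how the VM actually implements `if`.

```go
package main

import (
	"errors"
	"fmt"
)

// Value is a placeholder for the vm package's Value type.
type Value interface{}

// Thunk has the same shape as the vm package's Thunk.
type Thunk func() (Value, error)

// ifThunk returns a Thunk which, when called, evaluates cond first and then
// only the branch selected by it. Constructing the Thunk evaluates nothing.
func ifThunk(cond, then, els Thunk) Thunk {
	return func() (Value, error) {
		c, err := cond()
		if err != nil {
			return nil, err
		}
		b, ok := c.(bool)
		if !ok {
			return nil, errors.New("if: condition is not a boolean")
		}
		if b {
			return then()
		}
		return els()
	}
}

// evalDemo builds an if out of three counting thunks and reports the result
// along with how many of the thunks were actually evaluated.
func evalDemo() (Value, int, error) {
	evaluated := 0
	counting := func(v Value) Thunk {
		return func() (Value, error) {
			evaluated++
			return v, nil
		}
	}
	t := ifThunk(counting(true), counting("then"), counting("else"))
	v, err := t()
	return v, evaluated, err
}

func main() {
	// The else branch is never evaluated, so only two thunks run.
	fmt.Println(evalDemo())
}
```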
[EvaluateEdge][evaledge] is where all that happens, if you're interested, but be
warned that the code is a hot mess right now and it's probably not worth
spending a ton of time understanding it as it will change a lot.

A final thing I'll mention is that the `recur` operation is, I think, broken. Or
probably more accurately, the entire VM is broken in a way which prevents
`recur` from working correctly. It _does_ produce the correct output, so I
haven't prioritized debugging it, but for any large number of iterations it
takes a very long time to run.

[vmvalue]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/vm/vm.go#L18
[thunk]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/vm/op.go#L11
[evaledge]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/vm/scope.go#L29

## Demo

Finally, to show it off! I put together a super stupid `eval` binary which takes
two arguments: a graph to be used as an operation, and a value to be used as an
argument to that operation. It doesn't even read the code from a file; you have
to `cat` it in.

The [README][readme] documents how to run the demo, so if you'd like to do so
then please clone the repo and give it a shot! It should look like this when you
do:

```
# go run ./cmd/eval/main.go "$(cat examples/fib.gg)" 8
21
```

You can put any number you like instead of `8`, but as mentioned, `recur` is
broken so it can take a while for larger numbers.

[readme]: https://github.com/mediocregopher/ginger/blob/ebf57591a8ac08da8a312855fc3a6d9c1ee6dcb2/README.md

## Next Steps

The following are all the things I'd like to address the next time I work on
ginger:

* `gg`

    * Allow for newlines (and `)` and `}`) to terminate statements, not just
      `;`.

    * Allow names to have punctuation characters in them (maybe?).

    * Don't read all tokens into memory prior to parsing.

* `vm`

    * Fix `recur`.

    * Implement tail call optimization.

* General

    * A bit of polish on the `eval` tool.

    * Expose graph creation, traversal, and transformation functions as
      builtins.

    * Create plan (if not actually implement it yet) for how code will be
      imported from one file to another. Namespacing in general will fall into
      this bucket.

    * Create plan (if not actually implement it yet) for how users can
      extend/replace the lexer/parser.

I don't know _when_ I'll get to work on these next; ginger will come back up in
my rotation of projects eventually. It could be a few months. In the meantime I
hope you're as excited about this progress as I am, and if you have any feedback
I'd love to hear it.

Thanks for reading!