ginger syntax

Brian Picciano 2021-08-27 17:51:55 -06:00
parent 38fdd7725d
commit a1f1044b48


@ -0,0 +1,480 @@
---
title: >-
The Syntax of Ginger
description: >-
Oh man, this got real fun real quick.
series: ginger
tags: tech
---
Finally I have a syntax for ginger that I'm happy with. This has actually been a
huge roadblock for me up till this point. There's a bit of a chicken-and-the-egg
problem with the syntax: without pinning down the structures underlying the
syntax it's difficult to develop one, but without an idea of syntax it's
difficult to know what structures will be ergonomic to use.
I've been focusing on the structures so far, and have only now pinned down the
syntax. Let's see what it looks like.
## Preface: Conditionals
I've so far written [two][cond1] [posts][cond2] regarding conditionals in
ginger. After more reflection, I think I'm going to stick with my _original_
gut, which was to only have value and tuple vertices (no forks), and to use a
function which accepts both a boolean and two input edges: the first being the
one to take if the boolean is true, and the second being the one to take if it's
false.
Aka, the very first proposal in the [first post][cond1]. It's hard to justify
up-front, but I think once you see it in action with a clean syntax you'll agree
it just kind of works.

[cond1]: {% post_url 2021-03-01-conditionals-in-ginger %}
[cond2]: {% post_url 2021-03-04-conditionals-in-ginger-errata %}
## Designing a Syntax
Ginger is a bit of a strange language. It uses strange datastructures in strange
ways. But approaching the building of a syntax for any language is actually
straightforward: you're designing a serialization protocol.
To pull back a bit, consider a list of words. How would you encode this list in
order to write it to a file? To answer this, let's flip the question: how would
you design a sequence of characters (ie the contents of the file) such that the
reader could reconstruct the list?
Well, constructing the list from a sequence of characters requires being able to
construct it _at all_, so in what ways is the list constructed? For this list,
let's say there's only an append operation, which accepts a list and a value to
append to it, and returns the result.
If we say that append is encoded via wrapping parentheses around its two
arguments, and that `()` encodes the empty list, then we get a syntax like...
```
(((() foo) bar) baz)
```
...which, in this instance, decodes to a list containing the words "foo", "bar",
and "baz", in that order.
It's not a pretty syntax, but it demonstrates the method. If you know how the
datastructure is constructed via code, you know what capabilities the syntax must
have and how it needs to fit together.
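To make the method concrete, here is a minimal sketch of a decoder for the append-only encoding above. The names `decodeList` and `parseList` are made up for this illustration, and the decoder is deliberately strict (exactly one word per append, no quoting), which is an assumption rather than anything the encoding requires.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// decodeList parses the append-only encoding: "()" is the empty list, and
// "(<list> <word>)" is the result of appending word to list.
func decodeList(s string) ([]string, error) {
	list, rest, err := parseList(strings.TrimSpace(s))
	if err != nil {
		return nil, err
	}
	if rest != "" {
		return nil, errors.New("unexpected trailing input: " + rest)
	}
	return list, nil
}

// parseList consumes one encoded list from the front of s and returns the
// decoded list along with whatever input remains after it.
func parseList(s string) ([]string, string, error) {
	if !strings.HasPrefix(s, "(") {
		return nil, "", errors.New("expected '('")
	}
	s = s[1:]
	if strings.HasPrefix(s, ")") {
		return []string{}, s[1:], nil // "()" is the empty list
	}
	// Otherwise this is an append: an inner list followed by a word.
	list, s, err := parseList(s)
	if err != nil {
		return nil, "", err
	}
	s = strings.TrimLeft(s, " ")
	end := strings.IndexAny(s, " ()")
	if end <= 0 || s[end] != ')' {
		return nil, "", errors.New("expected a word followed by ')'")
	}
	return append(list, s[:end]), s[end+1:], nil
}

func main() {
	list, err := decodeList("(((() foo) bar) baz)")
	fmt.Println(list, err)
}
```

The decoder has exactly two cases because the list has exactly two constructors (the empty list and append), which is the whole point: the shape of the parser falls out of how the structure is built.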
## gg
All of this amounted to me needing to implement the ginger graph in some other
language, in order to see what features the syntax must have.
A few years ago I had begun an implementation of a graph datastructure in go, to
use as the base (or at least a reference) for ginger. I had called this
implementation `gg` (ginger graph), with the intention that this would also be
the file extension used to hold ginger code (how clever).
The basic qualities I wanted in a graph datastructure for ginger were, and still
are:
* Immutability, ie all operations which modify the structure should return a
copy, leaving the original intact.
* Support for tuples.
* The property that it should be impossible to construct an invalid graph. An
invalid graph might be, for example, one where there is a single node with no
edges.
* Well tested, and reasonably performant.
Coming back to all this after a few years I had expected to have a graph
datastructure implemented, possibly with immutability, but lacking in tuples and
tests. As it turns out I completely underestimated my past self, because as far
as I can tell I had already finished the damn thing, tuples, tests and all.
It looks like that's the point where I stopped, probably for being unsure about
some other aspect of the language, and my motivation fell off. The fact that
I've come back to ginger, after all these years, and essentially rederived the
same language all over again, gives me a lot of confidence that I'm on the right
track (and a lot of respect for my past self for having done all this work!)
The basic API I came up with for building ginger graphs (ggs) looks like this:
```go
package gg

// OpenEdge represents an edge with a source value but no destination value,
// with an optional value on it. On its own an OpenEdge has no meaning, but is
// used as a building block for making Graphs.
type OpenEdge struct{ ... }

// TupleOut constructs an OpenEdge leading from a tuple, which is comprised of
// the given OpenEdges leading into it, with an optional edge value.
func TupleOut(ins []OpenEdge, edgeVal Value) OpenEdge

// ValueOut constructs an OpenEdge leading from a non-tuple value, with an
// optional edge value.
func ValueOut(val, edgeVal Value) OpenEdge

// ZeroGraph is an empty Graph, from which all Graphs are constructed via calls
// to AddValueIn.
var ZeroGraph = &Graph{ ... }

// Graph is an immutable graph structure, formed from a collection of edges
// between values and tuples.
type Graph struct{ ... }

// AddValueIn returns a new Graph which is a copy of the original, with the
// addition of a new edge. The new edge's source and edge value come from the
// given OpenEdge, and the edge's destination value is the given value.
func (g *Graph) AddValueIn(oe OpenEdge, val Value) *Graph
```
The actual API is larger than this, and includes methods to remove edges,
iterate over edges and values, and perform unions and disjoins of ggs. But the
above are the elements which are required only for _making_ ggs, which is all
that a syntax is concerned with.
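To show how the immutability requirement and the construction API fit together, here is a minimal sketch of the same four operations. This is not the real `gg` code: `Value` is assumed to be a plain string (with `""` meaning "no value"), and the internal fields and the `valueIn` type are invented for the illustration.

```go
package main

import "fmt"

// Value is a stand-in for gg's Value type. Assumption: a plain string, where
// the empty string means "no value".
type Value string

// OpenEdge is an edge with a source but no destination. The source is either
// a single value (fromVal) or a tuple of other OpenEdges (fromTuple).
type OpenEdge struct {
	fromVal   Value
	fromTuple []OpenEdge
	isTuple   bool
	edgeVal   Value
}

// ValueOut constructs an OpenEdge leading from a non-tuple value.
func ValueOut(val, edgeVal Value) OpenEdge {
	return OpenEdge{fromVal: val, edgeVal: edgeVal}
}

// TupleOut constructs an OpenEdge leading from a tuple. The ins slice is
// copied so later changes to the caller's slice can't affect the edge.
func TupleOut(ins []OpenEdge, edgeVal Value) OpenEdge {
	return OpenEdge{
		fromTuple: append([]OpenEdge(nil), ins...),
		isTuple:   true,
		edgeVal:   edgeVal,
	}
}

// valueIn is a completed edge: an OpenEdge plus its destination value.
type valueIn struct {
	oe  OpenEdge
	val Value
}

// Graph holds a list of completed edges and is never modified in place.
type Graph struct {
	edges []valueIn
}

// ZeroGraph is the empty Graph.
var ZeroGraph = &Graph{}

// AddValueIn copies the receiver's edges into a new Graph and adds one more,
// leaving the receiver untouched.
func (g *Graph) AddValueIn(oe OpenEdge, val Value) *Graph {
	edges := make([]valueIn, len(g.edges), len(g.edges)+1)
	copy(edges, g.edges)
	return &Graph{edges: append(edges, valueIn{oe, val})}
}

func main() {
	g1 := ZeroGraph.AddValueIn(ValueOut("in", "0"), "a")
	g2 := g1.AddValueIn(ValueOut("in", "1"), "b")
	// Each call returned a new Graph; the earlier ones are unchanged.
	fmt.Println(len(ZeroGraph.edges), len(g1.edges), len(g2.edges)) // 0 1 2
}
```

Note that in this sketch the only way a value gets into a Graph is as one end of an edge, so a lone node with no edges can't be expressed at all. That is one way to get the "impossible to construct an invalid graph" property from the API's shape rather than from validation.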
As a demonstration, here is how one would construct the `min` operation, which
takes two numbers and returns the smaller, using the `gg` package:
```go
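// a, b, in, out, lt, ifVal, etc. are Values which represent the respective
// symbol (ifVal stands for `if`, which is a reserved word in go). null stands
// for the absence of an edge value.
// a is the result of passing in to the 0 operation, ie a is the 0th element of
// the in tuple.
min := gg.ZeroGraph.AddValueIn(gg.ValueOut(in, 0), a)
// b is the 1st element of the in tuple
min = min.AddValueIn(gg.ValueOut(in, 1), b)
// out is the result of an if which compares a and b together, and which returns
// the lesser. Every tuple element must itself be an OpenEdge, so the bare
// values a and b are wrapped in ValueOut.
min = min.AddValueIn(gg.TupleOut([]gg.OpenEdge{
	gg.TupleOut([]gg.OpenEdge{gg.ValueOut(a, null), gg.ValueOut(b, null)}, lt),
	gg.ValueOut(a, null),
	gg.ValueOut(b, null),
}, ifVal), out)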
```
And here's a demonstration of how this `min` would be used:
```go
// out is the result of passing 1 and 5 to the min operation. null stands for
// the absence of an edge value.
gg.ZeroGraph.AddValueIn(gg.TupleOut([]gg.OpenEdge{
	gg.ValueOut(1, null),
	gg.ValueOut(5, null),
}, min), out)
```
## Make it Nice
_Technically_ we're done. We have an implementation of the language's underlying
structure, and a syntax which encodes it (ie the ugly ass go syntax above). But
obviously I'm not proposing anyone actually use that.
Another thing I found when digging around in the old ginger repo was a text
file, tucked away in a directory called "sandbox", which had a primitive syntax
which _almost_ worked. I won't copy it here, but you can find it if you care to.
But with that as a foundation I came up with a crude, rough draft spec, which
maps the go syntax to the new syntax.
```
ValueOut(val, edgeVal) : -edgeVal-val
ValueOut(val, null) : -val
TupleOut([]val, edgeVal) : -edgeVal-(val, ...)
TupleOut([]val, null) : -(val, ...)
Graph(openEdge->val, ...) : { val openEdge, ... }
```
A couple things to note about this spec:
* `null` is used to indicate absence of value on an edge. The details of `null`
are yet to be worked out, but we can use this placeholder for now.
* `Graph` is cheating a bit. In the original `gg` implementation a Graph gains
its OpenEdge/Value pairs via successive calls to `AddValueIn`. However, such a
pattern doesn't translate well to text, and since we're dealing purely with
constructing an entire Graph at once we can instead have our Graph syntax
declare all OpenEdge/Value pairs at once.
* It's backwards! Eg where the go syntax does `ValueOut(val, edgeVal)`, the
proposed spec puts the values in the opposite order: `-edgeVal-val`. The
former results in code which is read from input to output, while the latter
results in the opposite: output to input.
This was a tip I picked up from the old text file I found, and the result is
code which is more familiar to an existing programmer. I _think_ (but am
not sure) that it's also more in line with how programming is done mentally,
ie we start with a result and work backwards to figure out what it takes to
get there.
It's possible, though, that I'm wrong, so at the end of this post I'm going
to put some examples of the same code both "forwards" and "backwards" and see
how I feel about it.
With all that said, let's see it in action! Here's `min` implemented in our shiny new syntax:
```
min -{
    a -0-in,
    b -1-in,
    out -if-(
        -lt-(-a,-b),
        -a,
        -b
    )
}
```
and then here's it being used:
```
out -min-(-1,-5)
```
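Because the rough draft spec is a direct mapping from the constructors, printing it is a short recursive function. The following is a sketch only: `openEdge` is a hypothetical stand-in for `gg.OpenEdge` with string values, and an empty `edgeVal` is assumed to play the role of `null`.

```go
package main

import (
	"fmt"
	"strings"
)

// openEdge is a hypothetical stand-in for gg.OpenEdge. An empty edgeVal plays
// the role of null in the spec.
type openEdge struct {
	val     string     // source value, used when tuple is nil
	tuple   []openEdge // source tuple, used when non-nil
	edgeVal string
}

// rough renders an open edge using the rough draft spec:
//
//	ValueOut(val, edgeVal)   : -edgeVal-val
//	TupleOut([]val, edgeVal) : -edgeVal-(val, ...)
func rough(oe openEdge) string {
	var sb strings.Builder
	if oe.edgeVal != "" {
		sb.WriteString("-" + oe.edgeVal)
	}
	sb.WriteString("-")
	if oe.tuple == nil {
		sb.WriteString(oe.val)
		return sb.String()
	}
	parts := make([]string, len(oe.tuple))
	for i, in := range oe.tuple {
		parts[i] = rough(in)
	}
	sb.WriteString("(" + strings.Join(parts, ",") + ")")
	return sb.String()
}

func main() {
	one, five := openEdge{val: "1"}, openEdge{val: "5"}
	use := openEdge{tuple: []openEdge{one, five}, edgeVal: "min"}
	fmt.Println("out " + rough(use)) // out -min-(-1,-5)
}
```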
## Make it _Nicer_
The most striking feature of this rough draft spec is all the prefix dashes,
such as in the `-min-(-1,-5)` statement. These dashes were included as they make
sense in the context of what the intended human interpretation of the structure
is: two values, `1`, and `5`, are being _piped into_ the two slots of a 2-tuple,
and that 2-tuple is being _piped into_ the `min` operation, the output of which
is being _piped into_ something `out`.
The "piping into" is what the dash represents, which is why the top level values
in the graph, `a`, `b`, and `out`, don't have a preceding dash; they are the
ultimate destinations of the pipes leading to them. But these pipes are
ultimately ugly, and also introduce odd questions like "how do we represent
-1?", so they need to go.
So I've made a second draft, which is only a few changes away from the rough,
but oh man do those changes make a world of difference. Here's the cleaned up
spec:
```
ValueOut(val, edgeVal) : edgeVal(val)
ValueOut(val, null) : val
TupleOut([]val, edgeVal) : edgeVal(val, ...)
TupleOut([]val, null) : (val, ...)
Graph(openEdge->val, ...) : { val = openEdge, ... }
```
The dashes were simply removed, and the `edgeVal` and `val` concatenated together.
For `ValueOut(val, edgeVal)` wrapping parentheses were put around `val`, to
delineate it from `edgeVal`. This conflicts with the syntax for `TupleOut([]val,
edgeVal)`, but that conflict is easy to remedy: when parentheses wrap only a
single `val` then that is a `ValueOut`, otherwise it's a `TupleOut`.
Another change is to add an `=` between the `val` and `openEdge` in the `Graph`
constructor. This is a purely aesthetic change, but as you'll see it works well.
So let's see it! `min` implemented with this cleaned up syntax:
```
min = {
    a = 0(in),
    b = 1(in),
    out = if(
        lt(a,b),
        a,
        b
    )
}
```
And then its use:
```
min(1,5)
```
Well well well, look what we have here: a conventional programming language
syntax! `{`/`}` wrap a scope, and `(`/`)` wrap function arguments and
(optionally) single values. It's a lot clearer now that `0` and `1` are being
used as operations themselves when instantiating `a` and `b`, and `if` is much
more readable.
I was extremely surprised at how well this actually worked out. Despite having
drastically different underpinnings than most languages, it ends up looking both
familiar and obvious. How cool!
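The `ValueOut`/`TupleOut` disambiguation rule from the cleaned up spec is easy to express in a parser. What follows is a sketch of a reader for single expressions only (no `{ ... }` graphs, no comments); it is not ginger's actual parser. The `openEdge` type is a hypothetical stand-in for `gg.OpenEdge`, and treating a single _nested_ argument such as `op(i(orig-seq))` as a 1-tuple is an assumption, since the spec only talks about a single `val`.

```go
package main

import (
	"fmt"
	"strings"
)

// openEdge is a hypothetical stand-in for gg.OpenEdge; an empty edgeVal plays
// the role of null.
type openEdge struct {
	val     string     // source value, for a ValueOut
	tuple   []openEdge // source tuple, for a TupleOut
	isTuple bool
	edgeVal string
}

type parser struct {
	in  string
	pos int
}

func isSpace(c byte) bool { return c == ' ' || c == '\t' || c == '\n' }
func isName(c byte) bool  { return !isSpace(c) && !strings.ContainsRune("(),", rune(c)) }

// takeWhile consumes characters for as long as keep reports true.
func (p *parser) takeWhile(keep func(byte) bool) string {
	start := p.pos
	for p.pos < len(p.in) && keep(p.in[p.pos]) {
		p.pos++
	}
	return p.in[start:p.pos]
}

// peek returns the next character, or 0 at the end of the input.
func (p *parser) peek() byte {
	if p.pos < len(p.in) {
		return p.in[p.pos]
	}
	return 0
}

// expr parses one of: val, edgeVal(val), edgeVal(val, ...), (val, ...).
func (p *parser) expr() (openEdge, error) {
	p.takeWhile(isSpace)
	name := p.takeWhile(isName)
	if p.peek() != '(' {
		if name == "" {
			return openEdge{}, fmt.Errorf("expected a value at offset %d", p.pos)
		}
		return openEdge{val: name}, nil // bare value: ValueOut(val, null)
	}
	p.pos++ // consume '('
	var args []openEdge
	p.takeWhile(isSpace)
	for p.peek() != ')' {
		arg, err := p.expr()
		if err != nil {
			return openEdge{}, err
		}
		args = append(args, arg)
		p.takeWhile(isSpace)
		if p.peek() == ',' {
			p.pos++
			p.takeWhile(isSpace)
		} else if p.peek() != ')' {
			return openEdge{}, fmt.Errorf("expected ',' or ')' at offset %d", p.pos)
		}
	}
	p.pos++ // consume ')'
	// The rule from the spec: parentheses around a single plain val make a
	// ValueOut, anything else is a TupleOut.
	if len(args) == 1 && !args[0].isTuple && args[0].edgeVal == "" {
		return openEdge{val: args[0].val, edgeVal: name}, nil
	}
	return openEdge{tuple: args, isTuple: true, edgeVal: name}, nil
}

// parse reads exactly one expression from s.
func parse(s string) (openEdge, error) {
	p := &parser{in: s}
	oe, err := p.expr()
	if err != nil {
		return openEdge{}, err
	}
	if p.takeWhile(isSpace); p.pos != len(p.in) {
		return openEdge{}, fmt.Errorf("unexpected input at offset %d", p.pos)
	}
	return oe, nil
}

func main() {
	for _, src := range []string{"0(in)", "lt(a,b)", "if(lt(a,b), a, b)"} {
		oe, err := parse(src)
		if err != nil {
			fmt.Println(src, "error:", err)
			continue
		}
		kind := "ValueOut"
		if oe.isTuple {
			kind = "TupleOut"
		}
		fmt.Printf("%s is a %s with edge value %q\n", src, kind, oe.edgeVal)
	}
}
```

Since names are just "anything that isn't a delimiter" here, `-1` parses as an ordinary value, which is exactly the question the prefix dashes made awkward.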
## Examples Examples Examples
Here's a collection of example programs written in this new syntax. The base
structure of these is borrowed from previous posts; I'm merely translating that
structure into a new form:
```
// decr outputs one less than the input.
decr = { out = add(in, -1) }

// fib accepts a number i, and outputs the ith fibonacci number.
fib = {

    inner = {
        n = 0(in),
        a = 1(in),
        b = 2(in),

        out = if(zero?(n),
            a,
            inner(decr(n), b, add(a,b))
        )
    },

    out = inner(in, 0, 1)
}

// map accepts a sequence and a function, and returns a sequence consisting of
// the result of applying the function to each of the elements in the given
// sequence.
map = {

    inner = {
        mapped-seq = 0(in),
        orig-seq = 1(in),
        op = 2(in),

        i = len(mapped-seq),

        // graphs provide an inherent laziness to the language. Just because
        // next-el is _defined_ here doesn't mean it's evaluated here at runtime.
        // In reality it will only be evaluated if/when evaluating out requires
        // evaluating next-el.
        next-el = op(i(orig-seq)),
        next-mapped-seq = append(mapped-seq, next-el),

        out = if(
            eq(len(mapped-seq), len(orig-seq)),
            mapped-seq,
            inner(next-mapped-seq, orig-seq, op)
        )
    },

    // zero-seq returns an empty sequence
    out = inner(zero-seq(), 0(in), 1(in))
}
```
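For anyone who finds the graph form of `fib` hard to follow at first, here is roughly the same computation in go. This is only a comparison aid, assuming ordinary integers: `inner` is the accumulator-style recursion from the ginger version, with `n` counting down while `a` and `b` carry two consecutive fibonacci numbers.

```go
package main

import "fmt"

// inner mirrors fib's inner graph: n counts down while a and b carry two
// consecutive fibonacci numbers.
func inner(n, a, b int) int {
	if n == 0 {
		return a
	}
	return inner(n-1, b, a+b)
}

// fib returns the ith fibonacci number, counting from fib(0) = 0.
func fib(i int) int {
	return inner(i, 0, 1)
}

func main() {
	for i := 0; i < 8; i++ {
		fmt.Println(i, fib(i))
	}
}
```

The go version gets its "only evaluate one branch" behavior from `if` being a statement. In the ginger version `if` is just an operation receiving a tuple, so the same behavior has to come from the laziness described in the `map` comments above.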
## Selpmexa Selpmexa Selpmexa
Our syntax encodes a graph, and a graph doesn't really care if the syntax was
encoded in an input-to-output vs an output-to-input direction. So, as promised,
here are all the above examples, but "backwards":
```
// min returns the lesser of the two numbers it is given
{
    (in)0 = a,
    (in)1 = b,

    (
        (a,b)lt,
        a,
        b
    )if = out

} = min

// decr outputs one less than the input.
{ (in, -1)add = out } = decr

// fib accepts a number i, and outputs the ith fibonacci number.
{
    {
        (in)0 = n,
        (in)1 = a,
        (in)2 = b,

        (
            (n)zero?,
            a,
            ((n)decr, b, (a,b)add)inner
        )if = out

    } = inner,

    (in, 0, 1)inner = out

} = fib

// map accepts a sequence and a function, and returns a sequence consisting of
// the result of applying the function to each of the elements in the given
// sequence.
{
    {
        (in)0 = mapped-seq,
        (in)1 = orig-seq,
        (in)2 = op,

        (mapped-seq)len = i,

        ((orig-seq)i)op = next-el,
        (mapped-seq, next-el)append = next-mapped-seq,

        (
            ((mapped-seq)len, (orig-seq)len)eq,
            mapped-seq,
            (next-mapped-seq, orig-seq, op)inner
        )if = out

    } = inner,

    (()zero-seq, (in)0, (in)1)inner = out
} = map
```
Do these make you itchy? They kind of make me itchy. But... parts of them also
appeal to me.
The obvious reason why these feel wrong to me is the placement of `if`:
```
(
(a,b)lt,
a,
b
)if = out
```
The tuple which is being passed to `if` here is confusing unless you already
know that it's going to be passed to `if`. But on your first readthrough you
won't know that till you get to the end, so you'll be in the dark until then.
For more complex programs I'm sure this problem will compound.
On the other hand, pretty much everything else looks _better_, imo. For example:
```
// copied and slightly modified from the original to make it even more complex
(mapped-seq, ((orig-seq)i)op)append = next-mapped-seq
```
Something like this reads very clearly to me, and requires a lot less mental
backtracking to comprehend. The main difficulty I have is tracking the
parentheses, but the overall "flow" of data and the order of events is plain to
read.
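Since both directions encode the same graph, flipping between them is purely a printing decision. Here's a small sketch of that: `openEdge` is a hypothetical stand-in for `gg.OpenEdge`, an empty `edgeVal` is assumed to mean `null`, and a nested single argument like `op(i(orig-seq))` is modelled as a 1-tuple, which is also an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// openEdge is a hypothetical stand-in for gg.OpenEdge. An empty edgeVal plays
// the role of null.
type openEdge struct {
	val     string     // source value, used when tuple is nil
	tuple   []openEdge // source tuple, used when non-nil
	edgeVal string
}

// render writes oe in the cleaned up syntax. When backwards is true the edge
// value is written after its input rather than before it.
func render(oe openEdge, backwards bool) string {
	var body string
	if oe.tuple == nil {
		if oe.edgeVal == "" {
			return oe.val // a bare value looks the same in both directions
		}
		body = "(" + oe.val + ")"
	} else {
		parts := make([]string, len(oe.tuple))
		for i, in := range oe.tuple {
			parts[i] = render(in, backwards)
		}
		body = "(" + strings.Join(parts, ", ") + ")"
	}
	if backwards {
		return body + oe.edgeVal
	}
	return oe.edgeVal + body
}

func main() {
	// append(mapped-seq, op(i(orig-seq)))
	el := openEdge{
		tuple:   []openEdge{{val: "orig-seq", edgeVal: "i"}},
		edgeVal: "op",
	}
	next := openEdge{
		tuple:   []openEdge{{val: "mapped-seq"}, el},
		edgeVal: "append",
	}
	fmt.Println(render(next, false)) // append(mapped-seq, op(i(orig-seq)))
	fmt.Println(render(next, true))  // (mapped-seq, ((orig-seq)i)op)append
}
```

The only difference between the two outputs is which side of the parentheses the edge value lands on, so an editor or formatter could in principle show the same file either way.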
## Next Steps
The syntax here is not done yet, not by a long shot. If my record with past
posts about ginger (wherein I've "decided" on something and then completely
backtracked in later posts every single time) is any indication then this syntax
won't even look remotely familiar in a very short while. But it's a great
starting point, I think, and raises a lot of good questions.
* Can I make parenthesis chains, a la the last example, more palatable in some
way?
* Should I go with the "backwards" syntax after all? In a functional style of
programming `if` statements _should_ be in the minority, and so the syntax
which better represents the flow of data in that style might be the way.
* Destructuring of tuples seems to be wanted, as evidenced by all the `a =
0(in)` lines. Should this be reflected in the structure or solely be
syntactical sugar?
* Should the commas be replaced with any whitespace (and make commas count as
whitespace, as Clojure has done)? If this is possible then I think they should
be, but I won't know for sure until I begin implementing the parser.
And, surely, many more! I've felt a bit lost with ginger for a _long_ time, but
seeing a real, usable syntax emerge has really invigorated me, and I'll be
tackling it again in earnest soon (fingers crossed).