From a1f1044b488ddb0f5eda1ab5d5e1c481e80be996 Mon Sep 17 00:00:00 2001 From: Brian Picciano Date: Fri, 27 Aug 2021 17:51:55 -0600 Subject: [PATCH] ginger syntax --- .../_posts/2021-08-26-the-syntax-of-ginger.md | 480 ++++++++++++++++++ 1 file changed, 480 insertions(+) create mode 100644 static/src/_posts/2021-08-26-the-syntax-of-ginger.md diff --git a/static/src/_posts/2021-08-26-the-syntax-of-ginger.md b/static/src/_posts/2021-08-26-the-syntax-of-ginger.md new file mode 100644 index 0000000..4ff64d8 --- /dev/null +++ b/static/src/_posts/2021-08-26-the-syntax-of-ginger.md @@ -0,0 +1,480 @@ +--- +title: >- + The Syntax of Ginger +description: >- + Oh man, this got real fun real quick. +series: ginger +tags: tech +--- + +Finally I have a syntax for ginger that I'm happy with. This has actually been a +huge roadblock for me up till this point. There's a bit of a chicken-and-the-egg +problem with the syntax: without pinning down the structures underlying the +syntax it's difficult to develop one, but without an idea of syntax it's +difficult to know what structures will be ergonomic to use. + +I've been focusing on the structures so far, and have only now pinned down the +syntax. Let's see what it looks like. + +## Preface: Conditionals + +I've so far written [two][cond1] [posts][cond2] regarding conditionals in +ginger. After more reflection, I think I'm going to stick with my _original_ +gut, which was to only have value and tuple vertices (no forks), and to use a +function which accepts both a boolean and two input edges: the first being the +one to take if the boolean is true, and the second being the one to take if it's +false. + +Aka, the very first proposal in the [first post][cond1]. It's hard to justify +up-front, but I think once you see it in action with a clean syntax you'll agree +it just kind of works. + +[cond1]: {% post_url 2021-03-01-conditionals-in-ginger %} +[cond2]: {% post_url 2021-03-04-conditionals-in-ginger-errata %} + +## Designing a Syntax + +Ginger is a bit of a strange language. It uses strange datastructures in strange +ways. But approaching the building of a syntax for any language is actually +straightforward: you're designing a serialization protocol. + +To pull back a bit, consider a list of words. How would you encode this list in +order to write it to a file? To answer this, let's flip the question: how would +you design a sequence of characters (ie the contents of the file) such that the +reader could reconstruct the list? + +Well, constructing the list from a sequence of characters requires being able to +construct it _at all_, so in what ways is the list constructed? For this list, +let's say there's only an append operation, which accepts a list and a value to +append to it, and returns the result. + +If we say that append is encoded via wrapping parenthesis around its two +arguments, and that `()` encodes the empty list, then we get a syntax like... + +``` +(((() foo) bar) baz) +``` + +...which, in this instance, decodes to a list containing the words, "foo", "bar", +and "baz", in that order. + +It's not a pretty syntax, but it demonstrates the method. If you know how the +datastructure is constructed via code, you know what capabilities the syntax must +have and how it needs to fit together. + +## gg + +All of this amounted to me needing to implement the ginger graph in some other +language, in order to see what features the syntax must have. + +A few years ago I had begun an implementation of a graph datastructure in go, to +use as the base (or at least a reference) for ginger. I had called this +implementation `gg` (ginger graph), with the intention that this would also be +the file extension used to hold ginger code (how clever). + +The basic qualities I wanted in a graph datastructure for ginger were, and still +are: + +* Immutability, ie all operations which modify the structure should return a + copy, leaving the original intact. + +* Support for tuples. + +* The property that it should be impossible to construct an invalid graph. An + invalid graph might be, for example, one where there is a single node with no + edges. + +* Well tested, and reasonably performant. + +Coming back to all this after a few years I had expected to have a graph +datastructure implemented, possibly with immutability, but lacking in tuples and +tests. As it turns out I completely underestimated my past self, because as far +as I can tell I had already finished the damn thing, tuples, tests and all. + +It looks like that's the point where I stopped, probably for being unsure about +some other aspect of the language, and my motivation fell off. The fact that +I've come back to ginger, after all these years, and essentially rederived the +same language all over again, gives me a lot of confidence that I'm on the right +track (and a lot of respect for my past self for having done all this work!) + +The basic API I came up with for building ginger graphs (ggs) looks like this: + +```go +package gg + +// OpenEdge represents an edge with a source value but no destination value, +// with an optional value on it. On its own an OpenEdge has no meaning, but is +// used as a building block for making Graphs. +type OpenEdge struct{ ... } + +// TupleOut constructs an OpenEdge leading from a tuple, which is comprised of +// the given OpenEdges leading into it, with an optional edge value. +func TupleOut(ins []OpenEdge, edgeVal Value) OpenEdge + +// ValueOut constructs an OpenEdge leading from a non-tuple value, with an +// optional edge value. +func ValueOut(val, edgeVal Value) OpenEdge + +// ZeroGraph is an empty Graph, from which all Graphs are constructed via calls +// to AddValueIn. +var ZeroGraph = &Graph{ ... } + +// Graph is an immutable graph structure, formed from a collection of edges +// between values and tuples. +type Graph struct{ ... } + +// AddValueIn returns a new Graph which is a copy of the original, with the +// addition of a new edge. The new edge's source and edge value come from the +// given OpenEdge, and the edge's destination value is the given value. +func (g *Graph) AddValueIn(oe OpenEdge, val Value) *Graph +``` + +The actual API is larger than this, and includes methods to remove edges, +iterate over edges and values, and perform unions and disjoins of ggs. But the +above are the elements which are required only for _making_ ggs, which is all +that a syntax is concerned with. + +As a demonstration, here is how one would construct the `min` operation, which +takes two numbers and returns the smaller, using the `gg` package: + +```go +// a, b, in, out, if, etc.. are Values which represent the respective symbol. + +// a is the result of passing in to the 0 operation, ie a is the 0th element of +// the in tuple. +min := gg.ZeroGraph.AddValueIn(gg.ValueOut(in, 0), a) + +// b is the 1st element of the in tuple +min = min.AddValueIn(gg.ValueOut(in, 1), b) + +// out is the result of an if which compares a and b together, and which returns +// the lesser. +min = min.AddValueIn(out, gg.TupleOut([]gg.OpenEdge{ + gg.TupleOut([]gg.OpenEdge{a, b}, lt), + a, + b, +}, if) +``` + +And here's a demonstration of how this `min` would be used: + +```go +// out is the result of passing 1 and 5 to the min operation. +gg.ZeroGraph.AddValueIn(gg.TupleOut([]gg.OpenEdge{1, 5}, min), out) +``` + +## Make it Nice + +_Technically_ we're done. We have an implementation of the language's underlying +structure, and a syntax which encodes it (ie the ugly ass go syntax above). But +obviously I'm not proposing anyone actually use that. + +Another thing I found when digging around in the old ginger repo was a text +file, tucked away in a directory called "sandbox", which had a primitive syntax +which _almost_ worked. I won't copy it here, but you can find it if you care to. +But with that as a foundation I came up with a crude, rough draft spec, which +maps the go syntax to the new syntax. + +``` +ValueOut(val, edgeVal) : -edgeVal-val +ValueOut(val, null) : -val +TupleOut([]val, edgeVal) : -edgeVal-(val, ...) +TupleOut([]val, null) : -(val, ...) +Graph(openEdge->val, ...) : { val openEdge, ... } +``` + +A couple things to note about this spec: + +* `null` is used to indicate absence of value on an edge. The details of `null` + are yet to be worked out, but we can use this placeholder for now. + +* `Graph` is cheating a bit. In the original `gg` implementation a Graph gains + its OpenEdge/Value pairs via successive calls to `AddValueIn`. However, such a + pattern doesn't translate well to text, and since we're dealing purely with + constructing an entire Graph at once we can instead have our Graph syntax + declare all OpenEdge/Value pairs at once. + +* It's backwards! Eg where the go syntax does `ValueOut(val, edgeVal)`, the + proposed spec puts the values in the opposite order: `-edgeVal-val`. The + former results in code which is read from input to output, while the latter + results in the opposite: output to input. + + This was a tip I picked up from the old text file I found, and the result is + code which is more familiar to an existing programmer. I _think_ (but am + not sure) that it's also more in line with how programming is done mentally, + ie we start with a result and work backwards to figure out what it takes to + get there. + + It's possible, though, that I'm wrong, so at this end of this post I'm going + to put some examples of the same code both "forwards" and "backwards" and see + how I feel about it. + +With all that said, let's see it in action! Here's `min` implemented in our shiny new syntax: + +``` +min -{ + a -0-in, + b -1-in, + out -if-( + -lt-(-a,-b), + -a, + -b + ) +} +``` + +and then here's it being used: + +``` +out -min-(-1,-5) +``` + +## Make it _Nicer_ + +The most striking feature of this rough draft spec is all the prefix dashes, +such as in the `-min-(-1,-5)` statement. These dashes were included as they make +sense in the context of what the intended human interpretation of the structure +is: two values, `1`, and `5`, are being _piped into_ the two slots of a 2-tuple, +and that 2-tuple is being _piped into_ the `min` operation, the output of which +is being _piped into_ something `out`. + +The "piping into" is what the dash represents, which is why the top level values +in the graph, `a`, `b`, and `out`, don't have a preceding dash; they are the +ultimate destinations of the pipes leading to them. But these pipes are +ultimately ugly, and also introduce odd questions like "how do we represent +-1?", so they need to go. + +So I've made a second draft, which is only a few changes away from the rough, +but oh man do those changes make a world of difference. Here's the cleaned up +spec: + +``` +ValueOut(val, edgeVal) : edgeVal(val) +ValueOut(val, null) : val +TupleOut([]val, edgeVal) : edgeVal(val, ...) +TupleOut([]val, null) : (val, ...) +Graph(openEdge->val, ...) : { val = openEdge, ... } +``` + +The dashes were simply removed, and the `edgeVal` and `val` concatted together. +For `ValueOut(val, edgeVal)` wrapping parenthesis were put around `val`, to +delineate it and `edgeVal`. This conflicts with the syntax for `TupleOut([]val, +edgeVal)`, but that conflict is easy to remedy: when parenthesis wrap only a +single `val` then that is a `ValueOut`, otherwise it's a `TupleOut`. + +Another change is to add an `=` between the `val` and `openEdge` in the `Graph` +constructor. This is a purely aesthetic change, but as you'll see it works well. + +So let's see it! `min` implemented with this cleaned up syntax: + +``` +min = { + a = 0(in), + b = 1(in), + out = if( + lt(a,b), + a, + b + ) +} +``` + +And then its use: + +``` +min(1,5) +``` + +Well well well, look what we have here: a conventional programming language +syntax! `{`/`}` wrap a scope, and `(`/`)` wrap function arguments and +(optionally) single values. It's a lot clearer now that `0` and `1` are being +used as operations themselves when instantiating `a` and `b`, and `if` is much +more readable. + +I was extremely surprised at how well this actually worked out. Despite having +drastically different underpinnings than most languages it ends up looking both +familiar and obvious. How cool! + +## Examples Examples Examples + +Here's a collection of example programs written in this new syntax. The base +structure of these are borrowed from previous posts, I'm merely translating that +structure into a new form: + +``` +// decr outputs one less than the input. +decr = { out = add(in, -1) } + +// fib accepts a number i, and outputs the ith fibonacci number. +fib = { + + inner = { + n = 0(in), + a = 1(in), + b = 2(in), + + out = if(zero?(n), + a, + inner(decr(n), b, add(a,b)) + ) + + }, + + out = inner(in, 0, 1) +} + +// map accepts a sequence and a function, and returns a sequence consisting of +// the result of applying the function to each of the elements in the given +// sequence. +map = { + inner = { + mapped-seq = 0(in), + orig-seq = 1(in), + op = 2(in), + + i = len(mapped-seq), + + // graphs provide an inherent laziness to the language. Just because + // next-el is _defined_ here doesn't mean it's evaluated here at runtime. + // In reality it will only be evaluated if/when evaluating out requires + // evaluating next-el. + next-el = op(i(orig-seq)), + next-mapped-seq = append(mapped-seq, next-el), + + out = if( + eq(len(mapped-seq), len(orig-seq)), + mapped-seq, + inner(next-mapped-seq, orig-seq, op) + ) + } + + // zero-seq returns an empty sequence + out = inner(zero-seq(), 0(in), 1(in)) +} +``` + +## Selpmexa Selpmexa Selpmexa + +Our syntax encodes a graph, and a graph doesn't really care if the syntax was +encoded in an input-to-output vs an output-to-input direction. So, as promised, +here's all the above examples, but "backwards": + +``` +// min returns the lesser of the two numbers it is given +{ + (in)0 = a, + (in)1 = b, + + ( + (a,b)lt, + a, + b + )if = out + +} = min + +// decr outputs one less than the input. +{ (in, -1)add = out } = decr + +// fib accepts a number i, and outputs the ith fibonacci number. +{ + { + (in)0 = n, + (in)1 = a, + (in)2 = b, + + ( + (n)zero? + a, + ((n)decr, b, (a,b)add)inner + )if = out + + } = inner, + + (in, 0, 1)inner = out + +} = fib + +// map accepts a sequence and a function, and returns a sequence consisting of +// the result of applying the function to each of the elements in the given +// sequence. +{ + { + (in)0 = mapped-seq, + (in)1 = orig-seq, + (in)2 = op, + + (mapped-seq)len = i, + + ((orig-seq)i)op = next-el, + (mapped-seq, next-el)append = next-mapped-seq, + + ( + ((mapped-seq)len, (orig-seq)len)eq, + mapped-seq, + (next-mapped-seq, orig-seq, op)inner + )if = out + + } = inner, + + (()zero-seq, (in)0, (in)1)inner = out +} = map +``` + +Do these make you itchy? They kind of make me itchy. But... parts of them also +appeal to me. + +The obvious reason why these feel wrong to me is the placement of `if`: + +``` +( + (a,b)lt, + a, + b +)if = out +``` + +The tuple which is being passed to `if` here is confusing unless you already +know that it's going to be passed to `if`. But on your first readthrough you +won't know that till you get to the end, so you'll be in the dark until then. +For more complex programs I'm sure this problem will compound. + +On the other hand, pretty much everything else looks _better_, imo. For example: + +``` +// copied and slightly modified from the original to make it even more complex + +(mapped-seq, ((orig-seq)i)op)append = next-mapped-seq +``` + +Something like this reads very clearly to me, and requires a lot less mental +backtracking to comprehend. The main difficulty I have is tracking the +parenthesis, but the overall "flow" of data and the order of events is plain to +read. + +## Next Steps + +The syntax here is not done yet, not by a long shot. If my record with past +posts about ginger (wherein I've "decided" on something and then completely +backtracked in later posts every single time) is any indication then this syntax +won't even look remotely familiar in a very short while. But it's a great +starting point, I think, and raises a lot of good questions. + +* Can I make parenthesis chains, a la the last example, more palatable in some + way? + +* Should I go with the "backwards" syntax afterall? In a functional style of + programming `if` statements _should_ be in the minority, and so the syntax + which better represents the flow of data in that style might be the way. + +* Destructuring of tuples seems to be wanted, as evidenced by all the `a = + 0(in)` lines. Should this be reflected in the structure or solely be + syntactical sugar? + +* Should the commas be replaced with any whitespace (and make commas count as + whitespace, as clojure has done)? If this is possible then I think they should + be, but I won't know for sure until I begin implementing the parser. + +And, surely, many more! I've felt a bit lost with ginger for a _long_ time, but +seeing a real, usable syntax emerge has really invigorated me, and I'll be +tackling it again in earnest soon (fingers crossed).