Ginger!

2021-01-09 11:21:29 -07:00 · 2021-01-09 11:21:29 -07:00 · 65c1d5ab10
commit 65c1d5ab10
parent fccc5b95fd
1 changed files with 352 additions and 0 deletions
--- a/_posts/2021-01-09-ginger.md
+++ b/_posts/2021-01-09-ginger.md
@ -0,0 +1,352 @@
+---
+title: >-
+    Ginger
+description: >-
+    Yes, it does exist.
+---
+
+This post is about a programming language that's been bouncing around in my head
+for a _long_ time. I've tried to actually implement the language three or more
+times now, but everytime I get stuck or run out of steam. It doesn't help that
+everytime I try again the form of the language changes significantly. But all
+throughout the name of the language has always been "Ginger". It's a good name.
+
+In the last few years the form of the language has somewhat solidified in my
+head, so in lieu of actually working on it I'm going to talk about what it
+currently looks like.
+
+## Abstract Syntax Lists
+
+_In the beginning_ there was assembly. Well, really in the beginning there were
+punchcards, and probably something even more esoteric before that, but it was
+all effectively the same thing: a list of commands the computer would execute
+sequentially, with the ability to jump to odd places in the sequence depending
+on conditions at runtime. For the purpose of this post, we'll call this class of
+languages "abstract syntax list" (ASL) languages.
+
+Here's a hello world program in my favorite ASL language, brainfuck:
+
+```
++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.++
+.------.--------.>>+.>++.
+```
+
+(If you've never seen brainfuck, it's deliberately unintelligible. But it _is_
+an ASL, each character representing a single command, executed by the brainfuck
+runtime from left to right.)
+
+ASLs did the job at the time, but luckily we've mostly moved on past them.
+
+## Abstract Syntax Trees
+
+Eventually programmers upgraded to C-like languages. Rather than a sequence of
+commands, these languages were syntactically represented by an "abstract syntax
+tree" (AST). Rather than executing commands in essentially the same order they
+are written, an AST language compiler reads the syntax into a tree of syntax
+nodes. What it then does with the tree is language dependent.
+
+Here's a program which outputs all numbers from 0 to 9 to stdout, written in
+(slightly non-idiomatic) Go:
+
+```go
+i := 0
+for {
+    if i == 10 {
+        break
+    }
+    fmt.Println(i)
+    i++
+}
+```
+
+When the Go compiler sees this, it's going to first parse the syntax into an
+AST. The AST might look something like this:
+
+```
+(root)
+   |-(:=)
+   |   |-(i)
+   |   |-(0)
+   |
+   |-(for)
+       |-(if)
+       |  |-(==)
+       |  |  |-(i)
+       |  |  |-(10)
+       |  |
+       |  |-(break)
+       |
+       |-(fmt.Println)
+       |       |-(i)
+       |
+       |-(++)
+           |-(i)
+```
+
+Each of the non-leaf nodes in the tree represents an operation, and the children
+of the node represent the arguments to that operation, if any. From here the
+compiler traverses the tree depth-first in order to turn each operation it finds
+into the appropriate machine code.
+
+There's a sub-class of AST languages called the LISP ("LISt Processor")
+languages. In a LISP language the AST is represented using lists of elements,
+where the first element in each list denotes the operation and the rest of the
+elements in the list (if any) represent the arguments. Traditionally each list
+is represented using parenthesis. For example `(+ 1 1)` represents adding 1 and
+1 together.
+
+As a more complex example, here's how to print numbers 0 through 9 to stdout
+using my favorite (and, honestly, only) LISP, Clojure:
+
+```clj
+(doseq
+    [n (range 10)]
+    (println n))
+```
+
+Much smaller, but the idea is there. In LISPs there is no differentiation
+between the syntax, the AST, and the language's data structures; they are all
+one and the same. For this reason LISPs generally have very powerful macro
+support, wherein one uses code written in the language to transform code written
+in that same language. With macros users can extend a language's functionality
+to support nearly anything they need to, but because macro generation happens
+_before_ compilation they can still reap the benefits of compiler optimizations.
+
+### AST Pitfalls
+
+The ASL (assembly) is essentially just a thin layer of human readability on top
+of raw CPU instructions. It does nothing in the way of representing code in the
+way that humans actually think about it (relationships of types, flow of data,
+encapsulation of behavior). The AST is a step towards expressing code in human
+terms, but it isn't quite there in my opinion. Let me show why by revisiting the
+Go example above:
+
+```go
+i := 0
+for {
+    if i > 9 {
+        break
+    }
+    fmt.Println(i)
+    i++
+}
+```
+
+When I understand this code I don't understand it in terms of its syntax. I
+understand it in terms of what it _does_. And what it does is this:
+
+* with a number starting at 0, start a loop.
+* if the number is greater than 9, stop the loop.
+* otherwise, print the number.
+* add one to the number.
+* go to start of loop.
+
+This behavior could be further abstracted into the original problem statement,
+"it prints numbers 0 through 9 to stdout", but that's too general, as there
+are different ways for that to be accomplished. The Clojure example first
+defines a list of numbers 0 through 9 and then iterates over that, rather than
+looping over a single number. These differences are important when understanding
+what code is doing.
+
+So what's the problem? My problem with ASTs is that the syntax I've written down
+does _not_ reflect the structure of the code or the flow of data which is in my
+head. In the AST representation if you want to follow the flow of data (a single
+number) you _have_ to understand the semantic meaning of `i` and `:=`; the AST
+structure itself does not convey how data is being moved or modified.
+Essentially, there's an extra implicit transformation that must be done to
+understand the code in human terms.
+
+## Ginger: An Abstract Syntax Graph Language
+
+In my view the next step is towards using graphs rather than trees for
+representing our code. A graph has the benefit of being able to reference
+"backwards" into itself, where a tree cannot, and so can represent the flow of
+data much more directly.
+
+I would like Ginger to be an ASG language where the language is the graph,
+similar to a LISP. But what does this look like exactly? Well, I have a good
+idea about what the graph _structure_ will be like and how it will function, but
+the syntax is something I haven't bothered much with yet. Representing graph
+structures in a text file is a problem to be tackled all on its own. For this
+post we'll use a made-up, overly verbose, and probably non-usable syntax, but
+hopefully it will convey the graph structure well enough.
+
+### Nodes, Edges, and Tuples
+
+All graphs have nodes, where each node contains a value. A single unique value
+can only have a single node in a graph. Nodes are connected by edges, where
+edges have a direction and can contain a value themselves.
+
+In the context of Ginger, a node represents a value as expected, and the value
+on an edge represents an operation to take on that value. For example:
+
+```
+5 -incr-> n
+```
+
+`5` and `n` are both nodes in the graph, with an edge going from `5` to `n` that
+has the value `incr`. When it comes time to interpret the graph we say that the
+value of `n` can be calculated by giving `5` as the input to the operation
+`incr` (increment). In other words, the value of `n` is `6`.
+
+What about operations which have more than one input value? For this Ginger
+introduces the tuple to its graph type. A tuple is like a node, except that it's
+anonymous, which allows more than one to exist within the same graph, as they do
+not share the same value. For the purposes of this blog post we'll represent
+tuples like this:
+
+```
+1 -> } -add-> t
+2 -> }
+```
+
+`t`'s value is the result of passing a tuple of two values, `1` and `2`, as
+inputs to the operation `add`. In other words, the value of `t` is `3`.
+
+For the syntax being described in this post we allow that a single contiguous
+graph can be represented as multiple related sections. This can be done because
+each node's value is unique, so when the same value is used in disparate
+sections we can merge the two sections on that value. For example, the following
+two graphs are exactly equivalent (note the parenthesis wrapping the graph which
+has been split):
+
+```
+1 -> } -add-> t -incr-> tt
+2 -> }
+```
+
+```
+(
+    1 -> } -add-> t
+    2 -> }
+
+    t -incr-> tt
+)
+```
+
+(`tt` is `4` in both cases.)
+
+A tuple with only one input edge, a 1-tuple, is a no-op, semantically, but can
+be useful structurally to chain multiple operations together without defining
+new value names. In the above example the `t` value can be eliminated using a
+1-tuple.
+
+```
+1 -> } -add-> } -incr-> tt
+2 -> }
+```
+
+When an integer is used as an operation on a tuple value then the effect is to
+output the value in the tuple at that index. For example:
+
+```
+1 -> } -0-> } -incr-> t
+2 -> }
+```
+
+(`t` is `2`.)
+
+### Operations
+
+When a value sits on an edge it is used as an operation on the input of that
+edge. Some operations will no doubt be builtin, like `add`, but users should be
+able to define their own operations. This can be done using the `in` and `out`
+special values. When a graph is used as an operation it is scanned for both `in`
+and `out` values. `in` is set to the input value of the operation, and the value
+of `out` is used as the output of the operation.
+
+Here we will define the `incr` operation and then use it. Note that we set the
+`incr` value to be an entire sub-graph which represents the operation's body.
+
+```
+( in -> } -add-> out
+   1 -> }            ) -> incr
+
+5 -incr-> n
+```
+
+(`n` is `6`.)
+
+The output of an operation may itself be a tuple. Here's an implementation and
+usage of `double-incr`, which increments two values at once.
+
+```
+( in -0-> } -incr-> } -> out
+                    }
+  in -1-> } -incr-> }        ) -> double-incr
+
+1 -> } -double-incr-> t -add-> tt
+2 -> }
+```
+
+(`t` is a 2-tuple with values `2`, and `3`, `tt` is `5.)
+
+### Conditionals
+
+The conditional is a bit weird, and I'm not totally settled on it yet. For now
+we'll use this. The `if` operation expects as an input a 2-tuple whose first
+value is a boolean and whose second value will be passed along. The `if`
+operation is special in that it has _two_ output edges. The first will be taken
+if the boolean is true, the second if the boolean is false. The second value in
+the input tuple, the one to be passed along, is used as the input to whichever
+branch is taken.
+
+Here is an implementation and usage of `max`, which takes two numbers and
+outputs the greater of the two. Note that the `if` operation has two output
+edges, but our syntax doesn't represent that very cleanly.
+
+```
+( in -gt-> } -if-> } -0-> out
+     in -> }    -> } -1-> out ) -> max
+
+1 -> } -max-> t
+2 -> }
+```
+
+(`t` is `2`.)
+
+It would be simple enough to create a `switch` macro on top of `if`, to allow
+for multiple conditionals to be tested at once.
+
+### Loops
+
+Loops are tricky, and I have two thoughts about how they might be accomplished.
+One is to literally draw an edge from the right end of the graph back to the
+left, at the point where the loop should occur, as that's conceptually what's
+happening. But representing that in a text file is difficult. For now I'll
+introduce the special `recur` value, and leave this whole section as TBD.
+
+`recur` is cousin of `in` and `out`, in that it's a special value and not an
+operation.  It takes whatever value it's set to and calls the current operation
+with that as input. As an example, here is our now classic 0 through 9 printer
+(assume `println` outputs whatever it was input):
+
+```
+// incr-1 is an operation which takes a 2-tuple and returns the same 2-tuple
+// with the first element incremented.
+( in -0-> } -incr-> } -> out
+            in -1-> }        ) -> incr-1
+
+( in -eq-> } -if-> out
+     in -> }    -> } -0-> } -println-> } -incr-1-> } -> recur ) -> print-range
+
+0  -> } -print-range-> }
+10 -> }
+```
+
+## Next Steps
+
+This post is long enough, and I think gives at least a basic idea of what I'm
+going for. The syntax presented here is _extremely_ rudimentary, and is almost
+definitely not what any final version of the syntax would look like. But the
+general idea behind the structure is sound, I think.
+
+I have a lot of further ideas for Ginger I haven't presented here. Hopefully as
+time goes on and I work on the language more some of those ideas can start
+taking a more concrete shape and I can write about them.
+
+The next thing I need to do for Ginger is to implement (again) the graph type
+for it, since the last one I implemented didn't include tuples. Maybe I can
+extend it instead of re-writing it. After that it will be time to really buckle
+down and figure out a syntax. Once a syntax is established then it's time to
+start on the compiler!