Been thinking about the stack and heap a lot. It would be possible, though possibly painful, to enforce a language with no global heap. The question really is: what are the principles which give reason to do so? What are the principles of this language, period? The principles are different than the use-cases. They don't need to be logically rigorous (at first anyway).

##########

I need to prioritize the future of this project a bit more. I've been thinking I'm going to figure this thing out at this level, but I shouldn't even be working here without a higher level view. I can't finish this project without financial help. I don't think I can get a v0 up without financial help. What this means at minimum, no matter what, I'm going to have to:

- Develop a full concept of the language that can get it to where I want to go
- Figure out where I want it to go
- Write the concept into a manifesto of the language
- Write the concept into a proposal for course of action to take in developing the language further

I'm unsure about what this language actually is, or is actually going to look like, but I'm sure of those things. So those are the lowest hanging fruit, and I should start working on them pronto. It's likely I'll need to experiment with some ideas which will require coding, and maybe even some big ideas, but those should all be done under the auspices of developing the concepts of the language, and not the compiler of the language itself.

#########

Elemental types:

* Tuples
* Arrays
* Integers

#########

Been doing thinking and research on ginger's elemental types and what their properties should be. Ran into roadblock where I was asking myself these questions:

* Can I do this without atoms?
* What are different ways atoms can be encoded?
* Can I define language types (elementals) without defining an encoding for them?

I also came up with two new possible types:

* Stream, effectively an interface which produces discrete packets (each has a length), where the production of one packet indicates the size of the next one at the same time.
* Tagged, sort of like a stream, effectively a type which says "We don't know what this will be at compile-time, but we know it will be prefixed with some kind of tag indicating its type and size."
    * Maybe only the size is important
    * Maybe precludes user defined types that aren't composites of the elementals? Maybe that's ok?

Ran into this:

https://www.ps.uni-saarland.de/~duchier/python/continuations.html
https://en.wikipedia.org/wiki/Continuation#First-class_continuations

which is interesting. A lot of my problems now are derived from stack-based systems and their need for knowing the size of input and output data, continuations seem to be an alternative system?

I found this:

http://lambda-the-ultimate.org/node/4512

I don't understand any of it, I should definitely learn feather

I should finish reading this:

http://www.blackhat.com/presentations/bh-usa-07/Ferguson/Whitepaper/bh-usa-07-ferguson-WP.pdf

#########

Ok, so I'm back at this for the first time in a while, and I've got a good thing going. The vm package is working out well. Using tuples and atoms as the basis of a language is pretty effective (thanks erlang!). I've got basic variable assignment working as well. No functions yet. Here's the things I still need to figure out or implement:

* lang
    * constant size arrays
        * using them for a "do" macro
    * figure out constant, string, int, etc...
      look at what erlang's actual primitive types are for a hint
    * figure out all needed macros for creating and working with lang types
* vm
    * figure out the differentiation between compiler macros and runtime calls
        * probably separate the two into two separate call systems
    * the current use of varCtx is still pretty ugly, the do macro might help clean it up
    * functions
        * are they a primitive? I guess so....
        * declaration and type
        * variable deconstruction
        * scoping/closures
    * compiler macros, need vm's Run to output a lang.Term
    * need to learn about linking
        * figure out how to include llvm library in compiled binary and make it callable. runtime macros will come from this
        * linking in of other ginger code? or how to import in general
* compiler, a general purpose binary for taking ginger code and turning it into machine code using the vm package
* swappable syntax, including syntax-dependent macros
* close the loop?

############

I really want contexts to work. They _feel_ right, as far as abstractions go. And they're clean, if I can work out the details.

Just had a stupid idea, might as well write it down though. Similar to how the DNA and RNA in our cells work, each Context is created with some starting set of data on it. This will be the initial protein block. Based on the data there some set of Statements (the RNA) will "latch" on and do whatever work they're programmed to do. That work could include making new Contexts and "releasing" them into the ether, where they would get latched onto (or not).

There's so many problems with this idea, it's not even a little viable. But here goes:

* Order of execution becomes super duper fuzzy. It would be really difficult to think about how your program is actually going to work.
* Having Statement sets just latch onto Contexts is super janky. They would get registered I guess, and it would be pretty straightforward to differentiate one Context from another, but what about conflicts?
  If two Statements want to latch onto the same Context then what? If we wanted to keep the metaphor one would just get randomly chosen over the other, but obviously that's insane.

############

I explained some of this to ibrahim already, but I might as well get it all down, cause I've expanded on it a bit since.

Basically, ops (functions) are fucking everything up. The biggest reason for this is that they are really really hard to implement without a type annotation system. The previous big braindump is about that, but basically I can't figure out a way that feels clean and good enough to be called a "solution" to type inference. I really don't want to have to add type annotations just to support functions, at least not until I explore all of my options.

The only other option I've come up with so far is the context thing. It's nice because it covers a lot of ground without adding a lot of complexity. Really the biggest problem with it is it doesn't allow for creating new things which look like operations. Instead, everything is done with the %do operator, which feels janky.

One solution I just thought of is to get rid of the %do operator and simply make it so that a list of Statements can be used as the operator in another Statement. This would _probably_ allow for everything that I want to do.

One outstanding problem I'm facing is figuring out if all Statements should take a Context or not.

* If they did it would be a lot more explicit what's going on. There wouldn't be an ethereal "this context" that would need to be managed and thought about. It would also make things like using a set of Statements as an operator a lot more straightforward, since without Contexts in the Statement it'll be weird to "do" a set of Statements in another Context.
* On the other hand, it's quite a bit more boilerplate. For the most part most Statements are going to want to be run in "this" context.
  Also this wouldn't really decrease the number of necessary macros, since one would still be needed in order to retrieve the "root" Context.
* One option would be for a Statement's Context to be optional. I don't really like this option, it makes a very fundamental datatype (a Statement) a bit fuzzier.
* Another thing to think about is that I might just rethink how %bind works so that it doesn't operate on an ethereal "this" Context. %ctxbind is one attempt at this, but there's probably other ways.
* One issue I just thought of with having a set of Statements be used as an operator is that the argument to that Statement becomes.... weird. What even is it? Something the set of Statements can access somehow? Then we still need something like the %in operator.

Let me backtrack a bit. What's the actual problem? The actual thing I'm struggling with is allowing for code re-use, specifically pure functions. I don't think there's any way anyone could argue that pure functions are not an effective building block in all of programming, so I think I can make that my statement of faith: pure functions are good and worthwhile, impure functions are.... fine.

Implementing them, however, is quite difficult. More so than I thought it would be. The big inhibitor is the method by which I actually pass input data into the function's body. From an implementation standpoint it's difficult because I *need* to know how many bytes on the stack the arguments take up. From a syntax standpoint this is difficult without a type annotation system. And from a usability standpoint this is difficult because it's a task the programmer has to do which doesn't really have to do with the actual purpose or content of the function, it's just a book-keeping exercise.

So the stack is what's screwing us over here. It's a nice idea, but ultimately makes what we're trying to do difficult.
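
To make the byte-size problem concrete, here's a minimal sketch of a stack as a plain byte range plus a stack pointer. It's a toy model in Go, not how ginger or LLVM actually do it: the `Stack` type and the `add` callee are made-up names, every argument is assumed to be an 8-byte little-endian int, and Go's own call/return stands in for the return address. The thing to notice is that `Pop` needs a byte count, and nothing stored on the stack can supply it.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Stack is a toy model (an assumption for illustration): a contiguous byte
// range plus a stack pointer. Everything at or above sp counts as unused,
// no matter what bytes happen to be there.
type Stack struct {
	mem []byte
	sp  int
}

// Push copies b onto the top of the stack and raises the stack pointer.
// The sketch does no overflow checking.
func (s *Stack) Push(b []byte) {
	copy(s.mem[s.sp:], b)
	s.sp += len(b)
}

// Pop lowers the stack pointer by n bytes and returns the bytes that were
// there. Whoever calls Pop has to know n: the stack holds no sizes or types.
func (s *Stack) Pop(n int) []byte {
	s.sp -= n
	return s.mem[s.sp : s.sp+n]
}

// PushInt64 and PopInt64 fix the size at 8 bytes, which is exactly the
// knowledge a type annotation (or inference) would have to provide.
func (s *Stack) PushInt64(v int64) {
	var b [8]byte
	binary.LittleEndian.PutUint64(b[:], uint64(v))
	s.Push(b[:])
}

func (s *Stack) PopInt64() int64 {
	return int64(binary.LittleEndian.Uint64(s.Pop(8)))
}

// add plays the function body: it pops the two arguments it expects and
// pushes its return value in their place.
func add(s *Stack) {
	b := s.PopInt64()
	a := s.PopInt64()
	s.PushInt64(a + b)
}

func main() {
	s := &Stack{mem: make([]byte, 64)}
	s.PushInt64(2) // the caller copies the arguments onto the stack
	s.PushInt64(40)
	add(s) // "jump" to the body
	fmt.Println(s.PopInt64(), s.sp)
}
```

If `add` popped 4 bytes per argument instead of 8 it would still run, it would just read garbage, which is why the sizes have to be settled before the call is ever emitted.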
I'm not sure if there's ever going to be a method of implementing pure functions that doesn't involve argument/return value copying though, and therefore which doesn't involve knowing the byte size of your arguments ahead of time.

It's probably not worth backtracking this much either. For starters, cpus are heavily optimized for stack based operations, and much of the way we currently think about programming is also based on the stack. It would take a lot of backtracking if we ever moved to something else, if there even is anything else worth moving to.

If that's the case, how is the stack actually used then?

* There's a stack pointer which points at an address on the stack, the stack being a contiguous range of memory addresses. The place the stack pointer points to is the "top" of the stack, all higher addresses are considered unused (no matter what's in them). All the values in the stack are available to the currently executing code, it simply needs to know either their absolute address or their relative position to the stack pointer.
* When a function is "called" the arguments to it are copied onto the top of the stack, the stack pointer is increased to reflect the new stack height, and the function's body is jumped to. Inside the body the function need only pop values off the stack as it expects them, as long as it was called properly it doesn't matter how or when the function was called. Once it's done operating the function ensures all the input values have been popped off the stack, and subsequently pushes the return values onto the stack, and jumps back to the caller (the return address was also stored on the stack).

That's not quite right, but it's close enough for most cases. The more I'm reading about this the more I think it's not going to be worth it to backtrack past the stack. There's a lot of compiler and machine specific crap that gets involved at that low of a level, and I don't think it's worth getting into it.
LLVM did all of that for me, I should learn how to make use of that to make what I want happen. But what do I actually want? That's the hard part.

I guess I've come full circle. I pretty much *need* to use llvm functions. But I can't do it without declaring the types ahead of time. Ugghh.

################################

So here's the current problem:

I have the concept of a list of statements representing a code block. It's possible/probable that more than this will be needed to represent a code block, but we'll see. There's two different ways I think it's logical to use a block:

* As a way of running statements within a new context which inherits all of its bindings from the parent. This would be used for things like if statements and loops, and behaves the way a code block behaves in most other languages.
* To define an operator body. An operator's body is effectively the same as the first use-case, except that it has input/output as well. An operator can be bound to an identifier and used in any statement.

So the hard part, really, is that second point. I have the first done already. The second one isn't too hard to "fake" using our current context system, but it can't be made to be used as an operator in a statement. Here's how to fake it though:

* Define the list of statements
* Make a new context
* Bind the "input" bindings into the new context
* Run %do with that new context and list of statements
* Pull the "output" bindings out of that new context

And that's it. It's a bit complicated but it ultimately works and effectively inlines a function call.

It's important that this looks like a normal operator call though, because I believe in guy steele. Here's the current problems I'm having:

* Defining the input/output values is the big one. In the inline method those were defined implicitly based on what the statements actually use, and the compiler would fail if any were missing or the wrong type.
  But here we ideally want to define an actual llvm function and not inline every time. So we need to somehow "know" what the input/output is, and their types.
    * The output value isn't actually *that* difficult. We just look at the output type of the last statement in the list and use that.
    * The input is where it gets tricky. One idea would be to use a statement with no input as the first statement in the list, and that would define the input type. The way macros work this could potentially "just work", but it's tricky.
    * It would also be kind of difficult to make work with operators that take in multiple parameters too. For example, `bind A, 1` would be the normal syntax for binding, but if we want to bind an input value it gets weirder.
        * We could use a "future" kind of syntax, like `bind A, _` or something like that, but that would require a new expression type and also just be kind of weird.
        * We could have a single macro which always returns the input, like `%in` or something. So the bind would become `bind A, %in` or `bind (A, B), %in` if we ever get destructuring. This isn't a terrible solution, though a bit unfortunate in that it could get confusing with different operators all using the same input variable effectively. It also might be a bit difficult to implement, since it kind of forces us to only have a single argument to the LLVM function? Hard to say how that would work. Possibly all llvm functions could be made to take in a struct, but that would be ghetto af. Not doing a struct would take a special interaction though.... It might not be possible to do this without a struct =/
* Somehow allowing to define the context which gets used on each call to the operator, instead of always using a blank one, would be nice.
    * The big part of this problem is actually the syntax for calling the operator. It's pretty easy to have this handled within the operator by the %thisctx macro.
      But we want the operator to be callable by the same syntax as all other operator calls, and currently that doesn't have any way of passing in a new context.
    * Additionally, if we're implementing the operator as an LLVM function then there's not really any way to pass in that context to it without making those variables global or something, which is shitty.
* So writing all this out it really feels like I'm dealing with two separate types that just happen to look similar:
    * Block: a list of statements which run with a variable context.
    * Operator: a list of statements which run with a fixed (empty?) context, and have input/output.
* There's so very nearly a symmetry there. Things that are inconsistent:
    * A block doesn't have input/output
        * It sort of does, in the form of the context it's being run with and %ctxget, but not an explicit input/output like the operator has.
        * If this could be reconciled I think this whole shitshow could be made to have some consistency.
        * Using %in, this pretty much "just works". But it's still weird. Really we'd want to turn the block into a one-off operator every time we use it. This is possible.
    * An operator's context must be empty
        * It doesn't *have* to be, defining the ctx which goes with the operator could be part of however an operator is created.
* So after all of that, I think operators and blocks are kind of the same.
    * They both use %in to take in input, and both output using the last statement in their list of statements.
    * They both have a context bound to them, operators are fixed but a block changes.
    * An operator is a block with a bound context.

##############@@@@@@@@@#$%^&^%$#@#$%^&*

* New problem: type inference. LLVM requires that a function's definition have the type specified up-front. This kind of blows. Well actually, it blows a lot more than kind of. There's two things that need to be inferred from a List of Statements then: the input type and the output type.
  There's two approaches I've thought of in the current setup.
    * There's two approaches to determining the type of an operator: analyze the code as ginger expressions, or build the actual llvm structures and analyze those.
        * Looking at the ginger expressions is definitely somewhat fuzzy. We can look at all the statements and sub-statements until we find an instance of %in, then look at what that's input into. But if it's simply binding into an Identifier then we have to find the identifier. If it's destructuring then that gets even *more* complicated.
            * Destructuring is what really makes this approach difficult. Presumably there's going to be a function that takes in an Identifier (or %in I guess?) and a set of Statements and returns the type for that Identifier. If we find that %in is destructured into a tuple then we would run that function for each constituent Identifier and put it all together. But then this inference function is really coupled to %bind, which kind of blows. Also we may one day want to support destructuring into non-tuples as well, which would make this even harder.
            * We could make it the job of the macro definition to know its input and output types, as well as the types of any bindings it makes. That places some burden on user macros in the future, but then maybe it can be inferred for user macros? That's a lot of hope. It would also mean the macro would need the full set of statements that will ever run in the same Context as it, so it can determine the types of any bindings it makes.
        * The second method is to build the statements into LLVM structures and then look at those structures. This has the benefit of being non-ambiguous once we actually find the answer. LLVM is super strongly typed, and re-iterates the types involved for every operation. So if the llvm builder builds it then we need only look for the first usage of every argument/return and we'll know the types involved.
            * This requires us to use structs for tuples, and not actually use multiple arguments. Otherwise it won't be possible to know the difference between a 3 argument function and a 4 argument one which doesn't use its 4th argument (which shouldn't really happen, but could).
            * The main hindrance is that the llvm builder is really not designed for this sort of thing. We could conceivably create a "dummy" function with bogus types and write the body, analyze the body, erase the function, and start over with a non-dummy function. But it's the "analyze the body" step that's difficult. It's difficult to find the types of things without the llvm.Value objects in hand, but since building is set up as a recursive process that becomes non-trivial. This really feels like the way to go though, I think it's actually doable.
            * This could be something we tack onto llvmVal, and then make Build return extra data about what types the Statements it handled input and output.
    * For other setups that would enable this a bit better, the one that keeps coming to mind is a more pipeline style system. Things like %bind would need to be refactored from something that takes a Tuple to something that only takes an Identifier and returns a macro which will bind to that Identifier. This doesn't *really* solve the type problem I guess, since whatever is input into the Identifier's bind doesn't necessarily have a type attached to it. Sooo yeah nvm.
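
A toy sketch of the "make Build return extra data" idea, with no LLVM involved. Everything in it is hypothetical: `Expr`, `In`, `Call`, `buildInfo` and `build` are stand-ins invented for illustration, not ginger's lang or vm types, and each macro is assumed to declare its own operand and result types. The recursive build hands back, alongside the output type, whatever type `%in` was first used as, and complains if two uses disagree.

```go
package main

import (
	"errors"
	"fmt"
)

// Type is a toy stand-in; a real version would carry LLVM types.
type Type string

const (
	Unknown Type = ""
	Int     Type = "int"
	Bool    Type = "bool"
)

// Expr is a hypothetical expression node, not ginger's actual term type.
type Expr interface{}

type In struct{}            // stands in for the %in macro
type IntLit struct{ V int } // an integer literal

// Call applies a macro which (by assumption) knows its operand and result types.
type Call struct {
	Operand, Result Type
	Args            []Expr
}

func Add(a, b Expr) Expr  { return Call{Int, Int, []Expr{a, b}} }
func Less(a, b Expr) Expr { return Call{Int, Bool, []Expr{a, b}} }
func And(a, b Expr) Expr  { return Call{Bool, Bool, []Expr{a, b}} }

// buildInfo is the extra data a build step reports back to its parent.
type buildInfo struct {
	Out Type // type the expression produces
	In  Type // type %in was used as, Unknown if it never appeared
}

// merge combines two observations about %in, failing if they disagree.
func merge(a, b Type) (Type, error) {
	switch {
	case a == Unknown:
		return b, nil
	case b == Unknown || a == b:
		return a, nil
	}
	return Unknown, fmt.Errorf("%%in used as both %s and %s", a, b)
}

// build walks e recursively; want is the type the parent expects from e.
func build(e Expr, want Type) (buildInfo, error) {
	switch e := e.(type) {
	case In:
		// %in takes on whatever type its user expects of it.
		return buildInfo{Out: want, In: want}, nil
	case IntLit:
		return buildInfo{Out: Int}, nil
	case Call:
		info := buildInfo{Out: e.Result}
		for _, arg := range e.Args {
			sub, err := build(arg, e.Operand)
			if err != nil {
				return info, err
			}
			if sub.Out != e.Operand {
				return info, fmt.Errorf("operand is %q, want %q", sub.Out, e.Operand)
			}
			if info.In, err = merge(info.In, sub.In); err != nil {
				return info, err
			}
		}
		return info, nil
	}
	return buildInfo{}, errors.New("unknown expression")
}

func main() {
	// Roughly "%in + 1 < 10": every use of %in agrees on its type.
	info, err := build(Less(Add(In{}, IntLit{1}), IntLit{10}), Unknown)
	fmt.Println(info.In, info.Out, err)

	// %in used as a bool and as an int: the two observations conflict.
	_, err = build(And(In{}, Less(In{}, IntLit{1})), Unknown)
	fmt.Println(err)
}
```

This dodges the hard parts on purpose: there's no %bind, no destructuring, and no Identifier lookup, so it only shows the shape of threading type observations back up through a recursive build.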