stz the compiler

Photo by Kai Dahms / Unsplash

It's time to talk about the compiler. The compiler has to provide a minimum level of capability so that the rest of the code can construct itself. Without the compiler there is no point to Smalltalk Zero.

(Part of me does wonder whether stz could eventually elevate itself to a full Smalltalk environment with garbage collection and development tools, but another part of me says that already exists and maybe doesn't need to be reinvented.)

The compiler is, amusingly, an interpreter. It needs to be able to run stz code as part of constructing the executable: it has to parse the stz code, run the appropriate parts of it (quickly), and produce output for one of the backends.

Only some of the code needs to be executed while the rest needs to be translated to the backend. We need to determine which parts of the program run at compile time and which parts run at runtime.

Anything in the top level scope of the file runs at compile time. Anything in the types section of a block statement, make statement, or list/map statement runs at compile time.
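As a rough illustration, the two rules above can be expressed as a pass that tags each node with the phase it runs in. This is a minimal Python sketch. The `Node` shape and the `phase_of` name are assumptions, and so is the choice to treat a top-level form's own body as runtime code. None of these come from stz's actual design.

```python
from dataclasses import dataclass, field

# Forms whose "types" section is evaluated at compile time (per the rules above).
TYPED_FORMS = {"block", "make", "list", "map"}


@dataclass
class Node:
    kind: str                                   # hypothetical: "block", "make", "send", ...
    name: str                                   # label used to report the phase
    types: list = field(default_factory=list)   # child Nodes in the types section
    body: list = field(default_factory=list)    # all other child Nodes


def phase_of(top_level: list) -> dict:
    """Return {node name: "compile" or "runtime"} for every node in the tree."""
    phases = {}

    def visit(node: Node, phase: str, inside_types: bool) -> None:
        phases[node.name] = phase
        if node.kind in TYPED_FORMS:
            for child in node.types:
                # A types section, and everything nested in it, runs at compile time.
                visit(child, "compile", True)
        for child in node.body:
            # Assumption: a form's body is its own scope, so it is runtime code
            # unless it sits inside a types section.
            visit(child, "compile" if inside_types else "runtime", inside_types)

    for node in top_level:
        visit(node, "compile", False)  # top-level statements run at compile time
    return phases
```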

The order of execution matters too. Do we scan for all the types and methods first, before running any other code? Or do we run the code in order, on the assumption that ordering is important to the developer? I'm going to go with the latter to begin with, and if it becomes too troublesome because of circular references, we'll revisit this decision.
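Here is a toy comparison that makes the trade-off concrete. It assumes a hypothetical program made of statements, each of which defines one name and refers to others. Running in order rejects a mutually recursive pair that a scan-first pass accepts, and that is exactly the circular-reference trouble that might force a revisit.

```python
# Each hypothetical statement is (name it defines, names it refers to).
Statement = tuple[str, list[str]]


def run_in_order(statements: list[Statement]) -> set[str]:
    """Execute top to bottom; every reference must already be defined."""
    defined: set[str] = set()
    for name, refs in statements:
        missing = [r for r in refs if r not in defined]
        if missing:
            raise NameError(f"{name} refers to undefined {', '.join(missing)}")
        defined.add(name)
    return defined


def scan_then_run(statements: list[Statement]) -> set[str]:
    """Collect every name first, so forward and circular references resolve."""
    declared = {name for name, _ in statements}
    for name, refs in statements:
        missing = [r for r in refs if r not in declared]
        if missing:
            raise NameError(f"{name} refers to undefined {', '.join(missing)}")
    return declared
```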

The interpreter will be a simple stack machine. There's little reason to do anything particularly fancy here so long as it's fast (which likely means open-coded). It doesn't need an FFI, because the compiler's source code can always be extended, and the runtime is compiled, so it has FFI built into it anyway.
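A stack machine of this kind can be very small. The sketch below is in Python and uses a hypothetical opcode set (`PUSH`, `ADD`, `SUB`, `MUL`, `DUP`, `RET`). Each instruction is dispatched inline in one loop rather than through a table of handler objects. In C the same idea would typically be a single `switch` inside the loop.

```python
from enum import IntEnum, auto


class Op(IntEnum):
    PUSH = auto()  # push the operand that follows in the code stream
    ADD = auto()
    SUB = auto()
    MUL = auto()
    DUP = auto()   # duplicate the top of the stack
    RET = auto()   # stop and return the top of the stack


def run(code: list[int]) -> int:
    stack: list[int] = []
    pc = 0
    while True:
        op = code[pc]
        pc += 1
        # Open-coded dispatch: every opcode is handled right here in the loop.
        if op == Op.PUSH:
            stack.append(code[pc])
            pc += 1
        elif op == Op.ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == Op.SUB:
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif op == Op.MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == Op.DUP:
            stack.append(stack[-1])
        elif op == Op.RET:
            return stack.pop()
        else:
            raise ValueError(f"bad opcode {op} at {pc - 1}")
```

For example, `(2 + 3) * 4` becomes `PUSH 2, PUSH 3, ADD, PUSH 4, MUL, RET`, and `run` returns 20.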

There are certain platform constants that need to be captured during compilation too, such as all the error constants from operating system calls. The interpreter will also need to know how to use the base types. This is a bit of duplicated work: the libraries for, say, adding two floats together will use the backend to do it, but the interpreter has to simulate that operation, so it too needs an implementation. Every primitive we pass to the backend has to be implemented in the interpreter.
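One way to keep the interpreter and the backend in step is a primitive table that the compiler checks against the backend's list. In this sketch the primitive names and the `BACKEND_PRIMITIVES` set are hypothetical. The float primitive rounds through IEEE-754 single precision with `struct`, so the interpreter matches a 32-bit float add on a target that uses round-to-nearest. The error constants are read from Python's `errno` module as a stand-in for capturing the host platform's values.

```python
import errno
import struct


def f32(x: float) -> float:
    """Round a Python float (a double) to IEEE-754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Hypothetical primitive names. Every primitive the backend can emit needs an
# interpreter implementation with the same semantics.
INTERPRETER_PRIMITIVES = {
    # Adding in double and then rounding gives the correctly rounded float32 sum.
    "float32.add": lambda a, b: f32(f32(a) + f32(b)),
    # Wrap around like a two's-complement 64-bit integer.
    "int64.add": lambda a, b: ((a + b + 2**63) % 2**64) - 2**63,
}

BACKEND_PRIMITIVES = {"float32.add", "int64.add"}  # hypothetical backend table


def check_primitives(backend: set, interpreter: dict) -> None:
    """Fail loudly if the backend knows a primitive the interpreter lacks."""
    missing = sorted(backend - set(interpreter))
    if missing:
        raise RuntimeError(f"no interpreter implementation for: {', '.join(missing)}")


# Platform error constants captured at compile time, e.g. {"ENOENT": 2, ...}.
PLATFORM_CONSTANTS = {name: code for code, name in errno.errorcode.items()}
```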

That means the interpreter is... a backend. We gather up all the code we want to run, pass it to the interpreter backend, and say 'run'; it produces a new program, which we pass to a different backend (or the same one, if we're being lazy) to produce the final output.

sources input → parser → gather compile-time code → interpreter backend →
execute compile time → gather runtime code → compilation backend → program
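The pipeline above maps naturally onto a single backend interface with two implementations. This sketch assumes a hypothetical `consume` method and a deliberately trivial `Unit`. Its stand-in compilation backend just lists definitions, where a real one would generate code.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Unit:
    """Hypothetical compilation unit: code to run now, definitions to emit later."""
    compile_time: list = field(default_factory=list)  # callables taking the Unit
    runtime: dict = field(default_factory=dict)       # name -> source for the target


class Backend(Protocol):
    def consume(self, unit: Unit) -> object: ...


class InterpreterBackend:
    """Runs compile-time code; its 'output' is the amended unit."""
    def consume(self, unit: Unit) -> Unit:
        for stmt in unit.compile_time:
            stmt(unit)              # compile-time code may add runtime definitions
        unit.compile_time.clear()   # nothing is left to run
        return unit


class TextBackend:
    """Stand-in for a real code generator: lists the runtime definitions."""
    def consume(self, unit: Unit) -> str:
        return "\n".join(f"{name}: {src}" for name, src in sorted(unit.runtime.items()))


def build(unit: Unit, interpreter: Backend, target: Backend) -> object:
    # sources -> interpreter backend -> (amended unit) -> compilation backend
    return target.consume(interpreter.consume(unit))
```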

Executing code in the interpreter backend has to produce something that can be fed to a backend. Part of that information will come from the parser and the remainder from the execution. They therefore share a 'compilation unit' that is the whole program. The interpreted code can add to and amend the compilation unit as necessary.
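A compilation unit along these lines could be as plain as three tables, matching the structures, enums, and blocks that make up a program. The sketch below is hypothetical throughout (`CompilationUnit`, `define_block`, `generate_accessors`). It shows compile-time code adding a getter block for each field of a struct the parser has already recorded. Redefining a block is the way to amend it.

```python
from dataclasses import dataclass, field


@dataclass
class Struct:
    name: str
    fields: list = field(default_factory=list)


@dataclass
class CompilationUnit:
    """Hypothetical whole-program unit shared by the parser and the interpreter."""
    structs: dict = field(default_factory=dict)  # name -> Struct
    enums: dict = field(default_factory=dict)    # name -> list of member names
    blocks: dict = field(default_factory=dict)   # name -> block body (source text here)

    def add_struct(self, name: str, fields=()) -> Struct:
        if name in self.structs:
            raise ValueError(f"struct {name} already defined")
        self.structs[name] = Struct(name, list(fields))
        return self.structs[name]

    def define_block(self, name: str, body: str) -> None:
        # Adding and amending are the same operation: the last definition wins.
        self.blocks[name] = body


def generate_accessors(unit: CompilationUnit, struct_name: str) -> None:
    """Compile-time code: add one getter block per field of an existing struct."""
    for f in unit.structs[struct_name].fields:
        unit.define_block(f"{struct_name}>>{f}", f"^ {f}")
```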

A set of structures, enums, and blocks is all that makes up the actual program. It is a shockingly simple language so far. Does the interpreter need to be able to parse things? If so, it can call the compiler's parser; if not, it can construct a parse tree itself.

It will be necessary. We want to do compile-time magic to make our lives easier, which means generating code from a developer's hint. Building parse tree objects might be easier than creating strings to hand to a parser. But that facility will only be useful at compile time, while string manipulation will be useful at both compile time and runtime.
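The two forms can be compared on the smallest useful case: generating a getter for a field. This sketch assumes hypothetical `Method`/`Return`/`Var` node classes and a toy parser for Smalltalk-style `x ^ x` getter source. Both routes should yield the same tree.

```python
from dataclasses import dataclass


# Hypothetical parse-tree classes; stz's actual parse tree is not described.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Return:
    value: Var


@dataclass(frozen=True)
class Method:
    selector: str
    body: Return


def parse_getter(source: str) -> Method:
    """Toy parser for Smalltalk-style getter source such as 'x ^ x'."""
    selector, sep, expr = source.partition("^")
    if not sep or not selector.strip() or not expr.strip():
        raise SyntaxError(f"not a getter: {source!r}")
    return Method(selector.strip(), Return(Var(expr.strip())))


def getter_via_string(field_name: str) -> Method:
    # Form 1: build source text, then parse it. Mistakes surface only at parse time.
    return parse_getter(f"{field_name} ^ {field_name}")


def getter_via_tree(field_name: str) -> Method:
    # Form 2: build the parse tree directly, with no parser round-trip.
    return Method(field_name, Return(Var(field_name)))
```

The string route works anywhere strings exist but defers mistakes to parse time. The tree route skips the parser but needs the parse-tree classes to be available, which is only the case at compile time.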

The best way to tell will be to write some of the code to see how it looks in both forms.