stz parsing
Before we can write an interpreter or compiler we need to be able to parse the syntax of stz.
The rules are fairly simple so far. There are a few characters that are special:
start-bracket (
stop-bracket )
start-block [
stop-block ]
start-make {
stop-make }
divider |
return ^
string-quote ' " `
reference &
keyword-break :
comment-start /*
comment-stop */
comment-line //
comma ,
assignment =
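As a rough sketch of how that list could feed a tokenizer, here is a lookup table in Python. The token names come straight from the list above; the function name `match_special` and the "try two characters before one" strategy are my own assumptions, not something stz prescribes.

```python
# Token names are taken from the list above; the lookup strategy is an assumption.
SPECIAL_TOKENS = {
    "(": "start-bracket",
    ")": "stop-bracket",
    "[": "start-block",
    "]": "stop-block",
    "{": "start-make",
    "}": "stop-make",
    "|": "divider",
    "^": "return",
    "'": "string-quote",
    '"': "string-quote",
    "`": "string-quote",
    "&": "reference",
    ":": "keyword-break",
    ",": "comma",
    "=": "assignment",
    "/*": "comment-start",
    "*/": "comment-stop",
    "//": "comment-line",
}


def match_special(source, pos):
    """Return (name, lexeme) for the special token starting at pos, or None.

    Two-character lexemes are tried first so the comment markers are
    recognised as a unit.
    """
    for length in (2, 1):
        lexeme = source[pos:pos + length]
        if len(lexeme) == length and lexeme in SPECIAL_TOKENS:
            return SPECIAL_TOKENS[lexeme], lexeme
    return None
```

Calling `match_special("// note", 0)` gives the `comment-line` token, while a position that holds an ordinary letter gives `None`, so the caller can fall through to number and identifier matching.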
Then there are the numbers:
integer -0123456789 123456789
binary 0b01010101
octal -0o07070707 0o07070707
hexadecimal -0x0F0F0F0F 0x0F0F0F0F
float -123456.789 123456.789
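The number forms above can be written as regular expressions. This is a minimal sketch in Python built only from the examples listed. Two things are assumptions on my part: that every form accepts a leading minus sign (the binary example doesn't show one), and that hexadecimal digits may be lower case. The name `classify_number` is hypothetical.

```python
import re

# One pattern per number form. The optional leading "-" on every form and the
# lower-case hex digits are assumptions; the examples only show some of these.
NUMBER_PATTERNS = [
    ("hexadecimal", re.compile(r"-?0x[0-9A-Fa-f]+")),
    ("octal", re.compile(r"-?0o[0-7]+")),
    ("binary", re.compile(r"-?0b[01]+")),
    ("float", re.compile(r"-?[0-9]+\.[0-9]+")),
    ("integer", re.compile(r"-?[0-9]+")),
]


def classify_number(text):
    """Return the name of the number form that matches all of text, or None."""
    for name, pattern in NUMBER_PATTERNS:
        if pattern.fullmatch(text):
            return name
    return None
```

Each pattern has to match the whole string, so `classify_number("0x0F0F0F0F")` is `"hexadecimal"` and something like `"12a"` is rejected outright rather than read as the integer `12` followed by junk.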
There's likely more to be done with strings, such as interpolation and escape codes, but that can wait. It likely has nothing to do with the actual syntax of the language and belongs instead to meta-programming in the string library.
If we can parse identifiers and method signatures then we have all the building blocks of the language.
identifier [a-zA-Z~][a-zA-Z0-9-_!?¿]*
It's a very permissive language so far. I'm not sure that's the wisest of ideas. I guess we'll see soon enough.
keyword [a-zA-Z][a-zA-Z0-9-!@#$%*_+/\<>~]*
prefix [!@#$%~]
unary identifier
binary [*_+÷/\<>≤≥] prefix*
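To get a feel for how permissive these patterns are, here is a sketch that transcribes them into Python regular expressions and reports every category a piece of text fits. I've moved the hyphen to the end of each character class so it is read as a literal hyphen, which is my assumption about the intent. The helper name `matching_categories` is hypothetical.

```python
import re

# Patterns transcribed from the notes above. The hyphen sits at the end of each
# character class so it is a literal hyphen (an assumption about the intent).
PATTERNS = {
    "identifier": re.compile(r"[a-zA-Z~][a-zA-Z0-9_!?¿-]*"),
    "keyword": re.compile(r"[a-zA-Z][a-zA-Z0-9!@#$%*_+/\\<>~-]*"),
    "prefix": re.compile(r"[!@#$%~]"),
    "binary": re.compile(r"[*_+÷/\\<>≤≥][!@#$%~]*"),
}


def matching_categories(text):
    """Return the sorted names of every pattern that matches the whole text."""
    return sorted(
        name for name, pattern in PATTERNS.items() if pattern.fullmatch(text)
    )
```

A plain word such as `add` fits both `identifier` and `keyword`, and a lone `~` fits both `identifier` and `prefix`. So the character classes alone don't settle what a token is; the parser would need the surrounding context to decide.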
That might be close enough to get something parsing. Let's see if we can define some of our structures:
expression
    method-call
    assignment-expression
method-call
    selector:prefix receiver:identifier?
    receiver:identifier? selector:unary
    receiver:identifier? selector:binary argument:sub-expression
    receiver:identifier? (selector:unary keyword-break argument:sub-expression)+
sub-expression
    literal
    object-make
    open-bracket expression close-bracket
literal
    literal-string
    literal-number
    literal-array
    literal-map
    literal-structure
    literal-enumeration
assignment-expression
    variable:identifier assignment one-expression (comma assignment-expression)*
method-declaration
    signature:method-call separator types:literal-array-body separator statements
literal-array
    open-bracket (separator type-expression)? literal-array-body close-bracket
literal-array-body
    nothing
    (type-expression comma)+
This is enough of an exploration to see that parsing the language isn't ambiguous. No fancy backtracking is required. There's some guesswork with ( ... ) and [ ... ], but the moment we see what's between the |'s we know what we're dealing with.
Identifiers tend to be described by what they can't have rather than what they do have.
I have half a mind to spend some of the weekend trying to implement the parser. The interpreter would come next. The interpreter would be enough to make a compiler and then the compiler could compile itself. That's always a fun test when making a programming language.