stz closures
Systems-level programming and closures. Look, it's not really a Smalltalk if we don't have closures. But that doesn't mean we can't be more specific about how they work.
using: 'core/memory'
export:
[ main |
memory allocate: (fixed-array size: 10 of: u8)
]
We're referencing memory from the compilation-scope-context. Now, you could say that this is a special case. And maybe that's true. But is that the right way to think about it? What is 'self', in Smalltalk parlance, here? When we write code like: [ a dot-product: b | a x * b x + a y * b y ] there is no 'self'.
But in object creation we do have an implicit self: { person | name: 'Jane' }. Here the implicit receiver is the new person we've just made. We may want to be able to nominate an implicit receiver - or perhaps we assume the first parameter is the default one. For simplicity's sake we're going to go with that for now.
So no, our implicit receiver is not the compilation-scope-context. We're also not closing over it. It is special and is a compilation facility.
Now let's look at things actually being closed over:
[ main |
a = 123
[ add_one | a += 1 ]
add_one
add_one
add_one
]
We've just created global variables - well, package-level variables. If we moved a out of the main method it would be global. It would be thread-global too.
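For comparison, here is roughly the same shape in Python. I'm using Python purely as a stand-in for the semantics, since this language doesn't exist yet to run anything. Python closes over a by reference without being asked; the nonlocal line is only there so that += rebinds main's a instead of creating a new local. Returning a at the end is my addition so the result can be inspected.

```python
def main():
    a = 123

    def add_one():
        # 'nonlocal' makes the assignment target main's 'a',
        # not a fresh local variable inside add_one.
        nonlocal a
        a += 1

    add_one()
    add_one()
    add_one()
    return a  # returned only so the effect is observable
```

Note that nothing in add_one says what it captures or how. That implicitness is exactly what the rest of this post tries to pin down.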
Maybe we need to explicitly state what we're closing over, though, and how we're doing it. Do we want to copy it or do we want to reference it? If we reference it, then once the stack frame completes the reference is invalid. Borrow-checkers can be very useful for catching that kind of bug. That is not the topic of this blog post.
Either we have a borrow checker to catch that we've captured something that has gone out of scope or we make the developer explicitly state what it is they want to capture. We're going to need a syntax for either scenario.
Borrow checkers are a big deal so let's go without one for now. Therefore we need a way to specify what we're closing over. Let's try this:
[ add_one, a | a += 1 ]
Hey, no one ever said the signature couldn't also be a list. The astute might realise right now we have a problem. It's impossible to tell if this is a method or an anonymous code block. We're going to need to fix this.
The forms of method declarations we have right now are:
[ signature | code ]
[ signature | types | code ]
[ signature, closed_variables... | types | code ]
The forms of anonymous code blocks we have right now are:
[ code ]
[ types | code ]
[ closed_variables | code ]
[ closed_variables | types | code ]
Signature is very identifiable so long as it isn't a single word. closed_variables is also identifiable as it must name variables that exist in scope.
It turns out we don't have any conflicts. So long as you aren't allowed to shadow names then add_one cannot be a type or a variable to close over. There are no problems with add_one, a because add_one can't exist yet and a must exist.
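A rough sketch of that rule in Python, to check the reasoning holds together. Everything here is hypothetical: classify_head, the idea of passing the head as a list of names, and the scope set are all mine, not part of any real compiler. It only covers the case discussed above, a head that is a comma-separated list of plain names. Keyword signatures and blocks passed as message arguments are left out.

```python
def classify_head(names, scope):
    """Classify the segment before the first '|' of a block.

    names: the identifiers in that segment, in order (hypothetical input).
    scope: the set of names already declared (variables and types).
    """
    if not names:
        return "anonymous block"

    first, rest = names[0], names[1:]

    # Everything after the first name must be a capture, so it must exist.
    unknown = [n for n in rest if n not in scope]
    if unknown:
        raise NameError("cannot close over undeclared names: " + ", ".join(unknown))

    # Without shadowing, a name that already exists cannot be a new method.
    if first in scope:
        return "anonymous block"

    # A fresh first name can only be a signature being declared.
    return "method"
```

With a and int in scope, the head add_one, a comes out as a method, the head a comes out as an anonymous block, and add_one, b is rejected because b doesn't exist.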
In a previous blog post I used shadowing. Let's revisit that example.
thing := get-the-thing
thing else: [ no-thing ]
thing then: [ thing | do-the-thing: thing ] // inner thing shadows the outer one and is the unwrapped value
If we allow ? as part of variable names then we can fix this by naming thing more appropriately:
thing? := get-the-thing
thing? else: [ no-thing ]
thing? then: [ thing | do-the-thing: thing ]
Problem solved and a convention established. Maybe is best named as "we're not entirely sure this is the thing we expected" with the question mark. This isn't enforced by anything other than convention. The shadowing is now removed and considered an error.
Back to closures. We still need a way to specify whether we're copying the variable or referencing it.
If we revisit the idea of syntax for references we can re-use that syntax in our capture list. If we borrow from other languages then & is the be-all-and-end-all of referencing syntax.
[ main |
a = 123
[ add_one, &a | a += 1 ]
add_one
add_one
add_one
]
It's concise and reasonably understandable by most developers. This is the shoot-yourself-in-the-foot moment. Unless we come back and add borrow checking later, the developer must understand the implications of capturing references to local variables. Especially if we have tail-recursion.
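To make the copy versus reference distinction concrete, here is a small Python sketch. Again Python is only a stand-in. The helper names (make_closures, add_one_ref, add_one_copy, read) are made up for illustration. Reference capture is Python's default behaviour; copy capture is imitated with the default-argument idiom, which snapshots the value when the function is defined.

```python
def make_closures():
    a = 123

    def add_one_ref():
        # Reference capture: reads and rebinds the enclosing 'a'.
        nonlocal a
        a += 1
        return a

    def add_one_copy(a=a):
        # Copy capture: the default argument is evaluated once, at
        # definition time, so this 'a' is a private snapshot (123).
        return a + 1

    def read():
        # Reports the current value of the enclosing 'a'.
        return a

    return add_one_ref, add_one_copy, read
```

Calling add_one_ref twice moves the shared a to 125, while add_one_copy still works from its snapshot of 123. What Python cannot show is the foot-gun itself: its captured variables are kept alive for as long as a closure needs them, so the reference never dangles. In a language with real stack frames, the reference version is the one that bites.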
Let's revisit our canonical example of making references and using references:
import: 'graphics/widgets'
import: 'graphics/opengl'
export:
[ main |
main-window = { &widgets.window | new init, title: 'My Window' }
glc = { &opengl.context | new init }
glc bind: main-window
]
Not much changed. The 'ref' calls are gone and tbh they were hard to notice. Ironically the & is clearer in every way.
There are some big advantages to copying instead of referencing. Some types, such as string, hold a pointer to 'stuff in memory' (which might come from literals at compile time or from a computed string). Copying the object only copies a reference: a memory address and a memory allocator reference.
It also means that when the stack frame goes out of scope, the string will still work properly in the closure unless you've deleted it. Let's look at callbacks that could run at any time from any thread:
[ main |
db = sqlite3 open: 'foo.db' mode: sqlite3 RD_ONLY
db select: 'SELECT count(*) FROM table1' callback: [ row |
count1 = row at: 0
db select: 'SELECT count(*) FROM table2' callback: [ row, count1 |
count2 = row at: 0
stdout print: count1 + count2
]
]
]
If we're dealing with hundreds of thousands of concurrent things, or even millions, we cannot use system threads. We'll run out too fast. This is especially true when doing networking. Databases aren't that dissimilar.
We can use callbacks, green threads, or co-routines. With green threads we can use a proxy, something like a reference, that runs the code in another green thread and blocks this green thread until it's done. Co-routines are the same there.
Here's the nub though - this is systems-level programming. We don't get to create co-routines and green threads and not admit we now have a high-level language. We're pretending we're not a high-level language.
We could define the language as continuation-based - there is no stack except for whatever is still referenced later in the method. That makes yield/resume very easy to implement for co-routines. But again we're subverting the expectations of a simple systems language.
Because the callback on the inside is a closure there's no way to 'flatten' this. Out of curiosity, how does this code look if we use a promise and co-routines:
[ main |
db = sqlite3 open: 'foo.db' mode: sqlite3 RD_ONLY
row1 = db select: 'SELECT count(*) FROM table1' as: { count: int }
row2 = db select: 'SELECT count(*) FROM table2' as: { count: int }
stdout print: (row1 count) + (row2 count)
]
The crux here is that sending count to row1 and row2 has to wait for the select to complete. This right here is where the green threads matter. If we added co-routines then the count wrapper method would do a yield to free up this thread and queue it up to wake up later.
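Here is a minimal sketch of that behaviour using Python's asyncio, as a stand-in. FakeDb, select_count, main_task and run are all hypothetical names, and the database is faked with a dictionary so the example runs on its own. The one real difference from the code above is that Python makes the wait explicit with await, whereas the design here hides it inside sending count.

```python
import asyncio


class FakeDb:
    """Hypothetical stand-in for the sqlite3 wrapper; holds canned counts."""

    def __init__(self, counts):
        self._counts = counts

    async def select_count(self, table):
        # Pretend to wait on I/O. Awaiting here yields to the event loop,
        # which is the 'free up this thread and wake up later' step.
        await asyncio.sleep(0)
        return self._counts[table]


async def main_task(db):
    # Start both selects straight away; neither blocks the caller.
    row1 = asyncio.create_task(db.select_count("table1"))
    row2 = asyncio.create_task(db.select_count("table2"))
    # Suspend only at the point where the values are actually needed.
    return await row1 + await row2


def run():
    db = FakeDb({"table1": 3, "table2": 4})
    return asyncio.run(main_task(db))
```

With the canned counts of 3 and 4, run() returns 7. The code reads top to bottom with no nesting, which is the whole appeal over the callback version.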
This is desirable. So desirable we're going to have to do it. The trick will be to embrace continuations. In another blog post we'll look at how that can be implemented while keeping the language as a systems language (and getting tail-call optimisations for free! – and therefore loops!)