stz continuations
We have to start this story with tail call optimisation. The story of the stack is an interesting one. In most languages it is convention to keep the entire history of your call pathway on the stack. This lets you see what went wrong when you debug, which can be super useful. It makes sense, but it comes with a cost - a bunch of idle memory.
That is pretty harmless unless you use the stack heavily and have the ability to put dynamically sized things on the stack. I'm a fan of that - C lets you do it (variable length arrays, standard since C99) but standard C++ does not.
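A quick C sketch of what I mean - the array is sized at runtime yet lives entirely on the stack (the function name is just for illustration):

#include <stdio.h>

// C99 variable length array: the buffer is sized at runtime
// and lives entirely on the stack - no heap allocation.
void print_squares(int n) {
    int squares[n];            // dynamically sized, stack allocated
    for (int i = 0; i < n; i++)
        squares[i] = i * i;
    for (int i = 0; i < n; i++)
        printf("%d\n", squares[i]);
}

int main(void) {
    print_squares(8);
    return 0;
}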
But if you do something recursively, calling yourself over and over again, you can 'blow up' the stack by exhausting its memory. It can happen very quickly.
Enter TCO: tail call optimisation. If you're calling yourself at the very end of the method, the current stack frame can be reused. The stack doesn't grow. Very neat.
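A minimal C sketch: the self-call below is in tail position, so an optimising compiler can turn it into a jump that reuses the frame - though C compilers typically only do this at higher optimisation levels, and the standard doesn't guarantee it:

#include <stdio.h>

// The recursive call is the last thing the function does (a tail call),
// so the compiler can reuse the current frame instead of pushing a new one.
static void countdown(int n) {
    if (n == 0)
        return;
    printf("%d\n", n);
    countdown(n - 1);   // tail call: becomes a jump under TCO
}

int main(void) {
    countdown(1000000); // fine with TCO; may blow the stack without it
    return 0;
}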
Let's imagine one of the most core idioms there is: counting from N to M. Let's call this either an interval or a range. The Smalltalk name is Interval. The trendy modern name is Range. So let's go with Range. An Interval, after all, is a range plus a step.
range := { start: $T, stop: $T }
[ range do: block | range-of: $T, ($T) -> self | start to: stop do: block ]
[ start to: stop do: block | $T, $T, ($T) |
    block evaluate: start
    start == stop else: [ start + 1 to: stop do: block ]
]
At the end of the method we call the method again. This is a prime candidate for TCO. It allows the language to have loops - infinite or limited - without the core language itself knowing anything about what a loop is. Is that a useful feature? It's a very Smalltalk idea. I guess we'll let time say whether it's a good one. We're nowhere near designing libraries yet.
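To make the equivalence concrete, here's a hedged C translation of that to:do: with the block as a function pointer - the loop body is what the tail recursion collapses into under TCO:

#include <stdio.h>

static void print_n(int n) { printf("%d\n", n); }

// What the tail-recursive to:do: compiles down to once TCO
// turns the self-call into a jump: a plain loop.
static void range_do(int start, int stop, void (*block)(int)) {
    for (;;) {
        block(start);
        if (start == stop)
            break;
        start = start + 1;  // the recursive call became this jump
    }
}

int main(void) {
    range_do(1, 5, print_n); // prints 1..5
    return 0;
}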
So TCO is desired. There is another fancy idea though. It's called Continuation Passing Style, or CPS for short. The idea is that every call in a method takes a final argument that tells it where to go next. Every statement is the end of the method and every statement is the start of a method. Every piece of local state that needs to 'continue' must be passed along to the next method.
Sometimes this is done with closures, sometimes the state is put on the heap and shared between frames. Sometimes it is literally passed around as hidden parameters in the calls. All of these things introduce inefficiencies and surprise logic. In short, it forces the language to be higher level. Done the most common way it even requires garbage collection.
The huge advantage of CPS is that every single statement is its own TCO. The stack never grows. It just replaces itself over and over again. Neat, but doing all those jumps is bad for the CPU. Letting the program counter increment naturally is SO much faster and kinder to the branch predictor.
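Here's what that looks like as a minimal C sketch, with the continuation threaded through by hand (all the names are made up for illustration):

#include <stdio.h>

// Continuation-passing style: instead of returning, every function
// finishes by calling the continuation it was given. Local state that
// must survive ('env') is passed along explicitly.
typedef void (*cont)(int value, void *env);

static void add(int a, int b, cont k, void *env) {
    k(a + b, env);              // 'return' is a tail call to k
}

static void print_result(int value, void *env) {
    (void)env;
    printf("result: %d\n", value);
}

static void after_first_add(int value, void *env) {
    // second step of the computation: add 10 to the first sum
    add(value, 10, print_result, env);
}

int main(void) {
    // compute (1 + 2) + 10 as a chain of continuations
    add(1, 2, after_first_add, NULL);
    return 0;
}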
What we want is to know when we need to cut a method in two and make the second half act like a continuation. In the previous blog post we decided the best place to do it would be when you'd wait on a lock. Instead of waiting on the lock you save the 'next' and its arguments and hand them off to a manager of some kind.
An 'event loop' thread would be ideal for managing these things. Again we're getting a little high level here, and that is always a dangerous smell. But let's explore it for a while longer to see just how deep the rabbit hole might go.
The dumbest way to do continuations is to copy the entire stack. Combined with CPS that means you're copying next to nothing. Combined with TCO there are a lot of situations where the stack ends up being minimal. Without TCO or CPS, the stack can be quite large and expensive to copy.
Let's look at exactly where it'd be used:
[ main |
    db = sqlite3 open: 'foo.db' mode: sqlite3 RD_ONLY
    row1 = db select: 'SELECT count(*) FROM table1' as: { count: int }
    row2 = db select: 'SELECT count(*) FROM table2' as: { count: int }
    stdout print: (row1 count) + (row2 count)
]
Here's our example. The type returned by select: is something that acts as a proxy. Sending #count to it forces it to become 'real' if it wasn't already. The act of actualising the value means running some code - in this case actually doing a database call. This is called a promise. It promises to return the result at some time in the future.
The promise can achieve this by punting the calculation off to a worker thread and waiting for the answer. We can't simply wait on a lock on the thread asking for the selects because, as we discussed in the last post, there's a limited number of actual OS threads.
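For contrast, here's a sketch of the naive blocking promise in C with pthreads - forcing the value waits on a condition variable, pinning a whole OS thread, which is exactly what we can't afford. All names are hypothetical:

#include <pthread.h>
#include <stdio.h>

// A naive blocking promise: a worker thread fills in 'value',
// and force() waits on a condition variable until it has.
// The waiting thread is a real OS thread doing nothing.
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  done;
    int             ready;
    int             value;
} promise;

static void *worker(void *arg) {
    promise *p = (promise *)arg;
    int result = 42;               // stand-in for the database call
    pthread_mutex_lock(&p->lock);
    p->value = result;
    p->ready = 1;
    pthread_cond_signal(&p->done);
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

static int force(promise *p) {
    pthread_mutex_lock(&p->lock);
    while (!p->ready)              // blocks an entire OS thread
        pthread_cond_wait(&p->done, &p->lock);
    int v = p->value;
    pthread_mutex_unlock(&p->lock);
    return v;
}

int main(void) {
    promise p = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 };
    pthread_t t;
    pthread_create(&t, NULL, worker, &p);
    printf("count: %d\n", force(&p)); // becomes 'real' on first use
    pthread_join(&t, NULL);
    return 0;
}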
We're faced with the notion that our code might resume 'somewhere', but not necessarily on the thread we started on. Another way to do this is to 'swap out' the current stack for a new one, with the intention of restoring it when the other one is done.
program thread |                | program thread
               | database logic |
Here we're swapping between two different 'green threads'. The | represents a yield in one green thread which activates another green thread. If we were a high level language we'd run the entire program this way. Every time one green thread yields, another begins until it yields in turn. Eventually you get the result you asked for back, and eventually the green thread that needed it gets to run again.
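POSIX ucontext gives a concrete, if deprecated, way to picture that swap. This sketch keeps two separate stacks and swaps between them rather than copying, but the yield/resume shape is the same as the diagram above:

#include <stdio.h>
#include <ucontext.h>

// Two 'green threads' sharing one OS thread: the program context
// yields, the database context runs, then control swaps back.
// Each context has its own small stack.
static ucontext_t program_ctx, database_ctx;
static char database_stack[64 * 1024];

static void database_logic(void) {
    printf("database logic running\n");
    swapcontext(&database_ctx, &program_ctx); // yield back to the program
}

int main(void) {
    getcontext(&database_ctx);
    database_ctx.uc_stack.ss_sp = database_stack;
    database_ctx.uc_stack.ss_size = sizeof database_stack;
    database_ctx.uc_link = &program_ctx;      // where to go if it returns
    makecontext(&database_ctx, database_logic, 0);

    printf("program thread before yield\n");
    swapcontext(&program_ctx, &database_ctx); // yield to the database logic
    printf("program thread resumed\n");
    return 0;
}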
Running everything this way is not ideal, though. We want to restore the program thread as soon as possible. Let's look at a more complicated example: a web server. Reading from the incoming socket will take time, which means we might have many accepted connections before even one is fully processed.
= listen-for-events
  accept-connection-1
  read-header-for-1 | yield-1
= listen-for-events
  accept-connection-2
  read-header-for-2 | yield-2
= listen-for-events
  received-header-for-1 | resume-1
  read-body-for-1 | yield-1
= listen-for-events
  received-header-for-2 | resume-2
  read-body-for-2 | yield-2
= listen-for-events
  accept-connection-3
  read-header-for-3 | yield-3
= listen-for-events
  received-body-for-1 | resume-1
  send-response-1 | yield-1
= listen-for-events
  received-header-for-3 | resume-3
  read-body-for-3 | yield-3
= listen-for-events
  received-body-for-2 | resume-2
  send-response-2 | yield-2
= listen-for-events
  sent-response-1 | resume-1
  close-1
= listen-for-events
  received-body-for-3 | resume-3
  send-response-3 | yield-3
= listen-for-events
  sent-response-2 | resume-2
  close-2
= listen-for-events
  sent-response-3 | resume-3
  close-3
= listen-for-events
The core here is listening for events. Those events tend to come from something like epoll or kqueue. Assuming yield/resume is cheap, this allows one of our very fancy 4GHz CPU cores to process millions of connections a second. Add in worker threads, allow the green threads to resume on other threads, and suddenly you have a super server.
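A heavily simplified sketch of that loop in C with epoll (Linux only; kqueue is the BSD equivalent). Error handling and the actual yield/resume bookkeeping are elided - resume_green_thread is a hypothetical stand-in:

#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

// The heart of an event loop: block in epoll_wait, then resume
// whichever green thread was waiting on each ready descriptor.
static void resume_green_thread(int fd) { (void)fd; /* elided */ }

int main(void) {
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(8080) };
    bind(listener, (struct sockaddr *)&addr, sizeof addr);
    listen(listener, 128);

    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listener };
    epoll_ctl(ep, EPOLL_CTL_ADD, listener, &ev);

    for (;;) {
        struct epoll_event events[64];
        int n = epoll_wait(ep, events, 64, -1); // listen-for-events
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == listener) {
                int conn = accept(listener, NULL, NULL); // accept-connection
                ev.events = EPOLLIN;
                ev.data.fd = conn;
                epoll_ctl(ep, EPOLL_CTL_ADD, conn, &ev);
            } else {
                resume_green_thread(fd); // resume whoever yielded on this fd
            }
        }
    }
}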
What we see here, whether it be file io, database io, network io, io of any kind really, is that there is a necessity for a loop to act on events. The question then becomes this: is this a core part of the language or does it work as a library? If it's a library then we'd need to expose yield/resume as an API.
We'd yield to a manager of some kind, and the manager would then either choose to resume one of its yielded green threads or wait on its active io handles. How high level is too high to be low level, though? There's very little advantage to having file io or network io that's blocking by default. That's not the way the world works.
I'm torn. Do we start up all stz programs as a single green thread just waiting for a reason to yield? As we've seen above we don't need OS threads to make green threads work. Do we ever want more than one manager of green threads? The API for opening files and reading/writing them needs to know about the manager, as does the network API.
It makes sense that a multi-threaded green thread manager is a developer choice rather than a programming default. You might want to use threads differently and it feels wrong to require all that machinery in the guts of the language.
The manager, if there's only one, needs to be thread safe too. You only want one and you want it shared across all threads. OR you pass it in to every IO call. Passing it in to every IO call is the more low level approach. It's the approach we should try until it proves too difficult. So what does it look like?
Also what do we call these things? 'green threads' and 'OS threads' living side by side is a recipe for confusion. In Smalltalk they're called Processes. Until we have a better name, let's go with that.
io := { &io-manager | loop }
// we are now running in the io loop on this thread and the next code runs in the first new process
handle := { file-handle | io open: 'blah.txt' mode: RD_ONLY }
buffer := { fixed-array: u8 size: 1024 }
io read: handle into: buffer
stdout print: buffer
io close: handle
When we call io read: handle into: buffer we end up yielding the process into io. Since there are no other processes to run, it'll wait on an epoll or kqueue event to wake the thread up again. Otherwise we stay in the endless loop until there are no open handles and no processes - then the loop method will end and the thread will presumably end too.
Is this terrible? Actually no. It's still nice and low level. We're requiring the language to support yield/resume, and any actual IO you want to do is going to require you to fire up an io-manager. It does make me think that a special _startup method should do that first line. That would have to be overridable in case you wanted a fancier io-manager, or none at all. Otherwise I see no reason to provide blocking IO in any form.
The question remaining is how to implement yield/resume. Yes, we copy the stack - but how much stack are we going to have to copy, and how expensive is that? We don't have to copy past the call where the last resume happened (or the first resume, when the io-manager starts the first process). It's not going to be a lot unless the developer makes it a lot. This is something to return to in the future.
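We can at least eyeball the size of that copy. A rough sketch, assuming the stack grows downward (true on the common ABIs) and abusing a local's address as a stand-in for the stack pointer - a real implementation would read the actual register:

#include <stdio.h>
#include <stddef.h>

// Estimate how much stack a copying yield would have to save:
// everything between the frame where the io-manager last resumed
// us and the frame where we are yielding now.
static char *resume_sp; // recorded when the process was last resumed

static void yield_here(void) {
    char marker;
    // Assumes a downward-growing stack; strictly speaking this
    // pointer arithmetic is outside the C standard, but it works
    // as a ballpark on the usual platforms.
    ptrdiff_t depth = resume_sp - &marker;
    printf("a copying yield would save ~%td bytes here\n", depth);
}

static void read_some_file(void) { yield_here(); }

int main(void) {
    char marker;
    resume_sp = &marker;   // pretend the io-manager resumed us here
    read_some_file();      // yields two frames deeper
    return 0;
}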
At the start of this blog we asked a question: how many high level features can live in the low level world? This is one of them.