coroutine implementation

It's been a very long while since I've coded coroutines in C. I had naively thought that I could call getcontext and setcontext and be done with it. Well, it turns out that was wrong.

POSIX actually dropped the makecontext/swapcontext/getcontext/setcontext API back in POSIX.1-2008, having already marked it obsolescent before that. Poof. Gone. All the operating systems that I use still implement it, but some will throw deprecation warnings at you first.

As with all things programming it was time to do some research and design.

There are two kinds of coroutine - synchronous and asynchronous (more commonly called symmetric and asymmetric coroutines, respectively). A synchronous coroutine is usually thought of as a 'continuation'. You effectively goto from one context to another and you shall never return. Never ever return.

Asynchronous on the other hand is a 'the caller blocks' approach where you are effectively doing a gosub and know there'll be a return. Always return.

What separates sync/async coroutines from a plain goto/gosub is the execution stack. You aren't just jumping to a new memory location, you're restoring the stack and registers as they were when you last left that context.

This, coincidentally, is sort of how closures work too. A closure captures, up front, the state it will need if you ever call it, which you don't necessarily have to. Closures, however, only copy/reference the things your code actually uses. They also can't "pause" like a coroutine. It does raise the question - should STZ do coroutines instead of closures? And if so, what would that look like?

Sync and Async approaches to coroutines put different requirements on the way you use them. Because async requires you to return, you need to consider the hierarchy of execution - which coroutine calls which other coroutine, and when they yield.

Sync on the other hand never returns so you're always jumping from one coroutine to another. You don't have to worry about returning because.. well, you never return. Ever.

This leads to two different designs in how you utilise them. With Async you tend to want a top level 'run loop' which will use kqueue/epoll/etc for you and resume each active coroutine in turn until it's time to wait for an event again. This is an invasive design in that the program must start a certain way - entering the run loop.

Sync doesn't need any kind of top level runloop. Any time you think you might need to block on kqueue/epoll/etc you do so and when you come back from that call you jump to the next now active coroutine.

When we use coroutines for network code we have two scenarios to consider - server and client. With a server you start listening on port(s) and when a connection comes in you make a new coroutine to process it. You already have a runloop of sorts.

With a client, though, you don't have a runloop until you decide to process several things at once - such as downloading multiple resources from the internet. You wait on all those coroutines to finish before you yourself continue. The coordinator that kicks off all those coroutines is the runloop of sorts.

The research part on what we can use given the deprecation is where things get interesting (and very fun!). The original POSIX implementation is referred to as ucontext. It has a bad reputation as some implementations were extremely slow, in part because swapcontext also saves and restores the signal mask, which on glibc costs a system call on every switch. That made it slower than threads, but with the benefit of not having thread limits.

I found four that are worth talking about: libtask, coroutine, Tina, and minicoro.

libtask seems to be where ucontext left off. It's a rewrite of the concept with a similar API and naming but written using assembler. It doesn't suffer the speed issues of vendor implementations. It seems to support, primarily, sync coroutines.

coroutine seems to be the next step along from libtask. It seems to support more platforms than libtask.

Those two seem quite dated now and I wondered if there was anything newer out there. This led me to Tina. Tina supports both sync and async coroutines and looks to support most platforms I'd care about, with a fallback when it cannot directly support the compilation environment.

It also referenced minicoro which only supports async coroutines, but it does support WebAssembly which would be nice to have as an option in STZ.

Given our design requires a runloop whether we're server or client, it seems we don't really need sync coroutines. Async will do just fine. That leaves the choice between Tina and minicoro. I think, initially, I will play with minicoro and see how it feels.

minicoro also has a very neat feature of virtual memory for the stack. I'm not sure I'll need gigantic stack support in STZ but maybe it's an ace in the pocket for later if necessary.