stz

zero cost reinterpretation

Sometimes you have a piece of data and you want to treat it differently to how it currently is. Let's take a concrete example: a filename. Not a file, but its name. It could include a path, it might not.

Underneath it all, it's a bunch of bytes we got from the operating system or some bytes we created ourselves at compile time or runtime. Over that we've reinterpreted it with some kind of encoding to treat it as a string.

Now on top of that we want to interpret it as a filename. What we don't want to do is pay any cost for these reinterpretations. STZ does not have type aliasing or distinct types, but it does have structures/objects and if there's only one member of that structure it is, effectively, a reinterpretation of its inner object.

STZ will not create a structure with the data inside of it; it will simply keep the data as itself and tag the variable it's in as the new type. Now messages sent to it will be filename messages rather than string messages or byte array messages.

Latin1String (
  #traits: (String of: UnsignedInteger8,)
  characters: (Array of: UnsignedInteger8))

Filename (string: String)

hello-txt := (UnsignedInteger8)
             (0x68 0x65 0x6c 0x6c 0x6f 0x2e 0x74 0x78 0x74);
hello-string: Latin1String = {from-bytes: hello-txt};
hello-filename: Filename = {from-string: hello-string};

// get first byte
hello-txt[0]; // 0x68

// get first character
hello-string[0]; // 'h'

// check if first character is uppercase
hello-string[0] uppercase?; // false

// get first filename part
hello-filename[0]; // 'hello.txt'

// get filename parts
hello-filename parts ...free; // ('hello.txt',)

Same data, different interpretations of that data, different APIs. You cannot pass a Filename to a method expecting a String, and you cannot pass a String to a method expecting a Filename. This preserves our type safety but gives us the flexibility to wrap anything any way we want and give it new life.
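For comparison, here's a rough sketch of the same layering in Rust, which can run today. It is not how STZ is implemented; the names Latin1Str, char_at, part_at and wrappers_are_free are made up for this example, and splitting on '/' is my assumption about what a filename part is. Each wrapper has exactly one member, so the wrapped value has the same size as the bare byte slice.

```rust
use std::mem::size_of;

/// Bytes interpreted as Latin-1 text (hypothetical stand-in for Latin1String).
#[derive(Clone, Copy)]
#[repr(transparent)]
pub struct Latin1Str<'a>(&'a [u8]);

/// Latin-1 text interpreted as a filename (hypothetical stand-in for Filename).
#[derive(Clone, Copy)]
#[repr(transparent)]
pub struct Filename<'a>(Latin1Str<'a>);

impl<'a> Latin1Str<'a> {
    pub fn from_bytes(bytes: &'a [u8]) -> Self {
        Latin1Str(bytes)
    }

    pub fn as_bytes(&self) -> &'a [u8] {
        self.0
    }

    /// Every Latin-1 byte maps to the Unicode code point with the same value.
    pub fn char_at(&self, i: usize) -> Option<char> {
        self.0.get(i).map(|&b| b as char)
    }
}

impl<'a> Filename<'a> {
    pub fn from_string(string: Latin1Str<'a>) -> Self {
        Filename(string)
    }

    /// Assumption: parts are separated by '/'. The result borrows the same bytes.
    pub fn part_at(&self, i: usize) -> Option<Latin1Str<'a>> {
        self.0
            .as_bytes()
            .split(|&b| b == b'/')
            .nth(i)
            .map(Latin1Str::from_bytes)
    }
}

/// True when wrapping twice adds no bytes over the bare slice.
pub fn wrappers_are_free() -> bool {
    size_of::<Filename<'static>>() == size_of::<&'static [u8]>()
}
```

A function that takes a Filename here will not accept a Latin1Str, and the other way around, which is the same type safety described above.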

This is a pseudo implementation btw - Array has a pointer to its data and so does String, which means so does Filename. That makes passing Filename and String to methods cheap. It also allows an abstraction of "the data came from..." the operating system, compiler, or runtime.

One little thing I slipped into this example was ...free, which is a variation on the theme of deferred messages. The idea is that if you put the ... after a variable or message send, you are creating a deferred action that will run when the context ends, but the default-object of that context is the thing to the left of your ...

This allows us to send free to whatever rando object we get back. What remains though is knowing when you need to do that.
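If the deferred idea is hard to picture, here's a small Rust sketch of something similar. It is only an analogy: Deferred and run are names I invented, and Rust's Drop stands in for "runs when the context ends". The guard holds the value (the thing to the left of the ...) and an action that receives that value when the scope closes.

```rust
use std::cell::RefCell;

/// Holds a value plus an action to run on it when the scope ends.
/// Hypothetical stand-in for STZ's `value ...message`.
pub struct Deferred<T, F: FnMut(&mut T)> {
    pub value: T,
    action: F,
}

impl<T, F: FnMut(&mut T)> Drop for Deferred<T, F> {
    fn drop(&mut self) {
        // The held value plays the role of the default object of the deferred action.
        (self.action)(&mut self.value);
    }
}

/// Records the order of events so the deferral is observable.
pub fn run() -> Vec<String> {
    let log: RefCell<Vec<String>> = RefCell::new(Vec::new());
    {
        let parts = Deferred {
            value: vec![String::from("hello.txt")],
            action: |v: &mut Vec<String>| {
                log.borrow_mut().push(format!("free {} part(s)", v.len()));
            },
        };
        log.borrow_mut().push(format!("use {}", parts.value[0]));
    } // `parts` goes out of scope here, so the deferred action runs now.
    log.into_inner()
}
```

Calling run() records the use first and the free second, even though the free was written first.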

Objective-C gets around this problem by having a naming convention. If you send alloc then you must also send release. I'd like to think we can do a little better. Since we have Array, and String is just pointers into an array, what we're returning from parts is an array of arrays.

That might as well be by-value instead of something allocated on the heap. Allocating on the heap should, when at all possible, be an action the developer takes rather than the library you're using. If you're using an API that needs to do it, ideally there'd be a parameter for the allocator to use.

Since there's no parameter for the allocator to parts we should be able to safely assume that no memory allocation has taken place. We therefore do not need to send free to the thing we got back. So while the ...free is really neat and useful, it's not needed here.
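To make the "no allocator parameter means no allocation" convention concrete, here's a sketch in Rust. The ByteAllocator trait, CountingAllocator and copy_name are all hypothetical and exist only for this example; they are not a real library API. The counter is there so you can check which call actually asked for memory.

```rust
/// Hypothetical allocator interface, invented for this sketch.
pub trait ByteAllocator {
    fn allocate(&mut self, size: usize) -> Vec<u8>;
}

/// Heap-backed allocator that counts how many requests it served.
#[derive(Default)]
pub struct CountingAllocator {
    pub allocations: usize,
}

impl ByteAllocator for CountingAllocator {
    fn allocate(&mut self, size: usize) -> Vec<u8> {
        self.allocations += 1;
        vec![0; size]
    }
}

/// No allocator parameter: the result only borrows from `name`.
/// Assumption: parts are separated by '/'.
pub fn parts(name: &[u8]) -> impl Iterator<Item = &[u8]> {
    name.split(|&b| b == b'/')
}

/// Allocator parameter: the signature tells the caller memory is requested.
pub fn copy_name(allocator: &mut impl ByteAllocator, name: &[u8]) -> Vec<u8> {
    let mut bytes = allocator.allocate(name.len());
    bytes.copy_from_slice(name);
    bytes
}
```

Nothing enforces the convention here; it's the signature that carries the hint, which is the same situation the STZ examples are in.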

As per the AMD64 calling convention (System V), if you're returning a complex object that won't fit in registers the caller will allocate stack space for the result. That means we now own parts until our context is complete and the stack pops away those values.

If we wanted to return the result of parts to our caller we'd have to make sure the caller owned the filename we're operating on. We'd want that passed in. This is where a simpler memory checker can really shine.

A buggy program:

[self get-parts]
[Module -> Array of: String]
[({Filename} {string: 'hello.txt'}) parts]

A non-buggy program:

[self get-parts: filename]
[Module, Filename -> Array of: String]
[filename parts]

A non-buggy heap program:

[self get-parts: allocator]
[Module, MemoryAllocator -> Array of: String]
[bytes := allocator allocate: UnsignedInteger8 size: 'hello.txt' size;
 'hello.txt' bytes | bytes;
 filename := {Filename} {string: {String} {characters: bytes}};
 filename parts]

This third example is a tricky one. I spelled out each step for clarity. The only thing that tells the developer there's something they will have to free is the fact that we passed in an allocator.

The problem for any kind of memory checker is that example 1 "buggy" and example 3 "non-buggy heap" both look the same. Both send parts to a stack-allocated Filename object and return that array to the sender.

There's no simple way to determine whether example 3 is different. After all we don't know the nature of the allocator. It could be a stack allocator. The memory safety is in the hands of the developer.

This would be where a borrow checker would step in and save the day. It would be aware that in example #3 the data is on the heap and it would even know the scope of the allocator - so it'd know the true life span of the data and when it would no longer be safe to use the parts returned.
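Here's roughly what that looks like in Rust, as a sketch and not a proposal for STZ syntax. The names Filename, get_parts and get_parts_in are mine, and a caller-supplied String stands in for the allocator in example 3. The lifetime 'a is how the checker tracks who owns the bytes that the parts point into.

```rust
/// A filename that borrows its text from some owner that lives for 'a.
pub struct Filename<'a>(pub &'a str);

impl<'a> Filename<'a> {
    /// The parts borrow from the original text, not from the Filename wrapper.
    /// Assumption: parts are separated by '/'.
    pub fn parts(&self) -> Vec<&'a str> {
        self.0.split('/').collect()
    }
}

/// Like example 2: the caller owns the data, so returning the parts is fine.
pub fn get_parts<'a>(filename: &Filename<'a>) -> Vec<&'a str> {
    filename.parts()
}

/// Like example 3: the caller supplies the storage, so the parts can outlive
/// this function but not `storage`.
pub fn get_parts_in<'a>(storage: &'a mut String) -> Vec<&'a str> {
    storage.push_str("hello.txt");
    Filename(storage.as_str()).parts()
}

// Like example 1: the data dies with the function, so the compiler rejects it.
//
// pub fn get_parts_buggy() -> Vec<&'static str> {
//     let local = String::from("hello.txt");
//     Filename(&local).parts() // rejected: `local` does not live long enough
// }
```

The accepted and rejected versions differ only in where the bytes live, which is exactly the distinction a simpler memory checker can't see.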

This is a very useful tool for developers and ideally shouldn't put too much cognitive load on them. Rust is celebrated vehemently because of its borrow checker. The syntax leaves a little to be desired though.

If we were to add borrow checking to STZ we might need a way to indicate the intention of an argument. But for now we're leaving that off the table. I personally don't have a problem with memory safety issues knowing that they can be better handled without breaking the language down the road.
