stz revisiting specialisation

Photo by JoJo Hikes / Unsplash

While driving eight hours I had some time for focused thinking. Since I'm deep in writing a parser for the language I was contemplating some of the stickier corners of the syntax.

Last time I talked about specialisation I made an error. I came up with this syntax:

my-class ( @foo: object-t, ... )

At the time I thought it was great because I had this variable @of: everywhere and I hated it. Suddenly I had @element-class, which is what I really wanted. But I forgot to add a way to define how to make the specialised class, which is the whole point of the of:. So I needed something new.

As a placeholder I had been writing this:

my-class = ( @of: (element-class: object-t,), ...)

I never liked this. It's terrible. It uses a list with only one element and it looks awkward. It gets worse when you have multiple parameters for the class.

And the final problem: what if you want more than one way to specialise the class? What if you want the 'abstract' class to fully work without specialisation? Neither of these is possible.

It then dawned on me I have a syntax for this already. Something that specifies a method you can call, the names of its parameters, and the types of the parameters. I had previously called it a method-signature and it's used to specify required implementation details for a trait.

So let's try and use that to fix up list:

list = (
  #specialise: [ of: element-class | object-t -> class ],
  #traits: (iterable-t of: element-class, sequential-t of: element-class,)

  #scope: public,  length: uint
  #scope: package, elements: (array of: element-class)
  #scope: package, allocator: &memory-allocator )

(I also changed it so that a new line inside a list counts as a new element, so the , spam isn't necessary unless everything is on one line.)

This is a lot better. I am trying a receiverless method name signature there because the type will be list class and that seems redundant to specify.
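One way to read that signature is as a function from a type to a class. Below is a rough Python sketch of that reading. The name `list_of` and the generated class are hypothetical stand-ins for illustration only, and say nothing about how an stz compiler would actually implement it.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def list_of(element_class: type) -> type:
    """Hypothetical stand-in for `[ of: element-class | object-t -> class ]`."""

    def __init__(self):
        self.elements = []

    def append(self, item):
        # Enforce the element type that was chosen at specialisation time.
        if not isinstance(item, element_class):
            raise TypeError(
                f"expected {element_class.__name__}, got {type(item).__name__}"
            )
        self.elements.append(item)

    def length(self):
        return len(self.elements)

    # Build one class per element type; the cache makes repeated
    # specialisations with the same type return the same class object.
    return type(
        f"list_of_{element_class.__name__}",
        (),
        {
            "element_class": element_class,
            "__init__": __init__,
            "append": append,
            "length": length,
        },
    )
```

Here `list_of(int)` plays the role of `list of: int`: the receiver is implied, the parameter is named, and the result is a class.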

This solves the first and second problems. What about the third where a class should be usable without specialising? First let's rewrite array with this syntax and see how it looks:

fixed-array = (
  #specialise: [ of: element-class length: len | object-t, uint -> class ]
  #traits: (iterable-t of: element-class, sequential-t of: element-class,)

  #scope: package, origin: (memory-address of: element-class) )

array = (
  #specialise: [ of: element-class | object-t -> class ]
  #traits: (iterable-t of: element-class, sequential-t of: element-class,)

  #scope: public,  length: uint
  #scope: package, origin: (memory-address of: element-class) )

Right now I have to define two classes for array: one with a compile-time length and one with a runtime length. This is clearly a design smell. We want just one kind of array that can be specialised.

Let's change the rules and say you can shadow a variable of a class with a specialisation to erase it from the structure. A subtraction of a field. Let's try it.

array = (
  #specialise: [ of: element-class | object-t -> class ]
  #specialise: [ of: element-class length: length | object-t, uint -> class ]
  #traits: (iterable-t of: element-class, sequential-t of: element-class,)

  #scope: public,  length: uint
  #scope: private, origin: (memory-address of: element-class) )

In this version we have two ways of making an array: one with a length and one without. By re-using the field name length as a parameter of the second specialisation we're shadowing that member, which removes it from the structure.

// a variable-length array of person with two members, length and origin
people = array of: person

// an array of length 10 of person with only one member origin
people = array of: person length: 10
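The shadowing rule can be modelled as a subtraction on the member list. The Python sketch below is a toy model under stated assumptions: the `specialise` helper, the `ARRAY_MEMBERS` table, the underscore names, and the use of `int` for both member types are all invented for illustration.

```python
from dataclasses import fields, make_dataclass

# Hypothetical member table for `array`; int stands in for uint and memory-address.
ARRAY_MEMBERS = [("length", int), ("origin", int)]


def specialise(name, members, compile_time):
    """Return a class whose runtime fields exclude every shadowed member."""
    # Any member whose name matches a specialisation parameter is erased.
    runtime_members = [(n, t) for n, t in members if n not in compile_time]
    # The erased values survive only as class-level (compile-time) data.
    return make_dataclass(
        name, runtime_members, namespace={"compile_time": dict(compile_time)}
    )


# array of: person
Dynamic = specialise("array_of_person", ARRAY_MEMBERS, {"element_class": "person"})

# array of: person length: 10
Fixed = specialise(
    "array_of_person_length_10",
    ARRAY_MEMBERS,
    {"element_class": "person", "length": 10},
)
```

`Dynamic` keeps both `length` and `origin` as instance fields, while `Fixed` keeps only `origin` and carries the length on the class itself.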

When a class only has one member it becomes a distinct variant of the type inside of it. Instead of a pointer to an array with a pointer to the elements, the compiler will be smart enough to have just one pointer to the elements, and the class information, including the length, exists at compile time only.

This also allows the compiler to put the whole array on the stack and, if possible, into a machine register. E.g. array of: f64 length: 8 can fit in a 512-bit register (zmm on x86-64 with AVX-512; xmm registers are only 128 bits wide) if such a register exists for the compilation target.
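The size arithmetic can be checked with Python's `ctypes`, which lays structures out the way the platform's C compiler would. This is only a sketch of the two layouts: the structure names `RuntimeArray` and `FixedArray8` and the helper `payload_bits` are hypothetical, and nothing here reflects an actual stz code generator.

```python
import ctypes


class RuntimeArray(ctypes.Structure):
    # Runtime-length variant: the length travels with the pointer.
    _fields_ = [
        ("length", ctypes.c_size_t),
        ("origin", ctypes.POINTER(ctypes.c_double)),
    ]


class FixedArray8(ctypes.Structure):
    # Compile-time-length variant: only the pointer remains in the structure.
    LENGTH = 8  # known statically, so it takes no space per instance
    _fields_ = [("origin", ctypes.POINTER(ctypes.c_double))]


def payload_bits(element_type, length):
    """Total size in bits of `length` elements of `element_type`."""
    return ctypes.sizeof(element_type) * 8 * length
```

`FixedArray8` is exactly one pointer wide, `RuntimeArray` is larger, and `payload_bits(ctypes.c_double, 8)` is 512, matching the eight f64 values in the example.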

This might be the wrong approach for SIMD instructions - instead of the compiler being very smart, it might be better to be explicit about its use. This is a systems language and being too smart for the developer and doing hidden things is rarely desirable; as opposed to the parent implementations of Smalltalk which aim to make life as opaque as possible for the developer.

Back to our example. Can we also remove element-class and allow array to be used without any specialisation? Ideally this would also work for list and things like range.

array = (
  #specialise: [ of: element-class | object-t -> class ]
  #specialise: [ of: element-class length: length | object-t, uint -> class ]
  #traits: (iterable-t of: element-class, sequential-t of: element-class,)

  #scope: public,  length: uint
  #scope: package, element-class: object-t = any
  #scope: private, origin: (memory-address of: element-class) )

We can toss element-class in there as another member that will be automatically removed at compile time if it's specified. But the problem is that it looks like any other member. It might have a default value of any, but is it also not modifiable? How is the compiler to know you shouldn't be able to change it? Likewise with length.

You can only remove a member if it is only used at compile time. The simple answer is to say that if it's used as a parameter in a specialisation then it is compile time only. I'm willing to accept that as an answer. If you call a specialisation that specifies the type then it becomes a compile-time member.

This doesn't help though because in the case of no specialisation we're not specifying any parameters, so we're not naming element-class. We need some other way to mark a member as compile time.

Strangely the answer might be to put a whole method in the specialise field. That way we can name all parameters that, in that specialisation, are compile time.

array = (
  #specialise: [ | -> class | element-class = any ]
  #specialise: [ of: element-class | object-t -> class ]
  #specialise: [ of: element-class length: length | object-t, uint -> class ]
  #traits: (iterable-t of: element-class, sequential-t of: element-class,)

  #scope: public,  length: uint
  #scope: private, origin: (memory-address of: element-class) )

If we allow specialise to take a method-signature or a method then we can name the other variables in the method body. Now we have made it clear what happens when you don't specialise the array. You have a member field of length and element-class becomes any.
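To see how the three entries might be told apart, here is a small Python model that picks a specialisation by its keyword selector and reports which names end up as compile-time bindings. The `SPECIALISATIONS` table, the `specialise` function, and the underscore spellings are assumptions made for this sketch, not part of stz.

```python
ANY = "any"
MEMBERS = ("length", "origin")

# Hypothetical table: keyword selector -> body naming the compile-time bindings.
SPECIALISATIONS = {
    # [ | -> class | element-class = any ]
    (): lambda: {"element-class": ANY},
    # [ of: element-class | object-t -> class ]
    ("of",): lambda element_class: {"element-class": element_class},
    # [ of: element-class length: length | object-t, uint -> class ]
    ("of", "length"): lambda element_class, length: {
        "element-class": element_class,
        "length": length,
    },
}


def specialise(**keywords):
    """Resolve a specialisation and split names into compile-time and runtime."""
    selector = tuple(keywords)  # keyword order is preserved by Python
    if selector not in SPECIALISATIONS:
        raise TypeError(f"array has no specialisation for {selector!r}")
    compile_time = SPECIALISATIONS[selector](*keywords.values())
    # Members not named by the chosen specialisation stay in the structure.
    runtime = tuple(name for name in MEMBERS if name not in compile_time)
    return compile_time, runtime
```

Calling `specialise()` with no keywords selects the first entry, so element-class is bound to any and length stays a runtime member; `specialise(of="person", length=10)` selects the third and leaves only origin at runtime.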

I'm happy with this solution. It's explicit and documents the nature of the class well. We don't need a fixed-array any more either. The only weird thing here is that we have to define the type of length twice: once as a member field and once as a parameter for the specialisation. In reality this is a blessing. It allows us to override the type of the member field as part of the specialisation.

Now the only difference between array and list is that list will dynamically re-allocate itself to grow. The majority of the behaviour of array and list is otherwise the same.