stz from stack to heap


One of the hallmarks of Smalltalk code is sending 'new' to a class. This creates a new object that you can then configure with message sends, e.g.:

Person new
  name: 'Jane';
  height: 5 foot + 6 inches;
  yourself

If we were to use this approach for making objects in stz we wouldn't be able to specify whether we wanted the object on the stack or the heap. We need to know the type at compile time to put it on the stack, so dynamic message sends at runtime are out.

We have already solved this for stack based object creation:
jane = { Person | } will create space on the stack for a Person that we've assigned to a variable called jane.

The code before the | is run at compile time, which allows it to be pseudo-dynamic: dynamic at compile time, not at runtime. If we wanted to create an instance of a class that is only known at runtime we would have to allocate it on the heap.

We can already put something on the heap by specifying we want a reference to a type and sending new to it, eg:
jane = { &Person | new }

This is equivalent to writing it the Smalltalk way:
jane = Person new
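For comparison, here is a rough C++ sketch of the same stack-versus-heap split. It is an analogy, not how stz works: the Person struct, its fields and the two function names are made up for illustration. The point it shows is that the stack version needs the full type at compile time, while the heap version only hands back a pointer.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for the Person class used in the post.
struct Person {
    std::string name;
    int height_inches = 0;
};

// Stack-style creation: the compiler must know sizeof(Person) here,
// much like { Person | } in stz.
Person make_stack_jane() {
    Person jane;
    jane.name = "Jane";
    jane.height_inches = 5 * 12 + 6;  // 5 foot + 6 inches
    return jane;
}

// Heap-style creation: the caller only holds a pointer,
// much like { &Person | new } in stz.
std::unique_ptr<Person> make_heap_jane() {
    auto jane = std::make_unique<Person>();
    jane->name = "Jane";
    jane->height_inches = 5 * 12 + 6;
    return jane;
}
```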

The problem is we have four ways of doing the same thing. That's a design smell to me:

jane = { &Person | new, name: 'Jane', height: 5 foot + 6 inches }

jane = { &Person | new } name: 'Jane', height: 5 foot + 6 inches

jane = { &Person } new
  name: 'Jane',
  height: 5 foot + 6 inches

jane = Person new
  name: 'Jane',
  height: 5 foot + 6 inches

Given that the type is sometimes supplied by the parameters of another method we're calling, or by its return type, the {} syntax seems problematic.

Way back when we started looking at this we examined some Go, Jai and Odin code. They use the following pattern to declare types:

jane: Person
jane: Person = make_person()
jane := make_person()

This is nice syntax but it has a problem for stz. We can't distinguish between a receiverless send of jane: with the argument Person and a declaration of a variable jane of type Person. Both are written jane: Person.

One solution is to say there is an 'idiomatic way' of writing stz. Putting all the parameters inside the right side of the {|} is the stylistic approach I've been taking so far. It looks similar to making a map: ( string, string | 'foo': 'bar' ) and that's why I've been doing it that way.

Another thought here is mutability. We can say that jane is mutable inside the {} but immutable outside of it. We could add a syntax for making a copy of a variable for modification with the same name it has now, eg:

[ hug: person
| Person
| { ...person | state: hugged } ]

This gets complicated when we start dealing with references though. We want to take a copy of the contents of the reference and make a new reference to that copy. Ugh!

Mutability/Immutability is hard which is why borrow checkers now exist. It seems for the time being we stick with everything-is-mutable and more-than-one-way-ish.

No doubt we will come back to this later and I shall curse past-me for short-sightedness.

Another piece of the puzzle here is memory allocators. Unlike in Smalltalk, where new can use the one-true-memory-allocation primitive, we have not baked memory management support into the language. The heap is entirely driven by libraries because there is no garbage collection.

The memory package can have a default allocator but one of the advantages of manual memory management is creating temporary memory pools that you can throw away cheaply when you're done with them. That encourages us to allocate using a memory allocator object.

jane = my-allocator new: Person // jane's type is now &Person

When we put things on the heap we will find we rarely, if ever, need the {|} syntax. You are, however, losing the benefits of having the stack. You also need to remember to deallocate jane when you're done with it.
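To make the 'temporary memory pool' idea concrete, here is a minimal bump-allocator sketch in C++. Everything in it is an assumption for illustration: the Arena class, its create and reset methods, the 1024-byte buffer and the Person fields are not part of stz or its memory package. It only shows the shape of allocating through an allocator object and throwing the whole pool away in one step.

```cpp
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the Person class used in the post.
struct Person {
    const char* name = "";
    int height_inches = 0;
};

// Hypothetical bump allocator: hands out slices of one fixed buffer.
class Arena {
public:
    // Construct a T inside the arena. Returns nullptr when the buffer is full.
    template <class T, class... Args>
    T* create(Args&&... args) {
        // reset() never runs destructors, so only accept types that need none.
        static_assert(std::is_trivially_destructible<T>::value,
                      "Arena only stores trivially destructible types");
        // Round the current offset up to T's alignment.
        std::size_t aligned = (used_ + alignof(T) - 1) / alignof(T) * alignof(T);
        if (aligned + sizeof(T) > sizeof(buffer_)) return nullptr;
        used_ = aligned + sizeof(T);
        return new (buffer_ + aligned) T(std::forward<Args>(args)...);
    }

    // Throw away everything allocated so far in one cheap step.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    alignas(std::max_align_t) unsigned char buffer_[1024];
    std::size_t used_ = 0;
};
```

With this, arena.create<Person>() plays the role of my-allocator new: Person, and arena.reset() is the cheap 'done with all of it' step. Any pointer handed out before the reset is dangling afterwards, which is the same remember-to-deallocate burden in a different shape.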

A common pattern is to pass in a location to put the details of a type which allows you to allocate it on the stack or the heap and use the same API:

[ setup-jane: person
| &Person
| person name: 'Jane', height: 5 foot + 6 inches ]

stack-jane = { Person }
heap-jane = my-allocator new: Person

setup-jane: &stack-jane
setup-jane: heap-jane

When we zoom in on the setup-jane: method we see we're taking a reference to Person as the type. The problem with this declaration is we're not specifying whether we intend to change the contents of the reference (or even the value of the reference itself) or if we intend to simply read this data.
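The same pattern in C++ looks roughly like the sketch below. The Person struct, setup_jane and the demo function are hypothetical names for illustration. Note that the plain Person* parameter has exactly the problem described above: nothing in the signature says whether the function writes through the pointer.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for the Person class used in the post.
struct Person {
    std::string name;
    int height_inches = 0;
};

// One API for both cases: the caller decides where the Person lives.
void setup_jane(Person* person) {
    person->name = "Jane";
    person->height_inches = 5 * 12 + 6;  // 5 foot + 6 inches
}

// Runs the same setup against a stack object and a heap object
// and reports whether both ended up in the same state.
bool same_api_for_stack_and_heap() {
    Person stack_jane;                            // lives on the stack
    auto heap_jane = std::make_unique<Person>();  // lives on the heap

    setup_jane(&stack_jane);
    setup_jane(heap_jane.get());

    return stack_jane.name == heap_jane->name &&
           stack_jane.height_inches == heap_jane->height_inches;
}
```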

There was an attempt to add in, out and inout to C++ at one point. Or was it C? I can't remember. I know some people add them as 'do nothing' macros just to be very explicit about the intent of the parameter.

We could state that all parameters are read-only by default and you have to mark it if you intend to change it. We could add a prefix syntax to indicate we're changing something (or a suffix syntax). It gets a little messy when dealing with references though. Let's use ! to indicate we're going to change it:

&!Person means a reference to a Person we're changing, while !&Person means a reference we're changing to a Person we're not changing.

This syntax sucks but it does convey what we need to know. We could quietly introduce some compilation scope context methods in: out: and inout: which would result in a method declaration like:

[ hug: person
| &out: Person
| person state = hugged ]

The other version, where we're modifying the reference and not the Person, would look like: out: &Person. It might be a little confusing. It does convey what we need to convey, though, and it allows us to mark a variable as immutable or mutable. It also lets a developer know what might happen when they call the method.
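The two placements of the marker (&!Person versus !&Person) have a close cousin in where C++ puts const on a pointer. The sketch below is only an analogy: C++ defaults to mutable and marks the read-only side, which is the reverse of the read-only default proposed here, and the Person struct and both function names are made up for illustration.

```cpp
// Hypothetical stand-in for the Person class used in the post.
struct Person {
    bool hugged = false;
};

// Roughly &!Person: the pointer itself is fixed,
// but the Person behind it may be changed.
void hug(Person* const person) {
    person->hugged = true;
    // person = nullptr;      // would not compile: the pointer is const
}

// Roughly !&Person: the Person is read-only, but the pointer may be
// re-aimed. It is taken by reference so the caller sees the change.
void retarget(const Person*& person, const Person& other) {
    person = &other;
    // person->hugged = true; // would not compile: the Person is const
}
```

C++ has the same prefix-soup readability problem, which is some comfort: Person* const and const Person* are easy to mix up too.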

But does 'out' mean what we want it to mean? inputs and outputs don't convey what we're trying to say - mutability and immutability. We could grab 'mut' from languages like Rust:

[ hug: person | &mut:Person | person state: hugged ]

Note that we have no way to indicate whether variables we're accessing from a closure are mutable or not. There are holes here.

Another approach is to make every change to an object happen on your own copy of the object. They are passed around copy-on-write. This could be surprising though:

[ hug: person | &Person | person state: hugged ]
jane = { Person | state: unhugged }
hug: jane
assert: jane state == hugged // will fail

This would require an idiom of always returning the thing you were given just in case you changed it. jane = hug: jane is... a thing. Not a thing I like.
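The surprise is easy to reproduce in C++ with plain pass-by-value, which behaves the same as copy-on-write from the caller's point of view (it just copies eagerly rather than lazily). The names below are hypothetical and only sketch the behaviour.

```cpp
// Hypothetical stand-ins for the state and Person used in the post.
enum class State { unhugged, hugged };

struct Person {
    State state = State::unhugged;
};

// Works on its own copy: the caller's Person is left untouched.
void hug_copy(Person person) {
    person.state = State::hugged;
}

// The 'always return what you were given' idiom: the caller has to
// assign the result back to see the change.
Person hug_and_return(Person person) {
    person.state = State::hugged;
    return person;
}
```

Calling hug_copy(jane) leaves jane unhugged, and only jane = hug_and_return(jane) makes the change stick, which is the jane = hug: jane shape.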

Instead of specifying mutable/immutable or in/out let's go with read/write as our nomenclature:

[ a-person hug: b-person
| &ro:Person, &rw:Person
| b-person state: hugged by: a-person ]

This works. 'write-only' is a little head scratching but can make sense when you're initialising an object. Does it matter to state "I promise not to look inside" – probably not. We can therefore narrow it down to 'read-only' as r: and 'read-write' as w:.

We can also consider a reasonable default for a modern language to be read-only. We therefore only need to specify when we're modifying something. Yes I know I just went full circle. I'd like to explore a different syntax – given that {|} indicates creation maybe we could reuse that syntax in the type definition:

[ hugger hug: huggee
| &Person, &{Person}
| huggee state: hugged by: hugger ]

Nah. That's a bit too busy with strange symbols everywhere -and- new syntax. After all the types list is an executable code list and not something special.

Speaking of not being special: &Person and &jane currently work differently. If I wanted to get a reference to the type Person with &Person, it would instead currently return a new type, Reference of: Person. Whoops. That is a bit of a brain bender – at compile time does it make sense to take the reference of a type? Whether or not it makes sense, the syntax is inconsistent with runtime.

In some languages they use ! suffix to indicate a method that will change a parameter. We're better off indicating our usage of the parameter than labelling the method. I have no issue with implicit method overloading so long as the compiler produces helpful messages about it.

[ list add: element
| &list!, list element-class -> list-error
| length ≤ elements length else: [
    grow-status = grow
    grow-status == could-not-grow then: [ ^could-not-grow ] ]
  elements[length] = element
  length += 1
  ^no-error ]

Eep! We're changing the list! This might be okay. By putting it on the suffix do we fix everything? What if we intended to modify the reference, not the value in the reference? We'd have to write (&list)! and this is the price we pay for mixing prefix with suffix. It does bring us back to &!list and !&list.

Maybe we have to accept this, unless we wanted to go full English and write the whole thing out: Reference of: (Mutable of: Person) and Mutable of: (Reference of: Person). I don't think we want to do that. Let's look at getter/setter for Person name:

[ person get-name | &Person -> string | ... ]
[ person set-name: name | &!Person | ... ]

I had wanted to use ! prefix as a language feature for boolean not. Again that'd be a runtime versus compile time difference. That is achievable for both cases:

[ ! b | bool -> bool | 1 - (u8 cast: b) ]
[ & object | object -> Reference of: object | ... ]
[ ! type | class -> class | Mutable of: type ]
[ & type | class -> class | Reference of: type ]

Is it acceptable though to overload concepts like this? If not then we'd need new symbols for ! and & on types.

Let's cycle back to variable captures of blocks. Currently we have no way of specifying what we intend to capture except by adding a ~ to the front of the variable name. Cute, but we're not specifying if we want a mutable or immutable version of it:

foo = { Person | age: 20 }
ten-years-older = [ -> int | ~foo age + 10 ]

What if we try to modify foo inside the closure? I think we need to find a way to specify the variable captures again. Though this time we're not trying to satisfy an LR parser, we could go back to the syntax we had before. Or we could come up with a different syntax.

foo = { Person | age: 20 years }
increase-age = [ amount: seconds, ~!foo -> ø | foo age: foo age + amount ]
increase-age evaluate: 1 years

In this approach we include the variables we're capturing as 'parameters' to the closure. This is technically correct (the best kind of correct!) because they are passed along with the closure code pointer as an implicit parameter. To distinguish them from the other parameters we re-use the cute tail ~.

This makes things nice and explicit again and once again adds information at compile time that is unnecessary in the runtime portion of the block.

[ ~ type | class -> class | Closure of: type ]
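For comparison, C++ lambdas already carry an explicit capture list that says, per variable, whether the closure gets a read-only copy or a mutable reference. The sketch below mirrors the two stz closures above. It is an analogy rather than a proposal, and the Person struct and function names are hypothetical.

```cpp
// Hypothetical stand-in for the Person class used in the post.
struct Person {
    int age = 20;
};

// Like ~foo: the closure captures a copy of foo, and that copy is
// read-only inside the lambda body (no 'mutable' keyword).
int ten_years_older(const Person& foo) {
    auto closure = [foo] { return foo.age + 10; };
    return closure();
}

// Like ~!foo: the closure captures foo by reference and may change it.
// The ordinary parameter (amount) sits apart from the capture.
void increase_age(Person& foo, int amount) {
    auto closure = [&foo](int by) { foo.age += by; };
    closure(amount);
}
```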

This will have to do for now. It's an 'okay' solution.