when to use a reference

There are two ways to pass objects as parameters to a method: as a copy or as a reference. If we're simply reading the data, we pass it as a copy. If we intend to modify it, we pass it as a reference.

Copying large objects around isn't efficient. Under the hood, the compiler decides how to pass a copy based on the size of the object and the register pressure at that point in the program. If it can break the object up into separate registers, it will; if not, it will pass it as an immutable reference.

jane: {Person | name: 'Jane'}.
self do-the-thing: jane. // a copy of jane
self do-the-other-thing: &jane. // a reference to jane
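If you know Rust, the same distinction shows up there, except that Rust spells it out in the function signature. This is a rough sketch, not how stz works. `Person`, `do_the_thing` and `do_the_other_thing` are hypothetical Rust stand-ins for the stz names above, and the copy is an explicit `clone()`.

```rust
// Hypothetical Rust stand-in for the stz Person structure.
#[derive(Clone)]
struct Person {
    name: String,
}

// Receives its own copy: changes stay local to this function.
fn do_the_thing(mut person: Person) -> String {
    person.name.push_str("ii");
    person.name
}

// Receives a mutable reference: changes are visible to the caller.
fn do_the_other_thing(person: &mut Person) {
    person.name.push_str("ii");
}

fn main() {
    let mut jane = Person { name: String::from("Jane") };

    // Pass a copy: jane is untouched.
    let copy_name = do_the_thing(jane.clone());
    println!("{} / {}", copy_name, jane.name); // Janeii / Jane

    // Pass a reference: jane herself is renamed.
    do_the_other_thing(&mut jane);
    println!("{}", jane.name); // Janeii
}
```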

If Person is a big structure and there is too much register pressure (or the data simply won't fit in registers), an actual copy would otherwise have to be made in memory. That is the slowest way to pass a lot of data to another method, so at this point we want it passed by reference instead.
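Rust asks you to make the "pass big data as an immutable reference" choice yourself, in the signature, rather than leaving it to the compiler. Here is a minimal sketch with a hypothetical 64 KiB `BigRecord`. Taking `&BigRecord` hands the callee a single pointer to the caller's value instead of copying the payload, and `std::ptr::eq` confirms this.

```rust
use std::mem::size_of;

// Hypothetical large structure: 64 KiB of payload.
struct BigRecord {
    payload: [u8; 65536],
}

// Borrows immutably: the callee reads the caller's data and cannot modify it.
fn checksum(record: &BigRecord) -> u64 {
    record.payload.iter().map(|&b| u64::from(b)).sum()
}

// Reports the address the callee actually received.
fn received_address(record: &BigRecord) -> *const BigRecord {
    record
}

fn main() {
    let record = BigRecord { payload: [1u8; 65536] };

    // A reference is one pointer wide, however big the value it points to.
    println!("{} bytes vs {} bytes", size_of::<BigRecord>(), size_of::<&BigRecord>());

    println!("checksum = {}", checksum(&record)); // checksum = 65536

    // The callee saw the caller's object, not a copy of it.
    println!("same object: {}", std::ptr::eq(received_address(&record), &record)); // same object: true
}
```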

There might be an argument for adding a syntax such as ^ to state explicitly that we want data passed as an immutable reference. For now I don't see a huge benefit in this: the calling convention of the platform you're compiling for already plays a big role in how parameters are passed to a method.

The next part of this story is how we treat references. If we call a method that expects a Person but give it a reference to a Person, the compiler treats the reference transparently for you:

[self do-the-thing: person] [self, Person -> ø |
  person name: person name + 'ii'.
  stdout print: person name].

jane: {Person | name: 'Jane'}.

// pass a copy
self do-the-thing: jane.

// pass our jane, the name will be updated
self do-the-thing: &jane.
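Rust has a comparable convenience, although the mechanism differs. Deref coercion lets a function that expects `&Person` accept a `&&Person` or a `&Box<Person>`, and method calls dereference automatically. This sketch only illustrates that transparency; unlike stz, Rust never turns a by-value parameter into a reference for you. `Person`, `describe` and `rename` are hypothetical.

```rust
// Hypothetical Rust stand-in for the stz Person structure.
struct Person {
    name: String,
}

impl Person {
    // Takes &mut self; method-call syntax dereferences Box<Person> automatically.
    fn rename(&mut self, suffix: &str) {
        self.name.push_str(suffix);
    }
}

// Expects a plain &Person.
fn describe(person: &Person) -> String {
    format!("person named {}", person.name)
}

fn main() {
    let jane = Person { name: String::from("Jane") };
    let jane_ref = &jane;
    let mut boxed = Box::new(Person { name: String::from("Joan") });

    // Deref coercion: &&Person and &Box<Person> are both accepted as &Person.
    println!("{}", describe(&jane_ref)); // person named Jane
    println!("{}", describe(&boxed)); // person named Joan

    // Auto-deref on method calls: Person's method is called through the Box.
    boxed.rename("ii");
    println!("{}", describe(&boxed)); // person named Joanii
}
```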

Sometimes we specifically want the reference itself. In that case we can say so in the method's signature. We won't be able to call Person methods on it until we've dereferenced it, though:

[self do-the-thing: personref] [self, &Person -> ø |
  person? := personref dereference. // is a Person / Undefined
  person?
    then: [person | stdout print: person name]
    else: [stdout print: 'undefined person!']].
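In Rust, the nearest analogue of a reference you must resolve before use is probably `std::rc::Weak`. Its `upgrade()` method returns an `Option`, so the code can't touch a `Person` field without first handling the "undefined person" case. This is a sketch of the idea only: `describe` is hypothetical, and stz's `dereference` may well behave differently.

```rust
use std::rc::{Rc, Weak};

// Hypothetical Rust stand-in for the stz Person structure.
struct Person {
    name: String,
}

// Takes the reference itself; it must be resolved before Person fields can be used.
fn describe(personref: &Weak<Person>) -> String {
    match personref.upgrade() {
        // Resolved: we hold a strong Rc<Person> for the rest of this arm.
        Some(person) => person.name.clone(),
        // The Person has already been dropped.
        None => String::from("undefined person!"),
    }
}

fn main() {
    let jane = Rc::new(Person { name: String::from("Jane") });
    let personref = Rc::downgrade(&jane);

    println!("{}", describe(&personref)); // Jane

    drop(jane); // the last strong owner goes away
    println!("{}", describe(&personref)); // undefined person!
}
```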

Working with references directly is rare in stz, though it can be useful when dealing with external data from a C library. As we can see, they're a little awkward to use, with several layers of protection. This is necessary to keep the language safe and sane.

Editing objects 'in place' doesn't require reference parameter types. What matters is how we pass the object, not what the method expects:

people := (Person | {name: 'Jane'}).

// take a copy of the person from the array and modify it
self rename-person: people[0].

// modify the person in the array 'in place'
self rename-person: &people[0].
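Rust can express "it's how we pass it that matters" with the standard `BorrowMut` trait, which both `Person` and `&mut Person` implement. The sketch below uses a hypothetical generic `rename_person`. Passing a clone of `people[0]` leaves the `Vec` alone, while passing `&mut people[0]` edits the element in place, through the same function.

```rust
use std::borrow::BorrowMut;

// Hypothetical Rust stand-in for the stz Person structure.
#[derive(Clone)]
struct Person {
    name: String,
}

// One function for both cases: P is either an owned Person or a &mut Person.
fn rename_person<P: BorrowMut<Person>>(mut person: P) -> P {
    person.borrow_mut().name.push_str("ii");
    person
}

fn main() {
    let mut people = vec![Person { name: String::from("Jane") }];

    // Pass a copy of the element: the Vec is untouched.
    let renamed = rename_person(people[0].clone());
    println!("{} / {}", renamed.name, people[0].name); // Janeii / Jane

    // Pass a mutable reference to the element: it is renamed in place.
    rename_person(&mut people[0]);
    println!("{}", people[0].name); // Janeii
}
```

Unlike stz, the caller still has to ask for the copy explicitly, because Rust never implicitly copies a type that isn't `Copy`.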

The compiler knows whether rename-person: modifies its parameter. If it does, and we passed the object without &, the compiler won't pass a reference to the original data as an optimisation; it makes a copy instead.

If the method doesn't modify the parameter, the compiler can safely pass a reference to it whether or not you put & in front of it.
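Rust's standard library has a copy-on-write type, `std::borrow::Cow`, that follows a similar "borrow unless modified" rule. It makes that decision at run time in library code rather than in the compiler. In this sketch, a hypothetical `maybe_rename` only clones the `Person` when it actually changes it.

```rust
use std::borrow::Cow;

// Hypothetical Rust stand-in for the stz Person structure.
#[derive(Clone)]
struct Person {
    name: String,
}

// Borrows the caller's Person and clones it only if it needs to be modified.
fn maybe_rename<'a>(person: &'a Person, suffix: &str) -> Cow<'a, Person> {
    let mut result = Cow::Borrowed(person);
    if !suffix.is_empty() {
        // to_mut() clones the borrowed Person into an owned copy on first write.
        result.to_mut().name.push_str(suffix);
    }
    result
}

fn main() {
    let jane = Person { name: String::from("Jane") };

    let unchanged = maybe_rename(&jane, "");
    let changed = maybe_rename(&jane, "ii");

    // The read-only case still points at jane; the modified case owns a copy.
    println!("{}", matches!(unchanged, Cow::Borrowed(_))); // true
    println!("{} / {}", changed.name, jane.name); // Janeii / Jane
}
```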

This brings us to an important topic: memory safety across threads. If another thread is processing the array while we're trying to use it, the data can be modified out from underneath us.

There are two concepts here: threads and processes. A process is cooperative execution within a thread. It yields to the next process whenever it hits some kind of blocking IO or when it is explicitly told to yield.

A thread, on the other hand, is an operating system thread. It is not guaranteed to be cooperative and may run simultaneously with other threads.
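Here is a toy cooperative scheduler in Rust that makes the difference concrete: several processes share one thread, and each runs until it voluntarily yields. This is a minimal sketch. `Process`, `step` and `run_round_robin` are hypothetical, a yield is modelled as returning from `step`, and yielding on blocking IO is not modelled at all.

```rust
use std::collections::VecDeque;

// A hypothetical cooperative process: one unit of work per step, then it yields.
// steps_left must start at 1 or more.
struct Process {
    name: &'static str,
    steps_left: u32,
}

impl Process {
    // Runs until the next yield point. Returns true when the process has finished.
    fn step(&mut self, log: &mut Vec<&'static str>) -> bool {
        log.push(self.name);
        self.steps_left -= 1;
        self.steps_left == 0
    }
}

// Round-robin scheduler on a single thread: nothing runs in parallel,
// so processes only interleave at their yield points.
fn run_round_robin(processes: Vec<Process>) -> Vec<&'static str> {
    let mut queue: VecDeque<Process> = processes.into_iter().collect();
    let mut log = Vec::new();
    while let Some(mut process) = queue.pop_front() {
        let finished = process.step(&mut log);
        if !finished {
            queue.push_back(process); // yield: go to the back of the line
        }
    }
    log
}

fn main() {
    let log = run_round_robin(vec![
        Process { name: "a", steps_left: 2 },
        Process { name: "b", steps_left: 3 },
    ]);
    println!("{:?}", log); // ["a", "b", "a", "b", "b"]
}
```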

To keep memory safe between threads, it must be explicitly shared between them. That means people in the example above won't be accessible from another thread unless you share it with a thread your thread has created.
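Rust enforces a similar rule through its type system. `std::thread::spawn` only accepts closures that are `'static` and `Send`, so a spawned thread can't simply borrow a local `Vec`. Sharing has to be spelled out, for example with `Arc` for shared ownership and `Mutex` for exclusive access. This is a sketch of that pattern in Rust's model rather than stz's; `Person` and `add_from_child` are hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical Rust stand-in for the stz Person structure.
struct Person {
    name: String,
}

// Spawns a thread that adds to the shared list and reports how many people it saw.
fn add_from_child(people: &Arc<Mutex<Vec<Person>>>) -> usize {
    // Explicitly share: the child gets its own handle to the same allocation.
    let shared = Arc::clone(people);
    let child = thread::spawn(move || {
        let mut guard = shared.lock().unwrap(); // exclusive access while the lock is held
        guard.push(Person { name: String::from("Joan") });
        guard.len()
    });
    child.join().unwrap()
}

fn main() {
    let people = Arc::new(Mutex::new(vec![Person { name: String::from("Jane") }]));

    // Spawning a non-move closure that borrows a plain local Vec would be rejected,
    // because the new thread might outlive the data it borrows.
    let seen = add_from_child(&people);

    let names: Vec<String> = people.lock().unwrap().iter().map(|p| p.name.clone()).collect();
    println!("{} {:?}", seen, names); // 2 ["Jane", "Joan"]
}
```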

Note that people is on the stack. Passing it to another thread isn't possible in stz; we have to allocate it with a memory allocator first:

// create a forever growing memory arena
shared-memory: MemoryArena.

// create a people array on the heap in the memory arena
people := {&Array of: Person | allocate: &shared-memory} -- [free].

// put a person into the array
people add-all: ({name: 'Jane'},).

// make a child-thread to print out the people
child-thread := {Thread |
  entry: [self: (&Array of: Person) |
    self | stdout]
  parameters: people
  arenas: &shared-memory}.

// start the child-thread
child-thread start.

// wait for the child-thread to finish so we don't deallocate people underneath it
child-thread wait.
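Rust's scoped threads build the "wait before freeing" step into the language. Every thread spawned inside `std::thread::scope` is joined before the scope returns, so the compiler can prove that `people` outlives the child. That's a different design from stz's explicit arenas (it even allows borrowing stack data), shown here only for comparison; `count_in_child` is hypothetical.

```rust
use std::thread;

// Hypothetical Rust stand-in for the stz Person structure.
struct Person {
    name: String,
}

// Lends the people list to a child thread and returns what the child computed.
fn count_in_child(people: &[Person]) -> usize {
    thread::scope(|scope| {
        let child = scope.spawn(|| {
            for person in people {
                println!("{}", person.name);
            }
            people.len()
        });
        // Explicit join to collect the result; the scope would join it anyway
        // before returning, so people can never be freed underneath the thread.
        child.join().unwrap()
    })
}

fn main() {
    let people = vec![Person { name: String::from("Jane") }];
    let count = count_in_child(&people);
    println!("child saw {} person(s)", count); // child saw 1 person(s)
}
```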

There's a lot of boilerplate involved in sharing data across threads. In part, this reminds us of the foot-guns we're holding when we do it. It matters even more when we use a worker thread pool, where jobs are distributed as processes across however many threads we allocate. We need to consider how to keep our data isolated, or else share it explicitly.
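A small Rust worker pool shows the "keep it isolated or share it explicitly" choice. Each job's input is moved to a worker through a channel, and the only shared state is the job queue's receiving end behind `Arc<Mutex<_>>`. This is a minimal, standard-library-only sketch; `run_pool` and the squaring job are hypothetical.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// Runs each job (square a number) on a fixed set of worker threads and sums the results.
fn run_pool(worker_count: usize, jobs: Vec<u64>) -> u64 {
    let (job_tx, job_rx) = mpsc::channel::<u64>();
    let (result_tx, result_rx) = mpsc::channel::<u64>();

    // The job receiver is the one piece of explicitly shared state.
    let job_rx = Arc::new(Mutex::new(job_rx));

    let mut workers = Vec::new();
    for _ in 0..worker_count {
        let job_rx = Arc::clone(&job_rx);
        let result_tx = result_tx.clone();
        workers.push(thread::spawn(move || loop {
            // The lock guard is dropped at the end of this statement,
            // so the lock is held only while taking one job.
            let next = job_rx.lock().unwrap().recv();
            match next {
                Ok(n) => result_tx.send(n * n).unwrap(),
                Err(_) => break, // queue closed and drained
            }
        }));
    }
    drop(result_tx); // only the workers hold result senders now

    // Each job's data is moved into the queue; no two workers share it.
    for job in jobs {
        job_tx.send(job).unwrap();
    }
    drop(job_tx); // closing the queue lets idle workers exit

    // Ends once every worker has exited and dropped its sender.
    let total: u64 = result_rx.iter().sum();
    for worker in workers {
        worker.join().unwrap();
    }
    total
}

fn main() {
    println!("{}", run_pool(3, vec![1, 2, 3, 4])); // 30
}
```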