stz iterating

stz iterating
Photo by Sigmund / Unsplash

One thing that I like to explore is the nature of iteration. In many C like languages you have for. for is a very powerful tool but it is not extensible. Some languages have bent over backwards to put as much power in to for as possible so that you don't need to extend it. They get 80% of the way there and the 20% you work around. for is great. Smalltalk-80 decided not to go with for.

I'm going to assume people reading here know how for works.

With block closures Smalltalk-80 decided to make message oriented iteration. The primary method used for iteration is do:.

people do: [:person | person jump]

When you want to filter that collection you can use select: and reject: which work in a similar way but your block must return a boolean.

people select: [:person | person age >= 18]

To map each element in the collection in to some other computed value you have collect:.

addresses := people select: [:person | person address]

You can even do a reduce operation using this idiom by having an initial value and iterating with an accumulator using inject:into:.

combinedAge := people
    inject: 0
    into: [:accum :person | accum + person age]

Any object can respond to these protocols. That means arrays, lists, virtual collections over a network. It's very extensible. I remember reading once there's a constant battle in language design between 'concepts' and 'keywords'.

Do you design your language around having lots of concepts, or lots of keywords? If you have lots of both then your cognitive load is huge. So you have to favour one over the other. Smalltalk-80 wants lots of concepts and few keywords. Go wants lots of keywords and few concepts.

That's the good. Now let's talk about the bad.

Problem 1 - select:, reject:, collect:, etc all create a new collection. This is both a strength and a weakness. It's very functional oriented and it allows the kind of container to adapt as needed. But it's also a constant limit when dealing with large numbers of objects. You need to pivot to virtual collections.

Problem 2 - They don't compose well. Sure, they're objects, but look at this mess:

adultAddresses := (people select: [:person | person age >= 18]) 
    collect: [:person | person address]

The brackets-to-the-left problem in Smalltalk is pervasive. So much so most Smalltalk programmers don't even think about it. Python comprehensions exist to avoid this kind of difficulty in describing an iteration.

Smalltalk programmers get around this in one of two ways. Either they assign each step to a variable, or they format the method in a particular kind of way.

adults := people select: [:person | person age >= 18]
addresses := adults collect: [:person | person address]

It's descriptive, I'll give it that. But may be we can do better. One approach implemented as a toy in Smalltalk was Higher Order Messenging. It allows you to treat the collection as if it were transforming:

adultAddresses := (people select age >= 18) collect address

Now it's a little vague and if you need the parameter to be an argument in your call suddenly you're pulled out of your H.O.M. approach and you're back to the long hand.

Another approach to iteration is to embed another language inside your language. This is the C# approach. Why not stick a SQL-like select paradigm right in to your language and call it a day? This is kind of like Python comprehensions in a way, though I think comprehensions are a better solution than what C# did.

What we need is a way to compose blocks together like they do in functional languages. Monads, really, if we're going to stick a name on it. They weren't a big thing in computing in the 70s when Smalltalk-80 was made.

The idea is to be able to 'paste' the blocks together and then apply them to the collection. Let's give it a go:

adult-addresses := people
  · [person | person age ≥ 18] select
  · [person | person address] collect

Arbitrary syntax chosen for combining these things. Is this better yet? I'm not convinced it is. There's fewer brackets which does make it easier to string them together and we expect things to flow from one block to the next rather than making intermediate collections. Those are plusses to begin with.

The next trick we pull out is to replace · with something that describes the action we're performing. Selecting, Rejecting, Collecting, Reducing. Let's start with the easiest one - selecting.

We're "narrowing" a collection. A few candidates come to mind. Something like > or → or perhaps just -. It'd be ideal to have a keyword that can work as a prefix or as a binary selector. We want a subcollection so lets use folder terminology and pick /.

For the transform perhaps we could use > or → since I suggested them above. For rejecting we don't really need anything, negating stuff with a prefix is dead simple and we could, really, apply that to a block if we wanted it to be 'not'.

adult-addresses := people / [person | person age ≥ 18]
    -> [person | person address]

As for the reduce, we can do a 'longer' transformation with a two argument block as the argument:

combined-age := people --> [accum, person | accum + person age]

Now that we have the basics down we can also introduce the use of methods, not just blocks. Right now everything takes a single argument and makes a deliberation.

But we already have the ability to create blocks and name them, so let's give that a go:

[person adult? | person -> bool | person age ≥ 18]
adult-addresses := people / adult? -> address

This requires a little magic from the compiler. It's passing adult? as an argument which requires the types to match. The / method in this case must take a parameter of a block with a parameter of a the elements of the starting collection.

Another approach might be to reserve the return type of bool to indicate a select and anything else to indicate a map. But if we want to create a map of bools we'd be out of luck.

I've been a bit cheaty with all of this. I've been neglecting the syntax where we specify the return parameter which would be essential to do any kind of yielding operations.

Let's rewind and try again:

adult-addresses := people
  / [in: person -> out: bool | out <- person age ≥ 18]
 -> [in: person -> out: address | out <- person address]

It's messy again - but we learn something interesting. The type of the next operation takes a person as its input which means the previous function in the chain must have been a select since it returns a bool.

This is a minor breakthrough because it simplifies things considerably:

adult-addresses := people /
    [in -> out | out <- in age ≥ 18] /
    [in -> out | out <- in address]

This in/out idiom is powerful but gosh is it noisy. It'd be nice if we could hand wave it all away. Not to mention we've not used any coroutine yielding - the thing running the action is the one who would be feeding in values and yielding for more values, aka: driving the stream.

Have we done better? No. Absolutely not. Let's try again but this time we borrow from Python comprehensions.

adult-addresses := people where: [in -> out | out <- in age ≥ 18]
                            map: [in -> out | out <- in address]

Instead of individual methods that compose together we create variants of the main methods. At this point I like what I see - but what I don't like is how noisy the returns have become since we added coroutines and yields.

What happens if we make naming the output variable optional? with types inferred let's try that again:

adult-addresses := people where: [person | person age ≥ 18]
                            map: [person | person address]

What if we added syntax like for to the language? will we hate ourselves? probably. But what does it look like?

adult-addresses := people/person where: [person age ≥ 18]
                                   map: [person address]

Eh. What if we wanted to iterate this instead of creating a virtual collection over it? The argument to the do: block would be an address not a person. There's no win with moving the each outside the block.

people where: [person | person age ≥ 18]
         map: [person | person address]
        into: [address | stdout println: address]

We could drop the map portion from this example. It's not a big issue to put the address of the person in to a local variable inside the do part of the block.

people where: [person | person age ≥ 18]
        into: [person | stdout println: person address]

Note that people can be a coroutine, stream, or collection. The into: parameter can be a coroutine, a stream, or a collection. We're not fussy. This makes for a powerful API.

Let's try adding the reduce while we're at it:

people where: [person | person age ≥ 18]
        into: [accum = 0, person | accum + person age]

Is it cheating to use the default value? yes, that doesn't work. Let's try that again.

people where: [person | person age ≥ 18]
        into: [accum, person | accum + person age]
  startingAt: 0

We've gotten to a nice looking and powerful API that we can expand upon. But we've had to push back on coroutines to do it. If we're going to keep code easier to read we'll need to come up with another way to do coroutines and nested return escapes.

The simplest solution here would be to say that if you don't specify a named return parameter, you cannot do coroutines. You also cannot escape-return to that level of block. The last value in the evaluation of the block becomes the return value.

That handles all the 'simple' scenarios of block usage but also allows for the more advanced, more complicated 'labelled' block returns as well as coroutines.

And also the big syntax update is out of date. Magical. This is what painting a bike shed looks like. I'm enjoying it though and I hope you are too.