Well Done: A Sentinel Value

April 17, 2013code, golang, language, magpie

This is kinda like part three of the Iteration Inside and Out posts, so you may want to check out part one and part two too. But you don’t have to.

Most programming languages have one or two singleton values floating around. These are special built-in objects that only have a single instance. null or nil are common. Some languages put true and false in the same bucket. JavaScript, ever the hipster, ironically defines one called undefined.

I wanted to talk a little bit about one I just added to my little language Magpie: a sentinel value called done. I’m still not certain it’s a great idea, but I’ll walk you through my thoughts. If nothing else, maybe it will serve as a cautionary tale for other, smarter language designers.

Channels

I ran into the problem that ultimately led to done when I started working on Magpie’s concurrency story. Its model is based on fibers and channels, much like Go or other CSP-inspired languages. You can spin up a new fiber using the async keyword:

async
    print("I'm in another fiber!")
end

Fibers are cooperatively scheduled so they only yield to other fibers when they do something that requires waiting. Usually that means IO. So if you do the above, you won’t see that message get printed until something causes the main fiber to yield. For example:

async
    print("I'm in another fiber!")
end

print("Before")
print("After")

This program will create a second fiber (but not switch to it). Then it gets to print("Before") in the main fiber. Since printing is IO, that switches to the second fiber. That in turn queues up its print and suspends again, so the main fiber resumes. Ultimately, it prints:

Before
I'm in another fiber!
After

You can spin up lots and lots of fibers because they don’t use OS threads, just some memory in the interpreter. This is swell for decoupling stuff and running things concurrently. But sometimes you do need to coordinate fibers with each other. For that, we’ve got channels.

A channel is a simple object that you can send objects into and then receive them from another channel. You can create one like:

var channel = Channel new

To send a value along it, just do:

channel send("My value")

And you can receive it like:

var result = channel receive

The fun bit is that when a fiber sends a value along a channel, it causes that fiber to suspend until some other fiber shows up to receive the value. A send doesn’t complete until the object has been received. So channels let you not just communicate but also synchronize.

Show’s Over Folks

There’s one other thing you can do with a channel, you can close it:

channel close

That puts the channel out of commission and tells everyone that they will receive no more values from it. The question I ran into was, “what happens to fibers that are waiting to receive on a channel when it gets closed?” For example:

var channel = Channel new

// Wait for a value and print it.
async print(channel receive)

// From the main fiber, close the channel.
channel close

Here, the second fiber is sitting there, relaxing, maybe having a cocktail while it waits for the channel to spew something forth. But it doesn’t, it gets closed. The fiber shouldn’t just stay suspended forever, so how should it get notified?

One option would be have receive throw an error when the channel gets closed. So you’d handle it something like:

async
    do
        print(channel receive)
    catch err is ChannelClosedError then
        print("Sorry, no value for you")
    end
end

Actually, any block can have a catch clause in Magpie, so the explicit do here is redundant. You’d really just do:

async
    print(channel receive)
catch err is ChannelClosedError then
    print("Sorry, no value for you")
end

That’s not too bad, but consider something a little more fleshed out. In many cases, you’ll receive something from a channel and do different stuff based on what you get. Channels are often used for messages and the specific message will often cause different behavior. You can imagine something like:

var channel = Channel new

async
    var count = 1
    while true do
        match channel receive
            case "inc" then print(count = count + 1)
            case "dec" then print(count = count - 1)
        end
    end
end

Here we’ve got a little concurrent counter that you can send messages to in order to increment and decrement the value. If it also needs to handle the channel closing, you’d need to do:

async
    var count = 1
    while true do
        match channel receive
            case "inc" then print(count = count + 1)
            case "dec" then print(count = count - 1)
        end
    catch err is ChannelClosedError then
        print("Closed")
    end
end

That feels redundant to me. Catching errors is basically pattern-matching (in fact, in Magpie it uses the exact same syntax and semantics as a match expression), so it feels dumb to have to have two matches, one for the set of normal values and then this extra catch for the one special “no more values” signal.

It’s not just redundant either. Even if you don’t care about the channel closing, you have to add the catch here. Otherwise, since it’s a thrown error, it will unwind the fiber’s callstack on you.

Both of those are no fun, so I figured, why not just make receive return a value when the channel is closed instead of throwing one. Magpie already has a null-like sentinel value called nothing, so I could just say receive returns nothing when the channel is closed.

But that’s a bit lame. While nothing doesn’t appear as often as null does in other languages, it’s still a general-purpose “no value here” object that users need to be free to use anywhere. If receive returns nothing when the channel is closed, there’d be no way to explicitly send nothing along an open channel. Receivers wouldn’t be able to tell the difference.

Are We Done?

So I made another singleton value: done. It exists solely to represent “no more values”. With this, the above code is like:

async
    var count = 1
    while true do
        match channel receive
            case "inc" then print(count = count + 1)
            case "dec" then print(count = count - 1)
            case done then print("Closed")
        end
    end
end

It may seem weird to have a method return values of different types, but this is natural in Magpie. The language is dynamically typed, and revolves heavily around pattern-matching. In, say, JavaScript, if you have a function that can return a number or a string and callers will want to handle those cases differently you get this ugly code:

var result = foo();
if (typeof result === 'number') {
  console.log("number");
} else if (typeof result === 'string') {
  console.log("string");
}

In Magpie, that’s:

match foo()
    case is Num then print("number")
    case is String then print("string")
end

Or, since the language is expression-oriented, you could even do:

print(match foo()
    case is Num then "number"
    case is String then "string"
end)

In fact, because Magpie also has multimethods, even this would do the right thing:

def showType(is Num) print("number")
def showType(is String) print("string")

showType(foo())

Where was I? Oh, right. So it’s totally kosher to have methods return values of different types and have callers dispatch on them. That was enough to talk myself into being OK with receive having a sort of ad-hoc variant-like return value.

The Next Iteration

Once I had that, I started fleshing out channels a bit more. One thing I noticed is that you often read from a channel repeatedly. See that while true do loop up there? That seemed a bit silly. Magpie has for loops. Like most languages, those are syntactic sugar for an iteration protocol. If I made channels support that directly, you could do:

async
    var count = 1
    for value in channel receive
        match value
            case "inc" then print(count = count + 1)
            case "dec" then print(count = count - 1)
        end
    end
    print("Closed")
end

This protocol has two levels: iterable and iterator. The iterable protocol is for an object that can be iterated: lists, collections, etc. There’s just one method:

def (is Iterable) iterate

An iterable type specializes iterate to return a new iterator for the object. This lets you iterate over objects without modifying the original object. It would be pretty weird if you could only use a list in one loop at a time.

In the case of channels, though, iterating it is mutating the original object, so iterate on a channel just returns the same object. Channels are both iterables and iterators. The iterator protocol was two methods:

def (is Iterator) advance

This tries to advance the iterator to the next item and returns true if it found another item.

def (is Iterator) current

This returns the current item. So there was a two method protocol here. Most other languages have something similar. You need two methods because one crank of the iterator needs to return two bits of data: whether or not you’re done iterating, and the next value if you aren’t.

I say “was a two method protocol” because now that I had this done value, I realized I could simplify it. All I need is advance. It returns the next item if there is one, or done if there isn’t. The only thing you lose is the ability to iterate over a collection that actually contains done as a value. But I’m OK with that. That seems like a small thing to give up to get a simpler protocol.

Now, not only can you use channels any place you work with sequences, but even implementing your own sequences is a little easier.

Your Favorite Functional Friends

When I say “any place you work with sequences” much of what I’m thinking of is those workhorse methods that transform sequences. Good old map, filter, fold, etc. This gives you a feel for what they look like in Magpie:

var list = [1, 2, 3, 4]
list map(fn(i) i * 3) // 3, 6, 9, 12
list where(fn(i) i % 2 == 0) // 2, 4

Since I specialized these on Iterable, that means any iterable type, including channels, gets them. For a longer example, Go has an example showing a concurrent Sieve of Eratosthenes. Ported to Magpie, it looks like:

// Send the sequence 2, 3, 4, ... to 'channel'.
def generate(channel is Channel) async
    for i in 2..9999 do channel send(i)
end

// Copy the values from 'input' to 'out',
// removing those divisible by 'prime'.
def filter(input is Channel, out is Channel, prime is Int) async
    input where(fn(i) i % prime != 0) pipeTo(out)
end

// The prime sieve: Daisy-chain Filter processes.
var channel = Channel new
generate(channel) // Spawn generate fiber.

for i in 0..50 do
    val prime = print(channel receive)
    val sieve = Channel new
    filter(channel, sieve, prime)
    channel = sieve
end

I won’t walk you through the whole thing, but I do think it’s pretty rad. The neat bit is this line:

input where(fn(i) i % prime != 0) pipeTo(output)

Here, where is Magpie’s filter function. It takes an iterable on the left and a predicate function on the right. It returns an iterable generating all of the items of the original sequence where the predicate returns true. Note that here, the argument on the left is a channel so this magically becomes an asynchronous filter.

Likewise, pipeTo is a method that takes an iterable and a channel and sends all of the values in the iterable to the channel. So we’re taking a channel, filtering it like a collection, then pumping that collection back into a new channel. This all happens asynchronously, but the code looks like you’re just playing with regular collections.

Crap, I’ve digressed again. I was supposed to be talking about done. I just think treating channels as sequences is really cool. OK, back to the matter at hand.

Now Are We Done Yet?

The simplest of the higher-order methods for working with iterables is each. All it does is call the given function for each item in the sequence. In other languages this is sometimes called forEach. Most of the time you don’t need it since it is almost identical to just using a built-in for loop. But sometimes it’s convenient, so Magpie has it.

If you’ve worked with forEach in JavaScript, one nasty problem is how do you stop early if you don’t actually need to walk the whole sequence? Ruby has some special syntax built into the language, break and next, that are designed to work specifically with blocks and iterators. JS ain’t so lucky.

Magpie has break for for loops, but a method like each is just a method that takes a function. You can’t break from within a function. But… since we have this done value laying around, we can use it here too. So the way each (and map and where and others) work is if the callback ever returns done that means “stop iterating now”.

For example:

[1, 2, 3, 4] map(fn(i) if i < 3 then i * 2 else done)

This will not only transform the values (i.e. what map usually does), but it will also truncate the sequence and just return [2, 4].

So we’ve got an easier way to handle closed channels, a simpler iterator protocol, and more expressive collection-manipulation methods. Not too bad for one little sentinel value.