Well Done: A Sentinel Value
↩ ↪April 17, 2013
This is kinda like part three of the Iteration Inside and Out posts, so you may want to check out part one and part two too. But you don’t have to.
Most programming languages have one or two singleton values floating around.
These are special built-in objects that only have a single instance. null
or
nil
are common. Some languages put true
and false
in the same bucket.
JavaScript, ever the hipster, ironically defines one called
undefined
.
I wanted to talk a little bit about one I just added to my little language
Magpie: a sentinel value called done
. I’m still not certain it’s a great
idea, but I’ll walk you through my thoughts. If nothing else, maybe it will
serve as a cautionary tale for other, smarter language designers.
Channels
I ran into the problem that ultimately led to done
when I started working on
Magpie’s concurrency story. Its model is based on fibers and channels,
much like Go or other CSP-inspired languages. You can spin up a new
fiber using the async
keyword:
async
print("I'm in another fiber!")
end
Fibers are cooperatively scheduled so they only yield to other fibers when they do something that requires waiting. Usually that means IO. So if you do the above, you won’t see that message get printed until something causes the main fiber to yield. For example:
async
print("I'm in another fiber!")
end
print("Before")
print("After")
This program will create a second fiber (but not switch to it). Then it gets to
print("Before")
in the main fiber. Since printing is IO, that switches to the
second fiber. That in turn queues up its print
and suspends again, so the
main fiber resumes. Ultimately, it prints:
Before
I'm in another fiber!
After
You can spin up lots and lots of fibers because they don’t use OS threads, just some memory in the interpreter. This is swell for decoupling stuff and running things concurrently. But sometimes you do need to coordinate fibers with each other. For that, we’ve got channels.
A channel is a simple object that you can send objects into and then receive them from another channel. You can create one like:
var channel = Channel new
To send a value along it, just do:
channel send("My value")
And you can receive it like:
var result = channel receive
The fun bit is that when a fiber sends a value along a channel, it causes that fiber to suspend until some other fiber shows up to receive the value. A send doesn’t complete until the object has been received. So channels let you not just communicate but also synchronize.
Show’s over, folks
There’s one other thing you can do with a channel, you can close it:
channel close
That puts the channel out of commission and tells everyone that they will receive no more values from it. The question I ran into was, “What happens to fibers that are waiting to receive on a channel when it gets closed?” For example:
var channel = Channel new
// Wait for a value and print it.
async print(channel receive)
// From the main fiber, close the channel.
channel close
Here, the second fiber is sitting there, relaxing, maybe having a cocktail while it waits for the channel to spew something forth. But it doesn’t, it gets closed. The fiber shouldn’t just stay suspended forever, so how should it get notified?
One option would be have receive
throw an error when the channel gets
closed. So you’d handle it something like:
async
do
print(channel receive)
catch err is ChannelClosedError then
print("Sorry, no value for you")
end
end
Actually, any block can have a catch
clause in Magpie, so the explicit
do
here is redundant. You’d really just do:
async
print(channel receive)
catch err is ChannelClosedError then
print("Sorry, no value for you")
end
That’s not too bad, but consider something a little more fleshed out. In many cases, you’ll receive something from a channel and do different stuff based on what you get. Channels are often used for messages and the specific message will often cause different behavior. You can imagine something like:
var channel = Channel new
async
var count = 1
while true do
match channel receive
case "inc" then print(count = count + 1)
case "dec" then print(count = count - 1)
end
end
end
Here we’ve got a little concurrent counter that you can send messages to in order to increment and decrement the value. If it also needs to handle the channel closing, you’d need to do:
async
var count = 1
while true do
match channel receive
case "inc" then print(count = count + 1)
case "dec" then print(count = count - 1)
end
catch err is ChannelClosedError then
print("Closed")
end
end
That feels redundant to me. Catching errors is basically pattern matching
(in fact, in Magpie it uses the exact same syntax and semantics as a
match
expression), so it feels dumb to have to have two matches,
one for the set of normal values and then this extra catch
for the one special
“no more values” signal.
It’s not just redundant either. Even if you don’t care about the channel
closing, you have to add the catch
here. Otherwise, since it’s a thrown
error, it will unwind the fiber’s callstack on you.
Both of those are no fun, so I figured, why not just make receive
return a
value when the channel is closed instead of throwing one. Magpie already has
a null
-like sentinel value called nothing
, so I could just say
receive
returns nothing
when the channel is closed.
But that’s a bit lame. While nothing
doesn’t appear as often as null
does
in other languages, it’s still a general-purpose “no value here” object that
users need to be free to use anywhere. If receive
returns nothing
when the
channel is closed, there’d be no way to explicitly send nothing
along an open
channel. Receivers wouldn’t be able to tell the difference.
Are we done?
So I made another singleton value: done
. It exists solely to represent “no
more values”. With this, the above code is like:
async
var count = 1
while true do
match channel receive
case "inc" then print(count = count + 1)
case "dec" then print(count = count - 1)
case done then print("Closed")
end
end
end
It may seem weird to have a method return values of different types, but this is natural in Magpie. The language is dynamically typed, and revolves heavily around pattern matching. In, say, JavaScript, if you have a function that can return a number or a string and callers will want to handle those cases differently you get this ugly code:
var result = foo();
if (typeof result === 'number') {
console.log("number");
} else if (typeof result === 'string') {
console.log("string");
}
In Magpie, that’s:
match foo()
case is Num then print("number")
case is String then print("string")
end
Or, since the language is expression-oriented, you could even do:
print(match foo()
case is Num then "number"
case is String then "string"
end)
In fact, because Magpie also has multimethods, even this would do the right thing:
def showType(is Num) print("number")
def showType(is String) print("string")
showType(foo())
Where was I? Oh, right. So it’s totally kosher to have methods return values of
different types and have callers dispatch on them. That was enough to talk
myself into being OK with receive
having a sort of ad-hoc variant-like return
value.
The next iteration
Once I had that, I started fleshing out channels a bit more. One thing I noticed
is that you often read from a channel repeatedly. See that while true do
loop
up there? That seemed a bit silly. Magpie has for
loops. Like most
languages, those are syntactic sugar for an iteration
protocol. If I made channels support that directly, you could do:
async
var count = 1
for value in channel receive
match value
case "inc" then print(count = count + 1)
case "dec" then print(count = count - 1)
end
end
print("Closed")
end
This protocol has two levels: iterable and iterator. The iterable protocol is for an object that can be iterated: lists, collections, etc. There’s just one method:
def (is Iterable) iterate
An iterable type specializes iterate
to return a new iterator for the object.
This lets you iterate over objects without modifying the original object. It
would be pretty weird if you could only use a list in one loop at a time.
In the case of channels, though, iterating it is mutating the original object,
so iterate
on a channel just returns the same object. Channels are both
iterables and iterators. The iterator protocol was two methods:
def (is Iterator) advance
This tries to advance the iterator to the next item and returns true
if it
found another item.
def (is Iterator) current
This returns the current item. So there was a two method protocol here. Most other languages have something similar. You need two methods because one crank of the iterator needs to return two bits of data: whether or not you’re done iterating, and the next value if you aren’t.
I say “was a two method protocol” because now that I had this done
value, I
realized I could simplify it. All I need is advance
. It returns the next item
if there is one, or done
if there isn’t. The only thing you lose is the
ability to iterate over a collection that actually contains done
as a value.
But I’m OK with that. That seems like a small thing to give up to get a simpler
protocol.
Now, not only can you use channels any place you work with sequences, but even implementing your own sequences is a little easier.
Your favorite functional friends
When I say “any place you work with sequences,” much of what I’m thinking of is
those workhorse methods that transform sequences. Good old map()
,
filter()
, fold()
, etc. This gives you a feel for what they
look like in Magpie:
var list = [1, 2, 3, 4]
list map(fn(i) i * 3) // 3, 6, 9, 12.
list where(fn(i) i % 2 == 0) // 2, 4.
Since I specialized these on Iterable, that means any iterable type, including channels, gets them. For a longer example, Go has an example showing a concurrent Sieve of Eratosthenes. Ported to Magpie, it looks like:
// Send the sequence 2, 3, 4, ... to 'channel'.
def generate(channel is Channel) async
for i in 2..9999 do channel send(i)
end
// Copy the values from 'input' to 'out',
// removing those divisible by 'prime'.
def filter(input is Channel, out is Channel, prime is Int) async
input where(fn(i) i % prime != 0) pipeTo(out)
end
// The prime sieve: Daisy-chain Filter processes.
var channel = Channel new
generate(channel) // Spawn generate fiber.
for i in 0..50 do
val prime = print(channel receive)
val sieve = Channel new
filter(channel, sieve, prime)
channel = sieve
end
I won’t walk you through the whole thing, but I do think it’s pretty rad. The neat bit is this line:
input where(fn(i) i % prime != 0) pipeTo(output)
Here, where()
is Magpie’s filter function. It takes an iterable on the left
and a predicate function on the right. It returns an iterable generating all of
the items of the original sequence where the predicate returns true
. Note that
here, the argument on the left is a channel so this magically becomes an
asynchronous filter.
Likewise, pipeTo()
is a method that takes an iterable and a channel and sends
all of the values in the iterable to the channel. So we’re taking a channel,
filtering it like a collection, then pumping that collection back into a new
channel. This all happens asynchronously, but the code looks like you’re just
playing with regular collections.
Crap, I’ve digressed again. I was supposed to be talking about done
. I just
think treating channels as sequences is really cool. OK, back to the matter at
hand.
Now are we done yet?
The simplest of the higher-order methods for working with iterables is each()
.
All it does is call the given function for each item in the sequence. In other
languages this is sometimes called forEach()
. Most of the time you don’t need
it since it is almost identical to just using a built-in for
loop. But
sometimes it’s convenient, so Magpie has it.
If you’ve worked with forEach()
in JavaScript, one nasty problem is
how do you stop early if you don’t actually need to walk the whole sequence?
Ruby has some special syntax built into the language, break
and next
, that
are designed to work specifically with blocks and iterators. JS ain’t so lucky.
Magpie has break
for for
loops, but a method like each
is just a method
that takes a function. You can’t break
from within a function. But… since we
have this done
value laying around, we can use it here too. So the way
each()
(and map()
and where()
and others) work is if the callback ever
returns done
that means “stop iterating now”.
For example:
[1, 2, 3, 4] map(fn(i) if i < 3 then i * 2 else done)
This will not only transform the values—what map()
usually does—but it
will also truncate the sequence and just return [2, 4]
.
So we’ve got an easier way to handle closed channels, a simpler iterator protocol, and more expressive collection-manipulation methods. Not too bad for one little sentinel value.