Monthly Archives: August 2006

Offline

I’ve sent my laptop off to be repaired, the harddisk died (quietly — that’s not my mac in the picture). This means that I will be more or less offline for the next two to three weeks. I will probably check my mail occasionally but I can’t guarantee that I’ll do it every day.

No computer. For three weeks. The horror!

Also, I’ve just moved quenta.org from b-one to dreamhost which I’m sure will cause all sorts of problem. Of which I, offline as I am, will be blissfully ignorant. Yeah, things are going pretty well here.

Tasks

Among my areas of lack-of-expertise when it comes to language implementation are threads and scheduling. It’s not that implementing a scheduler is that hard, really, but I always have trouble figuring out what the right primitives are, what should happen, exactly, when you create a new thread and such. So I’ve just ignored threading in saturn for a while and kept the model simple and single-threaded.

Of course you can only get so far without considering threading and this weekend, while trying to solve other problems, I think I accidentally ended up figuring out what the threading model should be based on. The solution is a new concept I currently call tasks, for lack of a better name.

Before getting to the threading part, the problem I was trying to solve actually has to do with collections and iteration. In smalltalk, neptune and saturn, you iterate through collections using blocks. You pass a block object to the collection and ask it to invoke the block for each of its elements:

class Vector {

...

for(block) {
for (i: 0 .. this.size())
block(this[i]);
}

}

This is in most cases more convenient for both the user and implementer of the collection than using iterators like many other languages. With blocks, all intermediate state can be stored in local variables and on the call stack, whereas with iterators, all intermediate state has to be stored in the iterator object:

iterator() {
return new ListIterator(this);
}

class ListIterator is Iterator {

def collection;
var index = 0;

// constructor

get_next() {
def result = collection[index];
index = index + 1;
return result;
}

has_more() {
return index < collection.size();
}

}

Unfortunately, even though blocks work better than iterators most of the time they have some severe limitations. The classic example of this is the fact that you can’t iterate through two collections simultaneously using blocks. In the OSVM implementations of smalltalk and neptune, collections (at least some of them) provided both block iteration and iterator objects for exactly this reason. That is of course unfortunate: two different implementations of what is essentially the same functionality.

The core of the problem is the following. With iterator objects, iteration is suspended after each call to get_next. You can wait however long you want before calling get_next again, or not do it at all. This for instance means that you can interleave the iteration of two different collections. With blocks, you can’t suspend iteration and then resume it later where you left off. You can abort block iteration in various ways, but then you can’t resume it again.

I had looked at various approaches to solving this problem that didn’t require iterators and only found one that seemed somewhat promising: what they call generators in Python and iterators in C# (and which also exists in other languages). Essentially this is a co-routine-like mechanism that allows you to suspend the execution of a method and then later resume where you left off. For instance, consider method in python:

def ints():
k = 0
while 1:
yield k
k = k + 1

When called, ints executes until it reaches a yield statement at which point it returns a value to the caller. When called again it continues immediately after the yield statement. In this case, successive calls to ints will return the integers starting from zero. The mechanism in C# is relatively similar.

Then I went to Copenhagen and back this weekend (to see Beck play live, one of the oddest concerts I’ve ever seen) and the train ride gave me some time to think about this. The result was the concept of tasks, which is similar to Python and C#’s mechanisms but cleaner.

In the simple case a task is similar to a block, and the syntax is also very similar:

def f = fun -> 4; // block returning 4
def t = task -> 4; // task returning 4

Tasks can be invoked the same way blocks can:

f()
> 4
t()
> 4

One difference between tasks and blocks is that tasks can only be invoked once. Or, to be precise, if a task has once finished executing, it is dead and calling it again will raise an error:

t()
> 4
t()
--> error!

Unlike blocks, tasks can yield values as in python and C#. Yielding a value causes the task to be suspended and the value to be returned to the caller. When the task invoked again it will resume execution immediately after where it yielded before:

def ints = task {
var k = 0;
while (true) {
yield k;
k = k + 1;
}
}

ints()
> 0
ints()
> 1
ints()
> 2
...

Finally, you can ask a task whether it has finished executing or if it has more values:

ints.is_done()
> false

And that’s it, that’s a task.

How does this solve the iteration problem? By allowing block-based iteration to be suspended:

def coll = new Vector("Ni!", "Peng!", "Neeee-wom!");
def iter = task {
for (o: coll)
yield o;
};

iter()
> "Ni!"
iter()
> "Peng!"
iter()
> "Neeee-wom!"

This way, Vector doesn’t have to implement iterators; instead, tasks can be used to allow block iteration to be suspended and resumed at will.

I’ve considered letting tasks take arguments. I can’t think of any uses for that but it does seem appropriate somehow, what with tasks and blocks looking so similar:

def get_sym = task (prefix) {
var k = 0;
while (true) {
yield "${prefix}-${k}";
k = k + 1;
}
};

gen_sym("mip");
> "mip-0"
gen_sum("fab")
> "fab-1"

It’s not too difficult to implement, I just feel silly doing so if it’s completely useless. But maybe I or someone else can come up with a neat use, I hope so.

Originally, on the train to Copenhagen, I didn’t think of tasks as tasks but as generators: objects that generate values. And the keyword wasn’t task, it was gen. But on the way back it occurred to me that maybe this is a more general concept than a generator — it’s more like a piece of work that can be paused and resumed during execution, which is where the name task came from. Also, since a task has its own execution state and stack it is really very similar to a thread. With some more work I think it’s possible to design a pretty decent threading model based on this abstraction. That might be the next thing I’ll work on.

I suspect that tasks will play the same kind of role that selector objects did in OSVM smalltalk and neptune: they’re invaluable in one or two key classes that implement some general abstractions, and then the abstractions can be used instead of the “raw” tasks or selectors.

For

Since my last two posts about for expressions I’ve been working on an implementation. Implementing new features in Saturn is pretty easy, by design, but this feature has taken longer than usual for two reasons. First of all, I really only had the syntax when I started so I first had to come up with a decent semantics. Also, implementing it revealed a bunch of bugs in other parts of the system — for instance, it turns out that mixing tail-call optimization and non-local returns gives some interesting problems. But all that has been fixed. Here’s a method that returns the factors of a given integer

factors(k) -> for (x : 2 .. k, k % x == 0) -> x;
// factors(18) = [2, 3, 6, 9]

It’s not particularly efficient but it is definitely succinct. You can also create infinite data structures:

def primes = for (x: Naturals, x > 1, factors(x).is_empty()) -> x;
// primes = [2, 3, 5, 7, ...]
// primes[100] = 541

Of course you can also use it for plain, everyday stuff like

def all_names = for (person: employees) -> person.get_name()

Compare this with, let’s pick a random language, Java:

List all_names = new ArrayList();
for (Person person: employees)
names.add(person.getName());

It’s just not the same…

Now, how does this actually work? The for expression uses a single method on collections: map_filter (I’m not too happy about this name so if you have a better idea feel free to tell me). This method takes two block arguments, map and filter and returns a new collection which contains the results of map applied to the elements in this collection for which filter returns true. A straightforward implementation could be:

map_filter(map, filter) {
def result = new ArrayList();
for (elm: this) {
if (filter(elm))
result.add(map(elm));
}
return result;
}

Note that for statements (for (...) { ... }) are not the same as for expressions (for (...) -> ...).

All a for expression is, really, is calls to the map_filter method. For instance, the body of the factor function above is expanded into

(2 .. k).map_filter(fun x -> x, fun x -> k % x == 0)

Implementing the map_filter method should be pretty straightforward and the good thing is that the implementer is free to decide how to represent the result. I definitely like this approach better than requiring collections to implement several different methods. Scala, for instance, uses three: map, filter and flatMap.

The freedom to choose the representation of the result is the key to implementing the list of primes: the implementation of map_filter in Naturals applies the map and filter lazily:

class Naturals is Collection {

map_filter(map, filter)
-> new MapFilterCollection(this, map, filter);

operator [index] -> index;

for(block) {
var i = 0;
while (true) {
block(i);
i = i + 1;
}
}

}

class MapFilterCollection is Collection {

def collection, map, filter;

// constructor, etc.

map_filter(map, filter)
-> new MapFilterCollection(this, map, filter);

for(block) {
for (obj: collection) {
if (filter(obj))
block(map(obj));
}
}

operator [index] {
var i = 0;
for (obj : this) {
if (index == i) return obj;
i = i + 1;
}
}

}

This is not an efficient implementation of MapFilterCollection but it should give an idea of how things work underneath: the collection just postpones all the work until you iterate through it or ask for a particular element.

Apart from the name of the method, map_filter, I think that’s a pretty decent approach. There’s still one issue left: what happens when you iterate through several collections at once:

for (x: 0 .. 100, y: 0 .. x, z: 0 .. y,
x + y + z == 50) -> new Tuple(x, y, z)

This expression uses three map_filters, one for each variable, but the result has to be turned into a single collection. I have implemented a solution but I think is still needs a bit of work. I’ll write more about that later.

Sequence Comprehensions

Incredible: it turns out that the for expression syntax for list comprehensions I described in my last post is almost identical to the syntax of sequence comprehensions in Scala. Compare this saturn code:

for (x: ints, x % 2 == 0) -> x

with this code in Scala:

for (val x <- ints; x % 2 == 0) yield x

Looks pretty familiar right? I do like the saturn syntax better, though.

Why do I keep having other people’s ideas?

Details

I had originally decided that I wasn’t going to spent a lot of time coming up with new features for saturn but would just reimplement neptune. However, as I might have expected, I just can’t help myself. Many of the “new” ideas are things we discussed back when we designed neptune, though, so it’s more “outtakes” than actual new material.

The thing I’ve been most happy about adding so far is also the very simplest: a compact syntax for creating intervals. In C, Java and even neptune, you iterate over an interval by using the standard for loop:

for (int i = 0; i < n; i++) ...

This is unfortunate for more than one reason. First of all, it's too much boilerplate. 90% of for loops that iterate over a sequence of numbers are the same except for the name of the loop variable and the limit to iterate up to. Below, I've underlined the parts of the statement that aren't boilerplate:

for (int i = 0; i < n; i++) ...

That's actually less than half the code! Also, the loop variable is mentioned three times which gives ample opportunity to get it wrong. If you're like me, for instance, and have the variable i hardwired in your brain but need to iterate for a different variable there's a very real risk that you'll end up writing

for (int j = 0; i < n; i++) ...

The third problem with this kind of for loops is a classic: the limit is evaluated for each iteration. In rare cases that is exactly what you need. More often, however, it is either unnecessary (and may waste execution time) or just plain wrong.

In saturn it takes just one method to fix this: the .. method on the class Integer:

class Integer {
...
operator ..(that) {
return new Interval(this, that);
}
}

With this method, writing from .. to creates a new open interval from from to to. What can you do with an interval? Most importantly, iterate through it:

for (i : 0 .. n) {
// whatever
}

Almost all the boilerplate code is gone. The iteration variable is only mentioned once and the limit is only evaluated once, when the interval is created. And since the receiver of the .. method is known at compile time, the compiler can optimize away the creation of the interval with no visible consequences other than a performance improvement. Yessir!

Another change I've made is to add more shorthands for occurrences of expressions. In neptune we had two syntaxes for blocks: one for statement blocks and one for blocks whose body was just a single expression:

// statement block
fun (x) {
var str = x.to_string();
System.print_on_stdout(str);
}

// expression block
fun (x) -> x * 2

In saturn, I've decided to introduce this distinction more places. Possibly the most visible place I've added this is methods -- there is now a compact syntax for methods that return the value of a single expression:

class Point {
var x, y;
...
norm() -> sqrt(x * x + y * y);
}

In neptune (and Java and ...) you would have had to write something like

norm() {
return sqrt(x * x + y * y);
}

Regular "statement methods" still exist but this makes it easier to write short methods and having short methods is generally a Good Thing. Also, to me, it just sort of looks right in a way I can't explain.

I'm also considering adding a new form of for statement along the same lines:

var ints = for (i : 0 .. 100) -> 2 * i;
// ints = [0, 2, 4, ..., 198]

As with a regular for statement, a for expression evaluates its body repeatedly but the expression collects the values of the body in a new collection. It's essentially a list comprehension

ints = [ i * 2 | i <- [0 .. 100] ]

but with an untraditional syntax that has a number of advantages: it's very easy to explain in terms of for statements and the variable definitions occur syntactically before their use. In the Haskell-like syntax above, i is being used before it's defined which I find non-obvious and difficult to read.

There you have it: three new features, two of which have already been implemented. I haven't figured out all the details of how the new for expression should work but I'm sure I can take inspiration from existing list comprehension mechanisms. I'm also working towards an alpha release of the compiler and runtime but currently you'll have to get it through CVS from saturn-language at sourceforge to try it out.