Tasks

Among my areas of lack-of-expertise when it comes to language implementation are threads and scheduling. It's not that implementing a scheduler is that hard, really, but I always have trouble figuring out what the right primitives are, what should happen, exactly, when you create a new thread and such. So I've just ignored threading in saturn for a while and kept the model simple and single-threaded.

Of course you can only get so far without considering threading and this weekend, while trying to solve other problems, I think I accidentally ended up figuring out what the threading model should be based on. The solution is a new concept I currently call tasks, for lack of a better name.

Before getting to the threading part, the problem I was trying to solve actually has to do with collections and iteration. In smalltalk, neptune and saturn, you iterate through collections using blocks. You pass a block object to the collection and ask it to invoke the block for each of its elements:

class Vector {

  ...

  for(block) {
    for (i: 0 .. this.size())
      block(this[i]);
  }

}

This is in most cases more convenient for both the user and implementer of the collection than using iterators like many other languages. With blocks, all intermediate state can be stored in local variables and on the call stack, whereas with iterators, all intermediate state has to be stored in the iterator object:

iterator() {
  return new ListIterator(this);
}

class ListIterator is Iterator {

  def collection;
  var index = 0;

  // constructor

  get_next() {
    def result = collection[index];
    index = index + 1;
    return result;
  }

  has_more() {
    return index < collection.size();
  }

}

Unfortunately, even though blocks work better than iterators most of the time they have some severe limitations. The classic example of this is the fact that you can't iterate through two collections simultaneously using blocks. In the OSVM implementations of smalltalk and neptune, collections (at least some of them) provided both block iteration and iterator objects for exactly this reason. That is of course unfortunate: two different implementations of what is essentially the same functionality.

The core of the problem is the following. With iterator objects, iteration is suspended after each call to get_next. You can wait however long you want before calling get_next again, or not do it at all. This for instance means that you can interleave the iteration of two different collections. With blocks, you can't suspend iteration and then resume it later where you left off. You can abort block iteration in various ways, but then you can't resume it again.

I had looked at various approaches to solving this problem that didn't require iterators and only found one that seemed somewhat promising: what they call generators in Python and iterators in C# (and which also exists in other languages). Essentially this is a co-routine-like mechanism that allows you to suspend the execution of a method and then later resume where you left off. For instance, consider method in python:

def ints():
    k = 0
    while 1:
        yield k
        k = k + 1

When called, ints executes until it reaches a yield statement at which point it returns a value to the caller. When called again it continues immediately after the yield statement. In this case, successive calls to ints will return the integers starting from zero. The mechanism in C# is relatively similar.

Then I went to Copenhagen and back this weekend (to see Beck play live, one of the oddest concerts I've ever seen) and the train ride gave me some time to think about this. The result was the concept of tasks, which is similar to Python and C#'s mechanisms but cleaner.

In the simple case a task is similar to a block, and the syntax is also very similar:

def f = fun -> 4; // block returning 4
def t = task -> 4; // task returning 4

Tasks can be invoked the same way blocks can:

f()
> 4
t()
> 4

One difference between tasks and blocks is that tasks can only be invoked once. Or, to be precise, if a task has once finished executing, it is dead and calling it again will raise an error:

t()
> 4
t()
--> error!

Unlike blocks, tasks can yield values as in python and C#. Yielding a value causes the task to be suspended and the value to be returned to the caller. When the task invoked again it will resume execution immediately after where it yielded before:

def ints = task {
  var k = 0;
  while (true) {
    yield k;
    k = k + 1;
  }
}

ints()
> 0
ints()
> 1
ints()
> 2
...

Finally, you can ask a task whether it has finished executing or if it has more values:

ints.is_done()
> false

And that's it, that's a task.

How does this solve the iteration problem? By allowing block-based iteration to be suspended:

def coll = new Vector("Ni!", "Peng!", "Neeee-wom!");
def iter = task {
  for (o: coll)
    yield o;
};

iter()
> "Ni!"
iter()
> "Peng!"
iter()
> "Neeee-wom!"

This way, Vector doesn't have to implement iterators; instead, tasks can be used to allow block iteration to be suspended and resumed at will.

I've considered letting tasks take arguments. I can't think of any uses for that but it does seem appropriate somehow, what with tasks and blocks looking so similar:

def get_sym = task (prefix) {
  var k = 0;
  while (true) {
    yield "${prefix}-${k}";
    k = k + 1;
  }
};

gen_sym("mip");
> "mip-0"
gen_sum("fab")
> "fab-1"

It's not too difficult to implement, I just feel silly doing so if it's completely useless. But maybe I or someone else can come up with a neat use, I hope so.

Originally, on the train to Copenhagen, I didn't think of tasks as tasks but as generators: objects that generate values. And the keyword wasn't task, it was gen. But on the way back it occurred to me that maybe this is a more general concept than a generator -- it's more like a piece of work that can be paused and resumed during execution, which is where the name task came from. Also, since a task has its own execution state and stack it is really very similar to a thread. With some more work I think it's possible to design a pretty decent threading model based on this abstraction. That might be the next thing I'll work on.

I suspect that tasks will play the same kind of role that selector objects did in OSVM smalltalk and neptune: they're invaluable in one or two key classes that implement some general abstractions, and then the abstractions can be used instead of the "raw" tasks or selectors.