Category Archives: saturn

Saturn UI

Sigh. My mac is still being repaired. Fortunately, it turns out that my access code still works at the university so even though it’s not exactly ideal, it gives me a place to work when I really want to. So I have made some progress over the last few weeks.

Threading now works, but the scheduler is exceedingly stupid. One nice thing about this implementation, though, is that threading is completely deterministic. Interrupts appear to be nondeterministic but are actually based on values returned by a random number generator. All you need to know to repeat an execution is the seed of the generator and then the run will be exactly the same. That’s been pretty useful.

I’ve also found time to work on an eclipse plugin. Now there’s a perspective with a saturn editor with syntax coloring and integrated launching:


Notice the little saturn perspective marker in the top right corner. It took me forever to draw that.

The compiler is integrated with the IDE so there’s immediate compilation when files are saved:


Finally, there’s a simple debugger:


There’s still a lot of work left to do before the IDE is decent but I think it’s already surprisingly useful. You get a lot of functionality with relatively little work if you ask eclipse nicely.

By the way, if you look closely at the code in the screenshots you’ll notice that I’ve changed the block syntax from fun (...) -> ... to fn (...) -> .... I don’t really care if it’s fun or fn but I remember caring once, a few years ago when I first saw ML and O’Caml, and I remember preferring fn. I thought my former self should have some influence on this language. Even though he probably wouldn’t have liked it; he thought object orientation was for wimps. Good thing I’m running things now and not him.

Tasks

Among my areas of lack-of-expertise when it comes to language implementation are threads and scheduling. It’s not that implementing a scheduler is that hard, really, but I always have trouble figuring out what the right primitives are, what should happen, exactly, when you create a new thread and such. So I’ve just ignored threading in saturn for a while and kept the model simple and single-threaded.

Of course you can only get so far without considering threading and this weekend, while trying to solve other problems, I think I accidentally ended up figuring out what the threading model should be based on. The solution is a new concept I currently call tasks, for lack of a better name.

Before getting to the threading part, the problem I was trying to solve actually has to do with collections and iteration. In smalltalk, neptune and saturn, you iterate through collections using blocks. You pass a block object to the collection and ask it to invoke the block for each of its elements:

class Vector {

  ...

  for(block) {
    for (i: 0 .. this.size())
      block(this[i]);
  }

}

This is in most cases more convenient for both the user and implementer of the collection than using iterators like many other languages. With blocks, all intermediate state can be stored in local variables and on the call stack, whereas with iterators, all intermediate state has to be stored in the iterator object:

iterator() {
  return new ListIterator(this);
}

class ListIterator is Iterator {

  def collection;
  var index = 0;

  // constructor

  get_next() {
    def result = collection[index];
    index = index + 1;
    return result;
  }

  has_more() {
    return index < collection.size();
  }

}

Unfortunately, even though blocks work better than iterators most of the time they have some severe limitations. The classic example of this is the fact that you can’t iterate through two collections simultaneously using blocks. In the OSVM implementations of smalltalk and neptune, collections (at least some of them) provided both block iteration and iterator objects for exactly this reason. That is of course unfortunate: two different implementations of what is essentially the same functionality.

The core of the problem is the following. With iterator objects, iteration is suspended after each call to get_next. You can wait however long you want before calling get_next again, or not do it at all. This for instance means that you can interleave the iteration of two different collections. With blocks, you can’t suspend iteration and then resume it later where you left off. You can abort block iteration in various ways, but then you can’t resume it again.
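As a concrete illustration (in Python, since Saturn itself isn't something you can run), iterator objects let you interleave two traversals precisely because each one is suspended between calls, which block-based iteration can't do:

```python
# Two independent iterators advanced in lockstep; each suspends between next() calls.
xs = iter([1, 2, 3])
ys = iter(["a", "b", "c"])

pairs = []
while True:
    try:
        pairs.append((next(xs), next(ys)))
    except StopIteration:
        break

print(pairs)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```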

I had looked at various approaches to solving this problem that didn’t require iterators and only found one that seemed somewhat promising: what they call generators in Python and iterators in C# (and which also exists in other languages). Essentially this is a co-routine-like mechanism that allows you to suspend the execution of a method and then later resume where you left off. For instance, consider this method in Python:

def ints():
    k = 0
    while 1:
        yield k
        k = k + 1

Calling ints produces a generator; when the next value is requested, the body executes until it reaches a yield statement, at which point a value is returned to the caller. On the next request it continues immediately after the yield statement. In this case, successive requests will produce the integers starting from zero. The mechanism in C# is relatively similar.

Then I went to Copenhagen and back this weekend (to see Beck play live, one of the oddest concerts I’ve ever seen) and the train ride gave me some time to think about this. The result was the concept of tasks, which is similar to Python and C#’s mechanisms but cleaner.

In the simple case a task is similar to a block, and the syntax is also very similar:

def f = fun -> 4; // block returning 4
def t = task -> 4; // task returning 4

Tasks can be invoked the same way blocks can:

f()
> 4
t()
> 4

One difference between tasks and blocks is that tasks can only be invoked once. Or, to be precise, once a task has finished executing, it is dead and calling it again will raise an error:

t()
> 4
t()
--> error!

Unlike blocks, tasks can yield values as in Python and C#. Yielding a value causes the task to be suspended and the value to be returned to the caller. When the task is invoked again it will resume execution immediately after where it yielded before:

def ints = task {
  var k = 0;
  while (true) {
    yield k;
    k = k + 1;
  }
}

ints()
> 0
ints()
> 1
ints()
> 2
...

Finally, you can ask a task whether it has finished executing or if it has more values:

ints.is_done()
> false

And that’s it, that’s a task.
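These semantics can be approximated in Python with a thin wrapper over a generator. This is only a sketch: the class name Task and the method is_done are this example's conventions, not Saturn's runtime.

```python
class Task:
    """Sketch of a Saturn-style task on top of a Python generator."""
    def __init__(self, fn):
        self._gen = fn()
        self._done = False

    def __call__(self):
        if self._done:
            raise RuntimeError("task is dead")  # a finished task cannot be re-invoked
        try:
            return next(self._gen)              # run until the next yield
        except StopIteration as stop:
            self._done = True
            return stop.value                   # the task's final return value

    def is_done(self):
        return self._done

def four_body():
    return 4
    yield  # unreachable, but marks four_body as a generator function

def ints_body():
    k = 0
    while True:
        yield k
        k = k + 1

t, ints = Task(four_body), Task(ints_body)
print(t())                     # 4
print(t.is_done())             # True
print(ints(), ints(), ints())  # 0 1 2
print(ints.is_done())          # False
```

The `task -> 4` case corresponds to a generator that returns without yielding: the first call delivers the return value and kills the task.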

How does this solve the iteration problem? By allowing block-based iteration to be suspended:

def coll = new Vector("Ni!", "Peng!", "Neeee-wom!");
def iter = task {
  for (o: coll)
    yield o;
};

iter()
> "Ni!"
iter()
> "Peng!"
iter()
> "Neeee-wom!"

This way, Vector doesn’t have to implement iterators; instead, tasks can be used to allow block iteration to be suspended and resumed at will.

I’ve considered letting tasks take arguments. I can’t think of any uses for that but it does seem appropriate somehow, what with tasks and blocks looking so similar:

def gen_sym = task (prefix) {
  var k = 0;
  while (true) {
    yield "${prefix}-${k}";
    k = k + 1;
  }
};

gen_sym("mip");
> "mip-0"
gen_sym("fab");
> "fab-1"

It’s not too difficult to implement, I just feel silly doing so if it’s completely useless. But maybe I or someone else can come up with a neat use, I hope so.

Originally, on the train to Copenhagen, I didn’t think of tasks as tasks but as generators: objects that generate values. And the keyword wasn’t task, it was gen. But on the way back it occurred to me that maybe this is a more general concept than a generator — it’s more like a piece of work that can be paused and resumed during execution, which is where the name task came from. Also, since a task has its own execution state and stack it is really very similar to a thread. With some more work I think it’s possible to design a pretty decent threading model based on this abstraction. That might be the next thing I’ll work on.

I suspect that tasks will play the same kind of role that selector objects did in OSVM smalltalk and neptune: they’re invaluable in one or two key classes that implement some general abstractions, and then the abstractions can be used instead of the “raw” tasks or selectors.

For

Since my last two posts about for expressions I’ve been working on an implementation. Implementing new features in Saturn is pretty easy, by design, but this feature has taken longer than usual for two reasons. First of all, I really only had the syntax when I started so I first had to come up with a decent semantics. Also, implementing it revealed a bunch of bugs in other parts of the system — for instance, it turns out that mixing tail-call optimization and non-local returns gives some interesting problems. But all that has been fixed. Here’s a method that returns the factors of a given integer:

factors(k) -> for (x : 2 .. k, k % x == 0) -> x;
// factors(18) = [2, 3, 6, 9]

It’s not particularly efficient but it is definitely succinct. You can also create infinite data structures:

def primes = for (x: Naturals, x > 1, factors(x).is_empty()) -> x;
// primes = [2, 3, 5, 7, ...]
// primes[100] = 541

Of course you can also use it for plain, everyday stuff like

def all_names = for (person: employees) -> person.get_name()

Compare this with, let’s pick a random language, Java:

List all_names = new ArrayList();
for (Person person: employees)
  all_names.add(person.getName());

It’s just not the same…

Now, how does this actually work? The for expression uses a single method on collections: map_filter (I’m not too happy about this name so if you have a better idea feel free to tell me). This method takes two block arguments, map and filter, and returns a new collection which contains the results of map applied to the elements in this collection for which filter returns true. A straightforward implementation could be:

map_filter(map, filter) {
  def result = new ArrayList();
  for (elm: this) {
    if (filter(elm))
      result.add(map(elm));
  }
  return result;
}

Note that for statements (for (...) { ... }) are not the same as for expressions (for (...) -> ...).

A for expression is really nothing more than calls to the map_filter method. For instance, the body of the factors method above is expanded into

(2 .. k).map_filter(fun x -> x, fun x -> k % x == 0)

Implementing the map_filter method should be pretty straightforward and the good thing is that the implementer is free to decide how to represent the result. I definitely like this approach better than requiring collections to implement several different methods. Scala, for instance, uses three: map, filter and flatMap.
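In Python terms, the expansion above can be sketched with map_filter as a free function rather than a method (the eager list-building strategy is just one possible representation, as the post notes):

```python
def map_filter(coll, map_fn, filter_fn):
    # Eager implementation: keep elements passing the filter, then apply the map.
    return [map_fn(x) for x in coll if filter_fn(x)]

def factors(k):
    # for (x : 2 .. k, k % x == 0) -> x, with 2 .. k taken as exclusive of k
    return map_filter(range(2, k), lambda x: x, lambda x: k % x == 0)

print(factors(18))  # [2, 3, 6, 9]
```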

The freedom to choose the representation of the result is the key to implementing the list of primes: the implementation of map_filter in Naturals applies the map and filter lazily:

class Naturals is Collection {

  map_filter(map, filter)
    -> new MapFilterCollection(this, map, filter);

  operator [index] -> index;

  for(block) {
    var i = 0;
    while (true) {
      block(i);
      i = i + 1;
    }
  }

}

class MapFilterCollection is Collection {

  def collection, map, filter;

  // constructor, etc.

  map_filter(map, filter)
    -> new MapFilterCollection(this, map, filter);

  for(block) {
    for (obj: collection) {
      if (filter(obj))
        block(map(obj));
    }
  }

  operator [index] {
    var i = 0;
    for (obj : this) {
      if (index == i) return obj;
      i = i + 1;
    }
  }

}

This is not an efficient implementation of MapFilterCollection but it should give an idea of how things work underneath: the collection just postpones all the work until you iterate through it or ask for a particular element.
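The same lazy strategy can be sketched in Python with generators. Naturals and MapFilterCollection here are hypothetical stand-ins for the Saturn classes, and is_prime stands in for the factors-based test:

```python
import itertools

class Naturals:
    def __iter__(self):
        return itertools.count(0)  # 0, 1, 2, ... forever

    def map_filter(self, map_fn, filter_fn):
        return MapFilterCollection(self, map_fn, filter_fn)

class MapFilterCollection:
    def __init__(self, coll, map_fn, filter_fn):
        self.coll, self.map_fn, self.filter_fn = coll, map_fn, filter_fn

    def __iter__(self):
        # All work is postponed until someone actually iterates.
        return (self.map_fn(x) for x in self.coll if self.filter_fn(x))

    def map_filter(self, map_fn, filter_fn):
        return MapFilterCollection(self, map_fn, filter_fn)

    def __getitem__(self, index):
        # Indexing into the lazy collection just iterates far enough.
        return next(itertools.islice(iter(self), index, None))

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

primes = Naturals().map_filter(lambda x: x, is_prime)
print(primes[0], primes[10])  # 2 31
```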

Apart from the name of the method, map_filter, I think that’s a pretty decent approach. There’s still one issue left: what happens when you iterate through several collections at once:

for (x: 0 .. 100, y: 0 .. x, z: 0 .. y,
     x + y + z == 50) -> new Tuple(x, y, z)

This expression uses three map_filters, one for each variable, but the result has to be turned into a single collection. I have implemented a solution but I think it still needs a bit of work. I’ll write more about that later.

Sequence Comprehensions

Incredible: it turns out that the for expression syntax for list comprehensions I described in my last post is almost identical to the syntax of sequence comprehensions in Scala. Compare this saturn code:

for (x: ints, x % 2 == 0) -> x

with this code in Scala:

for (val x <- ints; x % 2 == 0) yield x

Looks pretty familiar right? I do like the saturn syntax better, though.

Why do I keep having other people’s ideas?

Details

I had originally decided that I wasn’t going to spend a lot of time coming up with new features for saturn but would just reimplement neptune. However, as I might have expected, I just can’t help myself. Many of the “new” ideas are things we discussed back when we designed neptune, though, so it’s more “outtakes” than actual new material.

The thing I’ve been most happy about adding so far is also the very simplest: a compact syntax for creating intervals. In C, Java and even neptune, you iterate over an interval by using the standard for loop:

for (int i = 0; i < n; i++) ...

This is unfortunate for more than one reason. First of all, it's too much boilerplate. 90% of for loops that iterate over a sequence of numbers are the same except for the name of the loop variable and the limit to iterate up to. In the statement below, the only parts that aren't boilerplate are the variable name i and the limit n:

for (int i = 0; i < n; i++) ...

That's actually less than half the code! Also, the loop variable is mentioned three times which gives ample opportunity to get it wrong. If you're like me, for instance, and have the variable i hardwired in your brain but need to iterate for a different variable there's a very real risk that you'll end up writing

for (int j = 0; i < n; i++) ...

The third problem with this kind of for loop is a classic: the limit is evaluated on each iteration. In rare cases that is exactly what you need. More often, however, it is either unnecessary (and may waste execution time) or just plain wrong.

In saturn it takes just one method to fix this: the .. method on the class Integer:

class Integer {
  ...
  operator ..(that) {
    return new Interval(this, that);
  }
}

With this method, writing from .. to creates a new open interval from from to to. What can you do with an interval? Most importantly, iterate through it:

for (i : 0 .. n) {
  // whatever
}

Almost all the boilerplate code is gone. The iteration variable is only mentioned once and the limit is only evaluated once, when the interval is created. And since the receiver of the .. method is known at compile time, the compiler can optimize away the creation of the interval with no visible consequences other than a performance improvement. Yessir!
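A minimal Python analogue of the interval object might look like this (written as a class rather than an operator, since Python has no user-definable .. operator):

```python
class Interval:
    """Half-open interval [frm, to): the limit is evaluated once, at creation."""
    def __init__(self, frm, to):
        self.frm, self.to = frm, to

    def __iter__(self):
        i = self.frm
        while i < self.to:
            yield i
            i += 1

print(list(Interval(0, 5)))  # [0, 1, 2, 3, 4]
```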

Another change I've made is to add more shorthands for occurrences of expressions. In neptune we had two syntaxes for blocks: one for statement blocks and one for blocks whose body was just a single expression:

// statement block
fun (x) {
  var str = x.to_string();
  System.print_on_stdout(str);
}

// expression block
fun (x) -> x * 2

In saturn, I've decided to introduce this distinction in more places. Possibly the most visible place I've added it is methods -- there is now a compact syntax for methods that return the value of a single expression:

class Point {
  var x, y;
  ...
  norm() -> sqrt(x * x + y * y);
}

In neptune (and Java and ...) you would have had to write something like

norm() {
  return sqrt(x * x + y * y);
}

Regular "statement methods" still exist but this makes it easier to write short methods and having short methods is generally a Good Thing. Also, to me, it just sort of looks right in a way I can't explain.

I'm also considering adding a new form of for statement along the same lines:

var ints = for (i : 0 .. 100) -> 2 * i;
// ints = [0, 2, 4, ..., 198]

As with a regular for statement, a for expression evaluates its body repeatedly but the expression collects the values of the body in a new collection. It's essentially a list comprehension

ints = [ i * 2 | i <- [0 .. 100] ]

but with an untraditional syntax that has a number of advantages: it's very easy to explain in terms of for statements and the variable definitions occur syntactically before their use. In the Haskell-like syntax above, i is being used before it's defined which I find non-obvious and difficult to read.
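Python's comprehension syntax, for comparison, shares that Haskell-style ordering, with the expression written before the variable it uses is introduced:

```python
# Same list as the for expression above: i is used before the clause defining it.
ints = [i * 2 for i in range(100)]
print(ints[:3], ints[-1])  # [0, 2, 4] 198
```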

There you have it: three new features, two of which have already been implemented. I haven't figured out all the details of how the new for expression should work but I'm sure I can take inspiration from existing list comprehension mechanisms. I'm also working towards an alpha release of the compiler and runtime but currently you'll have to get it through CVS from saturn-language at sourceforge to try it out.

While

I’m still working on saturn and I recently implemented tail-call optimization. One really neat thing about tail-call optimization is that now, unlike in neptune, loops can be implemented purely within the programming language without any “magic” from the compiler. Okay a little magic is still required to parse the while loop syntax so that

while (i < n) {
  System.print_on_stdout(i);
  i = i + 1;
}

can be translated into

(fun -> i < n).while(fun {
  System.print_on_stdout(i);
  i = i + 1;
})

but where the while method was magic in neptune it can be implemented directly in saturn. It turned out, however, that in order for this method to behave as you would expect you have to be careful with how you implement it. My first implementation was pretty straightforward:

while(body) {
  if (this()) {
    body();
    this.while(body);
  }
}

The most important property of while, and the reason why it couldn’t be implemented in neptune, is that it must not grow the stack. The way to make this happen is to be sure that the recursive call to while is a tail call which is guaranteed not to grow the stack.

The problem with the method above, or one of the problems, is that in saturn, an if statement is actually a call to the if method. The if statement in the method above corresponds to this:

(this()).if(fun {
  body();
  this.while(body);
})

Unfortunately, that method call grows the stack. It’s easy to get rid of though, we can just change it into a guard instead:

while(body) {
  if (!this()) return;
  body();
  this.while(body);
}

Now the recursive call to while is not surrounded by any other method calls. However, because of a nasty subtlety the recursive call is still not a tail call.

When is a call a tail call? When the call is the absolutely last thing done in the surrounding method before it returns. It would seem that in the method above the recursive call to while is indeed the very last thing that happens — so it should be a tail call right? The snag is that methods that don’t explicitly return a value implicitly return a dummy value, void. If we could write the method out in full it would correspond to this:

while(body) {
  if (!this()) return;
  body();
  this.while(body);
  return void;
}

But as before, it’s easy to get around this. If we return the value of the recursive call instead of letting the method return void the call finally becomes a tail call:

while(body) {
  if (!this()) return;
  body();
  return this.while(body);
}

That is indeed how the method is implemented.
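Python doesn't do tail-call optimization, but the constraint described here, that the recursive call must be the very last thing and its value returned, can be illustrated with a trampoline: the tail call is reified as a thunk instead of growing the stack. This is just a sketch of the idea, not how Saturn's runtime works.

```python
def trampoline(thunk):
    # Keep running thunks; a tail call is represented as the next thunk.
    while callable(thunk):
        thunk = thunk()
    return thunk

def while_loop(cond, body):
    def step():
        if not cond():
            return None  # the guard: iteration is over
        body()
        return step      # the tail call, returned rather than invoked
    return trampoline(step)

# Count to 100,000 without growing the Python stack.
state = {"i": 0}
while_loop(lambda: state["i"] < 100_000,
           lambda: state.update(i=state["i"] + 1))
print(state["i"])  # 100000
```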

It is unfortunate that the last method call in a method is not automatically a tail call, but making it one would require methods without an explicit return to return the value of their last statement (for some suitable definition) and that brings up endless problems. Returning a void value proved to work very well in neptune so I’m not inclined to change that. Maybe there’s some implementation trick that can allow both things? I suspect there might be.

Anyway, this is one example of tail-call optimization interacting in interesting ways with other features of the language. I’m sure there’ll be more examples along the way.

Saturn!

Maybe it’s because it’s summer and nobody is registering new projects, but the registration of saturn-language at sourceforge came through extremely quickly, in less than a day. And now that I’ve made a new logo it looks like the saturn project is slowly coming alive. Yes, the new logo looks a lot like the old one but still: the ring is broader and now it’s got a few colored bands. Clearly that’s saturn, not neptune.

Someone asked me what the goals of the project are. I don’t have any long-term goals yet but currently the goal is to design the language and get an implementation, any implementation, up and running as quickly as possible. Most of the language design is already done so it’s mainly a question of implementing it. Currently there is a compiler, a simple interpreter written in Java and a very simple class library; all three evolve in parallel. The interpreter is very inefficient but on the other hand it can be changed very quickly to accommodate changes in the language or the compiler. Once the language is more stable and there is plenty of library code (tests, in particular) I will look into implementing a more efficient runtime in C or C++. But it will be some time before the project reaches that point and anyway, I’m more of a compiler/IDE guy than a VM guy so maybe I’ll have to get some more people involved to get a decent VM together. On the other hand, plenty of languages get along fine with less-than-decent VMs so we’ll just have to see.

Will this future C or C++ runtime be small enough to run on embedded systems? I’m pretty sure it will. Neptune definitely did, it was designed for it. Saturn makes some extensions that will definitely require more of the runtime: transactional memory, dynamic privacy, full blocks and most likely other things I haven’t even thought of yet. But I believe there are implementation techniques that allow these things to be implemented efficiently in a reasonably small VM. But that’s way way into the future — the project is still less than a month old.

A complete aside: while I was looking for pictures of saturn I found this disturbing painting by Goya, Saturn Devouring His Son.


I don’t even know what is most spooky: the half-eaten son or the mad look in Saturn’s eyes. But then again Goya was apparently half-mad when he painted it, not on a canvas but on his dining room wall.

The story goes that Saturn overthrew his father Uranus and castrated him, no less. Later it was foretold that the same would happen to Saturn himself so to be on the safe side he decided to eat his children at birth. It didn’t do him much good, though, he was just overthrown by someone else: Jupiter. John Keats wrote the (unfinished) poem Hyperion about that:

And the sad Goddess weeping at his feet:
Until at length old Saturn lifted up
His faded eyes, and saw his kingdom gone
And all the gloom and sorrow of the place

Incidentally, both the painting and the poem are from 1819. But enough about that.

Super/Inner

The title of my last post was “Saturn?”, which I never really explained. Obviously, Saturn is an alternative name for the new implementation of neptune. The reason why I’ve considered changing the name is both that the language I’m implementing now is somewhat different from neptune and to avoid potential, how shall I put it, “conflict” with Esmertec (not that I expect any problems with Esmertec but just to be on the safe side).

Last week I hadn’t made up my mind about it but now I have: I’m changing the name. Neptune is dead; long live Saturn!

The final straw was the decision I made today to support both super and inner calls. Super calls are a well-known mechanism that exist in most object-oriented languages. Inner calls, a mechanism that originally comes from the beta language, are much less known and completely unknown in mainstream languages.

In most object-oriented languages, overriding a method means replacing it completely. An overriding method can call the overridden method through a super call, but it is entirely up to the implementer of the overriding method if and how it wants to do that.

class Window {
  paint() {
    ...; // paint background
  }
}

class BorderWindow: Window {
  paint() {
    super.paint(); // paint background
    ...; // draw border
  }
}

This works well most of the time since it gives the overriding method complete control while it still has the option of calling the overridden method. Occasionally, though, having the subclass in the driver’s seat just isn’t what you want. For instance, sometimes there is no good place to call super in the subclass. One place I’ve met this is when implementing caching in a superclass: the superclass handles caching and subclasses are only called upon to calculate a value if there is a cache miss.

abstract class Node {

  get_index() {
    if (has_cached_index(this))
      return get_cached_index(this);
    int index = calculate_index();
    cache_index(this, index);
    return index;
  }

  abstract calculate_index();

}

class SomeNode: Node {

  override calculate_index() {
    ...; // expensive calculation
  }

}

Here, the superclass handles caching in get_index and the subclasses can specialize how the index is calculated by overriding calculate_index. But this approach is not without problems: it introduces two methods that are deceivingly similar but must be used in two different ways. Only get_index is allowed to call calculate_index directly, since otherwise you won’t get the proper caching behavior. On the other hand, subclasses must override calculate_index, not get_index, but must of course not call calculate_index themselves.

This is where inner comes in. An inner method, as opposed to an overriding method, specializes a method in a superclass but doesn’t replace it. Instead, the method in the superclass can explicitly call the inner method in the subclass:

class A {

  print() {
    System.out.println("a");
    inner.print();
  }

}

class B: A {

  inner print() {
    System.out.println("b");
  }

}

new A().print();
> a

new B().print();
> a
> b

Here there are two print methods but when you call print on an instance of B, the method in A will be called, not the one in B. It will only be called when the super implementation calls inner.print(). It’s in some sense overriding turned upside down, with calls downwards in the inheritance hierarchy through inner instead of calls upwards through super. Using this, we can rewrite the caching example from before:

abstract class Node {

  get_index() {
    if (has_cached_index(this))
      return get_cached_index(this);
    int index = inner.get_index();
    cache_index(this, index);
    return index;
  }

}

class SomeNode: Node {

  inner get_index() {
    ...; // expensive calculation
  }

}

Now there is only one method so there is no confusion about which method to call and which method to override.
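Mainstream languages don't have inner, but the caching example can be sketched in Python by having the superclass dispatch to a subclass hook. The hook name inner_get_index is this sketch's convention, not Saturn syntax, and a simple instance attribute stands in for the cache:

```python
class Node:
    def __init__(self):
        self._cache = None

    def get_index(self):
        # The superclass stays in control: caching wraps the subclass's code.
        if self._cache is None:
            self._cache = self.inner_get_index()
        return self._cache

    def inner_get_index(self):
        raise NotImplementedError

class SomeNode(Node):
    def __init__(self, value):
        super().__init__()
        self.value = value
        self.calls = 0

    def inner_get_index(self):
        self.calls += 1       # stands in for an expensive calculation
        return self.value * 10

n = SomeNode(4)
print(n.get_index(), n.get_index())  # 40 40
print(n.calls)                       # 1, i.e. computed only once
```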

As demonstrated above, you don’t need inner as such since you can always just create a new method like calculate_index instead. However, a lot of situations where you would usually use super can be described more elegantly using inner. For instance, if you’ve ever come across a method that required overriding methods to call the super method first (that happens often in eclipse), that’s really just an inner call struggling to get out:

class AbstractPlugin {

  /**
   * Overriding methods must call super first.
   **/
  start() {
    ...; // start plugin
  }

}

class MyPlugin: AbstractPlugin {

  start() {
    super.start();
    ...; // start my plugin
  }

}

With inner methods that could be rewritten as:

class AbstractPlugin {

  start() {
    ...; // start plugin
    inner.start();
  }

}

class MyPlugin: AbstractPlugin {

  inner start() {
    ...; // start my plugin
  }

}

Here, subclasses don’t have to implement start in any particular way, and the fact that the superclass implementation must be executed first is encapsulated in the super implementation of start. Also, the superclass is free to call the inner method however it wants: conditionally, three times, not at all, etc., which is impossible with super calls.

There’s a bit more to inner calls, for instance: what if there is no subclass implementation, how do inner calls interact with super calls, etc. All that is described in the paper Super and Inner – Together at Last!. The model in Saturn is pretty close to the one in the paper (but not exactly the same).

Saturn?

As I mentioned in my last post, I’ve started work on a new implementation of neptune. Even though Esmertec Denmark has been closed down it didn’t mean that the language died, not as such. What it meant was that an implementation of the language died. Now that I have some time off I might as well write a new one.

Will the new implementation be exactly the same as the old one? Not exactly, no. Now that there’s no marketing aspect I’ll take the opportunity to try out a few things. One clear difference is that concurrency control will be based on a transactional memory model. I’ve experimented with changing the syntax a little, for instance allowing -, ?, ! and such in identifiers and method names but in the end I decided against that. There will probably be “real” dynamic privacy. There will definitely be tail call optimization and full closures. But the plan is to otherwise stay close to the original language. There’s a very good reason for that. I know myself: if I’m completely free to design a language I get stuck before I even begin with theoretical issues such as “what is a method name, really?” and “how can I unify instance variables and local variables?”. So to get any work done at all I’ll have to stick with what we already have except for adding a few well-understood mechanisms.

I don’t know if this language will ever get anywhere. I have no world domination plans, so far I’m just doing this because I enjoy implementing programming languages. Of course I do hope that at some point there will be room in the world for a statically typed object-oriented language that is not C++, Java or C#. Otherwise I’m afraid the next few decades of my career will be something of a disappointment.

By the way, if you feel like contributing or just taking a look at how things are going, neplang at sourceforge is open for business.