Monthly Archives: July 2006

While

I’m still working on saturn and I recently implemented tail-call optimization. One really neat thing about tail-call optimization is that now, unlike in neptune, loops can be implemented purely within the programming language without any “magic” from the compiler. Okay a little magic is still required to parse the while loop syntax so that

while (i < n) {
System.print_on_stdout(i);
i = i + 1;
}

can be translated into

(fun -> i < n).while(fun {
System.print_on_stdout(i);
i = i + 1;
})

but where the while method was magic in neptune it can be implemented directly in saturn. It turned out, however, that in order for this method to behave as you would expect you have to be careful with how you implement it. My first implementation was pretty straightforward:

while(body) {
if (this()) {
body();
this.while(body);
}
}

The most important property of while, and the reason why it couldn’t be implemented in neptune, is that it must not grow the stack. The way to make this happen is to be sure that the recursive call to while is a tail call which is guaranteed not to grow the stack.

The problem with the method above, or one of the problems, is that in saturn, an if statement is actually a call to the if method. The if statement in the method above corresponds to this:

(this()).if(fun {
body();
this.while(body);
})

Unfortunately, that method call grows the stack. It’s easy to get rid of though, we can just change it into a guard instead:

while(body) {
if (!this()) return;
body();
this.while(body);
}

Now the recursive call to while is not surrounded by any other method calls. However, because of a nasty subtlety the recursive call is still not a tail call.

When is a call a tail call? When the call is the absolutely last thing done in the surrounding method before it returns. It would seem that in the method above the recursive call to while is indeed the very last thing that happens — so it should be a tail call right? The snag is that methods that don’t explicitly return a value implicitly return a dummy value, void. If we could write the method out in full it would correspond to this:

while(body) {
if (!this()) return;
body();
this.while(body);
return void;
}

But as before, it’s easy to get around this. If we return the value of the recursive call instead of letting the method return void the call finally becomes a tail call:

while(body) {
if (!this()) return;
body();
return this.while(body);
}

That is indeed how the method is implemented.

It is unfortunate that the last method call in a method is not a tail call but that would require methods without an explicit return to return the value of the last statement (for some suitable definition) and that brings up endless problems. Returning a void value proved to work very well in neptune so I’m not inclined to change that. Maybe there’s some implementation trick that can allow both things? I suspect there might be.

Anyway, this is one example of tail-call optimization interacting in interesting ways with other features of the language. I’m sure there’ll be more examples along the way.

Saturn!

Maybe it’s because it’s summer and nobody is registering new projects, but the registration of saturn-language at sourceforge came though extremely quickly, in less than a day. And now that I’ve made a new logo it looks like the saturn project is slowly coming alive. Yes, the new logo looks a lot like the old one but still: the ring is broader and now it’s got a few colored bands. Clearly that’s saturn, not neptune.

Someone asked me what the goals of the project are. I don’t have any long-term goals yet but currently the goal is to design the language and get an implementation, any implementation, up and running as quickly as possible. Most of the language design is already done so it’s mainly a question of implementing it. Currently there is a compiler, a simple interpreter written in Java and a very simple class library; all three evolve in parallel. The interpreter is very inefficient but on the other hand it can be changed very quickly to accomodate changes in the language or the compiler. Once the language is more stable and there are plenty of library code (tests, in particular) I will look into implementing a more efficient runtime in C or C++. But it will be some time before the project reaches that point and anyway, I’m more of a compiler/IDE guy than a VM guy so maybe I’ll have to get some more people involved to get a decent VM together. On the other hand, plenty of language get along fine with less-than-decent VMs so we’ll just have to see.

Will this future C or C++ runtime be small enough to run on embedded systems? I’m pretty sure it will. Neptune definitely did, it was designed for it. Saturn makes some extensions that will definitely require more of the runtime: transactional memory, dynamic privacy, full blocks and most likely other things I haven’t even thought of yet. But I believe there are implementation techniques that allows these things to be implemented efficiently in a reasonably small VM. But that’s way way into the future — the project is still less than a month old.

A complete aside: while I was looking for pictures of saturn I found this disturbing painting by Goya, Saturn devours his children.


I don’t even now what is most spooky: the half-eaten son or the mad look in Saturn’s eyes. But then again Goya was apparently half-mad when he painted it, not on a canvans but on his dining room wall.

The story goes that Saturn overthrew his father Uranus and castrated him, no less. Later it was foretold that the same would happen to Saturn himself so to be on the safe side he decided to eat his children at birth. It didn’t do him much good, though, he was just overthrown by someone else: Jupiter. John Keats wrote the (unfinished) poem hyperion about that:

And the sad Goddess weeping at his feet:
Until at length old Saturn lifted up
His faded eyes, and saw his kingdom gone
And all the gloom and sorrow of the place

Incidentally, both the painting and the poem are from 1819. But enough about that.

Super/Inner

The title of my last post was “Saturn?”, which I never really explained. Obviously, Saturn is an alternative name for the new implementation of neptune. The reason why I’ve considered changing the name is both that the language I’m implementing now is somewhat different from neptune and to avoid potential, how shall I put it, “conflict” with Esmertec (not that I expect any problems with Esmertec but just to be on the safe side).

Last week I hadn’t made up my mind about it but now I have: I’m changing the name. Neptune is dead; long live Saturn!

The final straw was the decision I made today to support both super and inner calls. Super calls are a well-known mechanism that exist in most object-oriented languages. Inner calls, a mechanism that originally comes from the beta language, are much less known and completely unknown in mainstream languages.

In most object-oriented languages, overriding a method means replacing it completely. An overriding method can call the overridden method through a super call, but it is entirely up to the implementer of the overriding method if and how it wants to do that.

class Window {
paint() {
...; // paint background
}
}

class BorderWindow: Window {
paint() {
super.paint(); // paint background
...; // draw border
}
}

This works well most of the time since it gives overriding method complete control while it still has the option of calling the overridden method. Occasionally, though, having the subclass in the driver’s seat just isn’t what you want. For instance, sometimes there is no good place to call super in the subclass. One place I’ve met this is when implementing caching in a superclass: the superclass handles caching and subclasses are only called upon to calculate a value if there is a cache miss.

abstract class Node {

get_index() {
if (has_cached_index(this))
return get_cached_index(this);
int index = calculate_index();
cache_index(this, index);
return index;
}

abstract calculate_index();

}

class SomeNode: Node {

override calculate_index() {
...; // expensive calculation
}

}

Here, the superclass handles caching in get_index and the subclasses can specialize how the index is calculated by overriding calculate_index. But this approach is not without problems: it introduces two methods that are deceivingly similar but must be used in two different ways. Only get_index is allowed to call calculate_index directly, since otherwise you won’t get the proper caching behavior. On the other hand, subclasses must override calculate_index, not get_index, but must of course not call calculate_index themselves.

This is where inner comes in. An inner method, as opposed to an overriding method, specializes a method in a superclass but doesn’t replace it. Instead, the method in the superclass can explicitly call the inner method in the subclass:

class A {

print() {
System.out.println("a");
inner.print();
}

}

class B: A {

inner print() {
System.out.println("b");
}

}

new A().print();
> a

new B().print();
> a
> b

Here there are two print methods but when you call print on an instance of B, the method in A will be called, not the one in B. It will only be called when the super implementation calls inner.print(). It’s in some sense overriding turned upside down, with calls downwards in the inheritance hierarchy through inner instead of calls upwards through super. Using this, we can rewrite the caching example from before:

abstract class Node {

get_index() {
if (has_cached_index(this))
return get_cached_index(this);
int index = inner.get_index();
cache_index(this, index);
return index;
}

}

class SomeNode: Node {

inner get_index() {
...; // expensive calculation
}

}

Now there is only one method so there is no confusion about which method to call and which method to override.

As demonstrated above, you don’t need inner as such since you can always just create a new method like calculate_index instead. However, a lot of situations where you would usually use super can be described more elegantly using inner. For instance, if you’ve ever come across a method that required overriding methods to call the super method first (that happens often in eclipse), that’s really just an inner call struggling to get out:

class AbstractPlugin {

/**
* Overriding methods must call super first.
**/
start() {
...; // start plugin
}

}

class MyPlugin: AbstractPlugin {

start() {
super.start();
...; // start my plugin
}

}

With inner methods that could be rewritten as:

class AbstractPlugin {

start() {
...; // start plugin
inner.start();
}

}

class MyPlugin: AbstractPlugin {

inner start() {
...; // start my plugin
}

}

Here, subclasses don’t have to implement start in any particular way, and the fact that the superclass implementation must be executed first is encapsulated in the super implementation of start. Also, the superclass is free to call the inner method however it wants: conditionally, three times, not at all, etc., which is impossible with super calls.

There’s a bit more to inner calls, for instance: what if there is no subclass implementation, how do inner calls interact with super calls, etc. All that is described in the paper I referenced above, Super and Inner – Together at Last!. The model in Saturn is pretty close to the one in the paper (but not exactly the same).

Saturn?

As I mentioned in my last post, I’ve started work on a new implementation of neptune. Even though Esmertec Denmark has been closed down it didn’t mean that the language died, not as such. What it meant was that an implementation of the language died. Now that I have some time off I might as well write a new one.

Will the new implementation be exactly the same as the old one? Not exactly, no. Now that there’s no marketing aspect I’ll take the opportunity to try out a few things. One clear difference is that concurrency control will be based on a transactional memory model. I’ve experimented with changing the syntax a little, for instance allowing -, ?, ! and such in identifiers and method names but in the end I decided against that. There will probably be “real” dynamic privacy. There will definitely be tail call optimization and full closures. But the plan is to otherwise stay close to the original language. There’s a very good reason for that. I know myself: if I’m completely free to design a language I get stuck before I even begin with theoretical issues such as “what is a method name, really?” and “how can I unify instance variables and local variables?”. So to get any work done at all I’ll have to stick with what we already have except for adding a few well-understood mechanisms.

I don’t know if this language will ever get anywhere. I have no world domination plans, so far I’m just doing this because I enjoy implementing programming languages. Of course I do hope that at some point there will be room in the world for a statically typed object-oriented language that is not C++, Java or C#. Otherwise I’m afraid the next few decades of my career will be something of a disappointment.

By the way, if you feel like contributing or just taking a look at how things are going, neplang at sourceforge is open for business.

Back

I’m back after a week of sailing in Sweden. Both the experience of sailing and the experience of going to Sweden depend a lot on the weather. This year, the weather was a bit less than optimal. Oh well, at least that gave me a chance to finally read some books I’ve had lying around. The stars’ tennis balls by Stephen Fry deservers honorable mention. Guaranteed to be read in one sitting it said on the cover. I promised myself to just read a chapter or two and save the rest for the trip home. Well, apparently book covers know me better than I do. So on the way home I just stared out the window.

Now that I’m back I plan to spend some time working on a project I started a few weeks ago. It’s a new implementation of a programming language that readers of this blog may be familiar with. I’ll write more about what my plans are when I know myself, right now I’m just typing in code.

RIP

Esmertec sent out an innocent-looking press release today: Jean-Claude Martinez Confirmed as CEO of Esmertec. But there’s a bit more to it than the title suggests:

Cost reductions: Management has decided to close down Esmertec’s subsidiary in Denmark and reduce significantly the operations in Japan. The company has assigned resources from other Esmertec’s offices and in Japan to continue supporting the products and customers in these regions.

What that means is that I now have a really long summer vacation. It also means that you shouldn’t expect an open source version of neptune to be released any time soon — in fact, don’t expect any version to be released at all.

Why has this happened? This press release gives a hint.

Types #2

When I wrote the first post about types in neptune I ran out of steam before I was completely finished. In this post I’ll finish by describing two features I only mentioned briefly in the first post, the void type and protocol literals.

Checks

But first: as Isaac points out, I’ve spent a lot of time explaining what the type system doesn’t do and doesn’t require, and very little explaining what it actually does do for you. I guess I didn’t think to write about it because the checks that are performed are pretty simple and conventional. Here’s a summary of the checks performed by the compiler.

  • If you call a method m and the type system knows that the receiver has a type that doesn’t define the method m, the checker will issue a warning:
    protocol Runnable {
    run();
    }

    Runnable action = ...;
    action.runn(); // warning: undefined method Runnable.runn()
  • If the type checker knows the signature of a method and you pass it an argument with an incompatible type, a warning will be issued:
    int gcd(int a, int b) { ... }
    gcd("foo", "bar"); // warning: incompatible types

    I’ll return to what “incompatible” means in a second.

  • If a value is assigned to a variable of type T, a warning is issued if the type of that value is not a subtype of T:
    int x = "4"; // warning: incompatible assignment
  • When returning a value from a method with return type T, the returned value must be a subtype of T.
    int foo() {
    return "bar"; // warning: incompatible return type
    }

    As a special case of this rule which I’ll return to later, returning a value from a void method is illegal, and not returning a value from a non-void method is illegal.

  • Finally, the compiler checks that methods override each other properly. An overriding method is allowed to return a subtype of the overridden method (return type covariance) and it is allowed to take arguments that are supertypes of the arguments of the overridden method (argument contravariance):
    class SuperClass {
    Object foo(int x) { ... }
    }

    class SubClass: SuperClass {
    Object foo(String x) { ... } // incompatible override
    int foo(Object x) { ... } // legal
    }

There may be more checks that I’ve forgotten but at least these are the most important ones. All but one of these checks use the subtype relation between two types which is simple in neptune: The type X is a subtype of Y they are equal, if Y is a class that extends X, directly or indirectly, or if Y implements the protocol of X, again directly or indirectly. It’s a simple version of the nominal typing relation of many other object-oriented languages, including Java and C#, except that it is more general because one class can be a subtype another class without having that class as a superclass.

Besides these checks, the compiler also issues warnings if it thinks a class has not been properly implemented: if non-abstract classes have abstract methods, if a class doesn’t provide all the methods required by the traits it uses, if it declares that a method overrides another but it doesn’t, etc. However, all those things could be checked even if there was no static type system.

I hope that clarifies what the type system will do for you, and then move on to describing the two last features of the type system.

Void

In most languages, methods that don’t return a meaningful value are treated differently than other methods. In the C language family, such functions and methods are declared as if they did return a value of type void, but the void type is special and magical; what it really means is that the function don’t return a value at all. In particular, there is no value of type void and you can’t declare a variable of that type:

void do_nothing() { }

void my_void = do_nothing(); // illegal

Void methods and functions in the C languages are fundamentally different from functions and methods that return values and the type system keeps track of whether or not you’re allowed to use the result of a call. In neptune we can’t do that because you are free to call a method even if the type system knows nothing whatsoever about it. So we couldn’t use the C model, at least not directly.

In smalltalk, on the other hand, there is no such thing as a method that doesn’t return a value. You are free to not explicitly return a value from a method, but in that case the method returns self. I don’t consider that to be a particularly good design and long before we considered switching from smalltalk to another language we experimented with changing this in OSVM. The problem is that many methods really aren’t intended to return a meaningful value but return self by default. Then someone accidentally uses this value and then later when the method is changed, under the assumption that no one is using the return value, the program breaks somewhere completely unexpected. The problem is that the intention of the method isn’t clear because sometimes methods really do intend to return self but you can’t tell if this is the case since the return isn’t written explicitly. Personally, I used the convention that if I wanted a method to return a value I would always explicitly return it, even self. If I didn’t want the method to return a meaningful value, I would return nil.

In neptune, we wanted a combo solution: all methods should return a value but we also wanted to avoid the problems from smalltalk and we wanted the code to look like C. The solution was to make void into a real type and introduce a void value. All methods that don’t explicitly return a value automatically return the void value, and a return statement without an explicit value also returns the void value. The void value itself is a full, real object just like null:

void get_void_value() {
return;
}

var void_value = get_void_value();
System.out.println(void_value.to_string()); // prints "void"

This model gives you almost the same behavior as C, Java and co. but for different reasons. If you declare that a method returns void the type system will warn you if you return a value of another type, since the returned value is not a subtype of void. If a method is declared to return something other than void but don’t return a value, the type check fails because a method that doesn’t explicitly return is made to return the void value which is not a subtype of the return type. Finally, if the void value ever turns up as the value of a variable or somewhere else where you don’t expect it you’ll know that there is a problem.

This model does give a little more flexibility, some of it is probably useless (but you never know) and some of it is slightly useful. First of all, since void is a regular type you can declare variables of type void. I don’t know why you would do that but you can. Secondly, it is legal to explicitly return a value from a void method if that value is the void value. That might not sound useful but it can be since it means that you can call another method and return in the same statement:

void add(var obj) {
if (delegate) return target.add(obj);
...
}

In java, you can’t call a void method and return in the same statement so you would have to write

void add(Object obj) {
if (delegate) {
target.add(obj);
return;
}
...
}

It’s a small thing but it does give greater uniformity. By the way, I believe you can actually do the same thing in C++.

Of course, having a dynamic representation of the “nothing” value is not a new thing, many languages have that. My experience with it, at least in the context of this kind of language, is that it works very well in practice and better than any of the alternatives I’ve tried.

Protocol literals

The last thing is sort of an oddball. Sometimes it can be practical to represent a type as a run-time value. For instance, in our unit test framework we have a method that asserts that running a particular piece of code throws a particular exception

assert_throws(IOException.protocol, fun -> file.open());

The expression IOException.protocol gives you an object that represents the protocol IOException. A protocol object can be used for exactly one thing: to test whether other objects implement the protocol it represents:

Protocol p = String.protocol;
var x = "Beige";
if (p.is_supported_by(x)) ...

That may not sound very exciting but it can be useful in combination with other abstractions that are built on protocol objects, for instance exception handling. In a post a long time ago I mentioned that in neptune, an exception handler such as this:

try {
...
} catch (e: IOException) {
...
}

is expanded into this:

fun {
...
}.try_catch(
new Vector { IOException.protocol },
fun (Exception e) {
...
}
)

Exception handling is based on the try_catch method which uses a list of protocol literals to decide whether or not it catches a particular exception. The fact that you can call this method directly means that you can make exception handlers that are not statically bound to a particular exception type but where it can be determined dynamically which exceptions to catch. This is how assert_throws is implemented:


void assert_throws(Protocol type, fun f) {
fun {
f();
fail_not_thrown(type);
}.try_catch(new Vector { type }, fun {
// Ignore exceptions of the right type
});
}

It’s not something you use often but on those rare occasions where you do use it it’s really handy.