Monthly Archives: January 2011

Status, end of January

As mentioned in my christmas status I’m currently revisiting some of the more short-sighted decisions I made in the design of the implementation. I’m not quite there yet but I’m getting close.

I’ve already replaced the binary object format. That was step one.

Step two was to change the model for mutable state to match the new model which I’ve described partially in the post about constructors, and which I’ll expand on in a future blogpost I’m almost done with. That turned out to actually be a relatively small change, which is now also done.

The third step is the biggest one, but the one that initiated the whole reimplementation: changing from a bytecode-based intermediate representation to one that is AST based. I originally used combined bytecode/syntax-tree approach but it turned out to be too difficult to work with and in particular too difficult to transform. I’ve taken a gradual approach to reimplementing this, keeping the old format but gradually replacing more and more with the new approach. At this point I only have one construct left to implement, then the Java implementation of neutrino can stop using the old format completely. I’m so looking forward to getting rid of the code that implements the old format and just finally being able to do a complete spring cleaning of the Java compiler.

Along the way I decided that, since the “continuation” captured by a with_1cc can’t properly be said to be a one-shot continuation, with_escape would be a better name. All you’re doing is setting up a way to escape execution of an expression, the construct might as well say that directly.

One interesting thing I’ve noticed during this is the fact that while I change everything: the language syntax, parts of the semantics, the file formats and large parts of the implementation, one thing never changes: the tests. The text of the tests sometimes change as the language changes, but their effects and the fact that they all pass stays the same. I find that interesting and it underscores the fact that the bedrock on which everything else builds are those tests.

Plankton #2

As I’ve mentioned earlier I’m in the process of rewriting parts of the neutrino implementation. The first part that had to go was the old binary object format.

Currently, the native compiler is split into two parts. The front-end is written in Java and takes source code in the n0 language. The n0 language is a subset of neutrino, the starting point for the full language. The front-end parses and analyzes the program and then generates an intermediate format which is serialized as a platform-independent binary (.pib) file. The backend is written in n0 and can produce native executable code from a .pib file. Eventually the Java front-end will be replaced with one written in neutrino but for now it’s convenient to keep it in Java.

The original pib file format used a very simple JSON-inspired binary object format. In particular, it was very verbose and could only represent object trees, not full object graphs. This quickly turned out to be limiting and last week I finished replacing it with the plankton #2 format which is more compact and which, most importantly, can represent cyclical object graphs.

In the old representation all variables were fully resolved by the front-end into argument references, local variable references, captured outer variable references, etc. This turned out to be a bad decision because it made life more difficult for the backend. It might want to materialize an argument or captured variable as a local variable, for instance when inlining a function call, which this model made difficult. With the new format where the same object can be referenced from multiple places, variables can point directly to their declaration or, as in the current implementation, the declaration and all uses of the variables can all point to an abstract symbol marker. That way the backend is free to materialize the variables however it wants. This structure comes naturally with an intermediate format that allows general object graphs but requires extra effort to represent if you can only create object trees.

This also allows anonymous objects. For instance, some instances currently have a named protocol (called implicit-N for some N). With the new structure they can simply point to their anonymous protocol without the indirection of a unique name.

This change was quite tedious to implement but will enable a lot of simplifications and improvements.