Plankton #2

As I’ve mentioned earlier I’m in the process of rewriting parts of the neutrino implementation. The first part that had to go was the old binary object format.

Currently, the native compiler is split into two parts. The front-end is written in Java and takes source code in the n₀ language. The n₀ language is a subset of neutrino, the starting point for the full language. The front-end parses and analyzes the program and then generates an intermediate format which is serialized as a platform-independent binary (.pib) file. The backend is written in n₀ and can produce native executable code from a .pib file. Eventually the Java front-end will be replaced with one written in neutrino but for now it’s convenient to keep it in Java.

The original pib file format used a very simple JSON-inspired binary object format. In particular, it was very verbose and could only represent object trees, not full object graphs. This quickly turned out to be limiting and last week I finished replacing it with the plankton #2 format which is more compact and which, most importantly, can represent cyclical object graphs.

In the old representation all variables were fully resolved by the front-end into argument references, local variable references, captured outer variable references, etc. This turned out to be a bad decision because it made life more difficult for the backend. It might want to materialize an argument or captured variable as a local variable, for instance when inlining a function call, which this model made difficult. With the new format where the same object can be referenced from multiple places, variables can point directly to their declaration or, as in the current implementation, the declaration and all uses of the variables can all point to an abstract symbol marker. That way the backend is free to materialize the variables however it wants. This structure comes naturally with an intermediate format that allows general object graphs but requires extra effort to represent if you can only create object trees.

This also allows anonymous objects. For instance, some instances currently have a named protocol (called implicit-N for some N). With the new structure they can simply point to their anonymous protocol without the indirection of a unique name.

This change was quite tedious to implement but will enable a lot of simplifications and improvements.