There is one language design problem I've never been able to solve to my own satisfaction, and which no language I know solves completely: constructing objects. Gilad brought the subject up in two blog posts a while back, but while the Newspeak solution he describes has some nice properties, it just doesn't feel quite... right. Now, I'm currently (as always) working on a hobby language and had to find some, any, solution to this problem. While working on that I came up with a slightly weird solution that has some nice properties. But before going into that, this warm-up post takes a look at the Newspeak solution. Note: because I know little of Newspeak I may be making some wrong assumptions about how the language works; apologies in advance if I get things wrong.

Update: Peter has written an update that demonstrates that I did indeed get things wrong: Newspeak constructors are not as restricted as I assume in this post. So you may want to just skip over the Newspeak part.

I'll warm up by rehashing a well-known problem with the most widely used constructor mechanisms: they are just not object oriented. In Java, if you write

new C(...)

you've tied your code directly to the class C because the constructor is forced to return a new instance of C. Not a subclass, not a cached instance, not anything else. There is no abstraction there, no wiggle room.

One way to make things slightly better is to use a static factory method instead. That frees you from some of the restrictions of constructors, so you're no longer tied to a particular implementation, but it still ties you to a particular factory. You can then take a step further and implement factory objects that can be passed around as arguments and changed dynamically, but that gives you a lot of boilerplate code. A much more lightweight solution, in languages like Smalltalk that support it, is to use class methods on first-class class objects -- it's really just factory objects, but the language happens to give them to you for free. This is a pretty decent solution but has the disadvantage that it requires first-class class objects. A class object is a reflective, meta-level thing which people shouldn't have to use in programs that are otherwise non-reflective. But that is a small detail and probably not a problem in practice.
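To make the first two steps concrete, here is a minimal Java sketch of a static factory and a factory object. The names `Point`, `PointFactory` and `Shapes` are made up for illustration and are not from any of the posts discussed here.

```java
// Hypothetical names throughout; a sketch, not a definitive design.
class Point {
    private static final Point ORIGIN = new Point(0, 0);
    final int x, y;

    // Private, so clients cannot write `new Point(...)` directly.
    private Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Static factory: may hand back a cached instance instead of a fresh
    // one, but every caller is still hard-wired to Point.of.
    static Point of(int x, int y) {
        return (x == 0 && y == 0) ? ORIGIN : new Point(x, y);
    }
}

// Factory object: the creation strategy is a value that can be passed around.
interface PointFactory {
    Point create(int x, int y);
}

class Shapes {
    // The caller decides how points get made; this code never names a constructor.
    static Point[] unitSquare(PointFactory factory) {
        return new Point[] {
            factory.create(0, 0), factory.create(1, 0),
            factory.create(1, 1), factory.create(0, 1)
        };
    }
}
```

A caller would write something like `Shapes.unitSquare(Point::of)`, and could swap in a different factory without touching `Shapes`. The extra interface and the plumbing to pass it around are the boilerplate mentioned above.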

However, the issue of how best to invoke constructors is only part of the problem with object creation, and not the hard part. The hard part is: how do you initialize instances once the constructor has been called? How do you set the values of an object's instance variables?

What makes this especially hard is the fact that instance variables are essentially non-object-oriented. There are ways of making them less bad, like hiding them and providing accessor methods, as in Self and (apparently) Newspeak. That gives you some wiggle room, for instance by allowing you to change something that was an instance variable into a computed value, or to override the accessor, without breaking client code. However, even with Self-style accessors, we're still not off the hook because instance variables have to be initialized. This is where, in my opinion, the Newspeak model breaks down.

Consider this Newspeak example:

class Point2D x: i y: j = ( |
    public x ::= i.
    public y ::= j.
| ) (
    public printString = (
        ^ ' x = ', x printString, ' y = ', y printString
    )
)

There is a 1:1 correspondence between arguments to the primary constructor and instance variables in the object. And it must be so, except for trivial variations like reorderings, since any nontrivial initialization of instance variables would introduce the possibility of exceptions or non-local returns and hence void the guarantee of correct initialization[1].

Imagine that we want to give each point a unique integer id. In Java you could easily implement it something like this (ignoring synchronization issues):

class Point2D {
    static int currentId = 0;
    int x, y, id;
    Point2D(int x, int y) {
        this.x = x;
        this.y = y;
        this.id = currentId++;
    }
}

In Newspeak you can use a secondary constructor to implement this, but as soon as you try to subclass Point2D you're in trouble: the subclass has to call the primary constructor, which is only allowed to do trivial computation before setting the instance variables. So either the id generation code has to be duplicated in a secondary constructor for the subclass, or the id slot has to stay uninitialized and then be initialized in a second pass. The first option is obviously bad, and the second erodes the guarantee of correct initialization. At least, that's how I understand the mechanism.

The root of the problem, and this also applies to Java constructors, is that steps of object construction that logically belong together are executed in the wrong order. A simplified model of how the construction process takes place is that there are three steps:

  1. Preprocess: calculate the values to be stored in the object, for instance the point's id
  2. Initialize: store the calculated values in instance variables. Only trivial field stores take place here, no general computation
  3. Postprocess: possibly additional work after the object is fully initialized.

If we consider the simple Point3D inheritance hierarchy where

Point3D <: Point2D <: Object

the Newspeak model allows the following sequence of steps:

  • Arbitrary preprocessing in secondary constructor in Point3D
  • Instance creation by primary constructor of Point3D, recursively through primary constructor of Point2D
  • Initialization of Point2D slots
  • Initialization of Point3D slots
  • Arbitrary postprocessing in secondary Point3D constructor

Since no arbitrary computation is allowed between the creation of the object and the point where every class in the chain has initialized its variables, the object is guaranteed to be fully initialized. However, only the bottom-most class in the inheritance chain is allowed to perform pre- and postprocessing.

In Java the model is different: there, each constructor in the chain is allowed to do arbitrary computations, as long as it starts by calling its superconstructor. Expressed in the simple steps from before, Java allows this sequence:

  • Immediate instance creation
  • Arbitrary preprocessing in Point2D
  • Initialization of Point2D
  • Arbitrary postprocessing in Point2D
  • Arbitrary preprocessing in Point3D
  • Initialization of Point3D
  • Arbitrary postprocessing in Point3D

Object initialization and arbitrary computations are interspersed, which gives plenty of opportunity to leave the object half-initialized if an exception occurs, and for superclasses to accidentally call methods overridden by subclasses before the subclass has had a chance to properly initialize itself.
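The second hazard is easy to reproduce. In this small sketch the `summary` field and the `describe` method are hypothetical additions to the point classes, used only to show the effect: the Point2D constructor does some postprocessing that calls an overridable method, and Point3D's override runs before `z` has been stored.

```java
class Point2D {
    final int x, y;
    final String summary; // hypothetical field, for illustration only

    Point2D(int x, int y) {
        this.x = x;
        this.y = y;
        // "Postprocessing" in Point2D: calls a method a subclass may override.
        this.summary = describe();
    }

    String describe() {
        return "(" + x + ", " + y + ")";
    }
}

class Point3D extends Point2D {
    final int z;

    Point3D(int x, int y, int z) {
        super(x, y); // Point2D's constructor runs describe() here...
        this.z = z;  // ...before this store has happened.
    }

    @Override
    String describe() {
        return "(" + x + ", " + y + ", " + z + ")";
    }
}
```

With these definitions, `new Point3D(1, 2, 3).summary` is `"(1, 2, 0)"`: the override observed the default value of `z`, even though `z` is declared `final`. Calling `describe()` after construction returns `"(1, 2, 3)"`.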

The ideal model would be one that looked something like this sequence of steps:

  • Arbitrary preprocessing in Point3D
  • Arbitrary preprocessing in Point2D
  • Object instantiation
  • Initialization of Point2D
  • Initialization of Point3D
  • Arbitrary postprocessing in Point2D
  • Arbitrary postprocessing in Point3D

Here, all classes are allowed to do arbitrary pre- and postprocessing, but the actual instantiation and initialization are atomic, so the object is guaranteed to be well-formed. Finally, once the object has been initialized, each inheritance level can do postprocessing with a fully initialized object.
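For comparison, this ordering can be approximated in plain Java by convention, using static factories. The sketch below is only an emulation under stated assumptions, not a language-level solution: `Trace`, `prepareId`, `prepareNorm`, `postprocess`, `create` and the `norm` field are all hypothetical names, and nothing stops a constructor from doing more than field stores. Like the earlier id example, it ignores synchronization.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical trace so the order of pre- and postprocessing can be observed.
class Trace {
    static final List<String> LOG = new ArrayList<>();
}

class Point2D {
    private static int nextId = 0;
    final int x, y, id;

    // Initialize: plain field stores only (a convention; Java does not enforce it).
    protected Point2D(int x, int y, int id) {
        this.x = x;
        this.y = y;
        this.id = id;
    }

    // Preprocess: arbitrary computation before any object exists.
    static int prepareId() {
        Trace.LOG.add("pre Point2D");
        return nextId++;
    }

    // Postprocess: runs only after every field in the chain has been stored.
    void postprocess() {
        Trace.LOG.add("post Point2D");
    }

    static Point2D create(int x, int y) {
        Point2D p = new Point2D(x, y, prepareId());
        p.postprocess();
        return p;
    }
}

class Point3D extends Point2D {
    final int z;
    final double norm;

    private Point3D(int x, int y, int z, int id, double norm) {
        super(x, y, id);
        this.z = z;
        this.norm = norm;
    }

    static double prepareNorm(int x, int y, int z) {
        Trace.LOG.add("pre Point3D");
        return Math.sqrt(x * x + y * y + z * z);
    }

    @Override
    void postprocess() {
        super.postprocess();
        Trace.LOG.add("post Point3D");
    }

    static Point3D create(int x, int y, int z) {
        double norm = prepareNorm(x, y, z);         // preprocessing in Point3D
        int id = prepareId();                       // preprocessing in Point2D
        Point3D p = new Point3D(x, y, z, id, norm); // instantiation and field stores
        p.postprocess();                            // postprocessing, Point2D then Point3D
        return p;
    }
}
```

The cost is visible right away: Point3D's factory has to know about and repeat Point2D's preprocessing step, and the guarantee rests entirely on discipline.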

My next post will explain how I think that can be made to work.


[1] I assume that Newspeak constructors guarantee that objects are always correctly initialized because of this paragraph in Gilad's post: "One detail that’s new here is the superclass clause: Point3D inherits from Point2D, and calls Point2D’s primary constructor. This is a requirement, enforced dynamically at instance creation time. It helps ensure that an object is always completely initialized."