Constructors #2

In my last post I wrote about the problems with constructors. In this post I'll describe the approach I'm currently considering using for my hobby language. But first I have to set things up a bit and describe some related language constructs.

Protocols

First of all, there are no classes, only protocols. You can think of protocols as classes without fields or, equivalently, as interfaces where methods can have implementations or, most accurately, as something very similar to traits. Here is an example:

protocol Point2D {
  def x();
  def y();
  def to_string() {
    return "a Point (x: ${this.x()}, y: ${this.y()})";
  }
}

In a class-based language Point2D would have two instance variables, x and y. Since protocols don't have fields, this one instead has two virtual methods, x() and y(), where the fields would be. It also has a to_string method that uses string interpolation, which I've stolen directly from the neptune language.

One way to create an instance of Point2D is to create a subprotocol that defines the methods x and y:

protocol Point2DAt3Comma4 : Point2D {
  def x() { return 3; }
  def y() { return 4; }
}

def p := new Point2DAt3Comma4(); // ¹
print(p.to_string()); // -> a Point (x: 3, y: 4)

Even though protocols are like interfaces, there is nothing to stop you from creating instances of them. And because they don't have any fields, initialization is trivial.
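For readers who want to poke at the idea in a familiar language, here is a rough Python analogue. It assumes we model a protocol as a class with no instance state, where each bodiless method such as `def x();` becomes a method that raises `NotImplementedError` until a subprotocol supplies it. This is only an illustration of the concept, not how the language itself works.

```python
class Point2D:
    """Protocol: behavior only, no instance variables."""

    def x(self):
        # Stands in for the bodiless `def x();` declaration.
        raise NotImplementedError("x() must be supplied by a subprotocol")

    def y(self):
        raise NotImplementedError("y() must be supplied by a subprotocol")

    def to_string(self):
        # Default behavior built purely on the virtual x() and y().
        return f"a Point (x: {self.x()}, y: {self.y()})"


class Point2DAt3Comma4(Point2D):
    """Subprotocol that fills in the "fields" with constant methods."""

    def x(self):
        return 3

    def y(self):
        return 4


# No fields means there is nothing to initialize, so construction cannot fail.
p = Point2DAt3Comma4()
print(p.to_string())  # a Point (x: 3, y: 4)
```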

Records

However, having to write a named subprotocol whenever you want a concrete point obviously doesn't work. Instead, there is a dual to protocols: records. Where a protocol is pure behavior and no state, a record is pure state and no behavior. In the case of Point2D the associated record could be:

def p := new { x: 3, y: 4 };
print(p.x()); // -> 3
print(p.y()); // -> 4

All a record can do is return the field value when the associated method is called. As with protocols, initializing a record cannot fail because it is only created after all the field values have been evaluated.
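A record can be sketched in Python as an object whose only methods are zero-argument accessors built from an already-evaluated set of values. `make_record` is a hypothetical helper for this illustration, not part of the language. The point it shows is that no object exists until every field value has been computed.

```python
def make_record(**fields):
    """Hypothetical model of `new { f: v, ... }`: pure state, no behavior.

    Every value in `fields` is evaluated before this runs, so the record is
    never observable in a half-initialized state.
    """
    methods = {
        # The default argument binds each value so every accessor keeps its own.
        name: (lambda self, value=value: value)
        for name, value in fields.items()
    }
    record_type = type("Record", (object,), methods)
    return record_type()


p = make_record(x=3, y=4)
print(p.x())  # 3
print(p.y())  # 4
```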

So now that we have behavior without state and state without behavior, we just have to combine them. This is done using the "extended" method call syntax:

def p := new Point2D() { x: 3, y: 4 };
print(p.to_string()); // -> a Point (x: 3, y: 4)

This syntax creates a record like the previous example but makes the record an instance of Point2D. Initialization is trivial, as with plain records, because all field values have been evaluated before the instance is created. This works in the simple case but unfortunately doesn't solve the general problem. First of all, we've exposed the "fields" of Point2D: if we later want to switch to a polar representation, every occurrence of the above syntax that binds x and y directly will give us broken points. Also, this does not take inheritance into account.
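One way to approximate `new Point2D() { x: 3, y: 4 }` in Python is to build, on the fly, a subclass of the protocol whose field accessors override the abstract methods. The helper `instantiate` is hypothetical. Creating a throwaway class per call is just a convenient way to model "a record that is also an instance of Point2D", not a claim about how the language implements it.

```python
class Point2D:
    def x(self):
        raise NotImplementedError

    def y(self):
        raise NotImplementedError

    def to_string(self):
        return f"a Point (x: {self.x()}, y: {self.y()})"


def instantiate(protocol, **fields):
    """Hypothetical model of `new P() { f: v, ... }`.

    The generated accessors override P's bodiless methods; every other method
    (here to_string) is inherited from P unchanged.
    """
    accessors = {name: (lambda self, value=value: value) for name, value in fields.items()}
    return type(protocol.__name__, (protocol,), accessors)()


p = instantiate(Point2D, x=3, y=4)
print(p.to_string())  # a Point (x: 3, y: 4)
print(isinstance(p, Point2D))  # True
```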

For this reason, the syntax above is not intended to be used in client code, only in Point2D's methods. Instead, Point2D should define factory methods for client code to call:

protocol Point2D {
  ...
  static factory def new(x: i, y: j) {
    return new this() { x: i, y: j };
  }
}

The static modifier means that the method is the equivalent of a Smalltalk class method. x: and y: are Python-style keyword arguments with a more Smalltalk-like syntax. The factory keyword does not have any effect until we get to subprotocols. In any case, with this definition you can now create a new point by writing

def p := new Point2D(x: 3, y: 4);
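In the Python analogue, a Python `classmethod` is a reasonable stand-in for a `static` method. Here it becomes the single place that knows which fields a Point2D instance carries. `instantiate` is the same hypothetical helper as in the earlier sketches, repeated so this block runs on its own.

```python
def instantiate(protocol, **fields):
    """Hypothetical stand-in for `new this() { ... }`: one accessor per field."""
    accessors = {name: (lambda self, value=value: value) for name, value in fields.items()}
    return type(protocol.__name__, (protocol,), accessors)()


class Point2D:
    def x(self):
        raise NotImplementedError

    def y(self):
        raise NotImplementedError

    def to_string(self):
        return f"a Point (x: {self.x()}, y: {self.y()})"

    @classmethod
    def new(cls, x, y):
        # Models `static factory def new(x: i, y: j)`: client code never
        # names the fields directly, only this method does.
        return instantiate(cls, x=x, y=y)


p = Point2D.new(x=3, y=4)  # client code, like `new Point2D(x: 3, y: 4)`
print(p.to_string())  # a Point (x: 3, y: 4)
```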

If, at some later point, you decide to switch to a polar representation you simply have to change the definition of new(x:, y:):

static factory def new(x: i, y: j) {
  def r := sqrt(i * i + j * j);
  def t := atan2(j, i);
  return new this() { rho: r, theta: t };
}

As long as people use the factory methods, you are free to change the implementation of the protocol. Also, the protocol can do arbitrary pre- and postprocessing, and the actual instantiation and initialization happen atomically. All that's left now is to account for subprotocols. Enter Point3D.
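To see that the factory really does hide the representation, here is a Python sketch of Point2D after the switch to polar coordinates. Under the same assumptions as before (a `classmethod` models `static`, and `instantiate` is a hypothetical helper), the stored fields become `rho` and `theta`, x() and y() are derived from them, and the client call is unchanged.

```python
import math


def instantiate(protocol, **fields):
    """Hypothetical stand-in for `new this() { ... }`: one accessor per field."""
    accessors = {name: (lambda self, value=value: value) for name, value in fields.items()}
    return type(protocol.__name__, (protocol,), accessors)()


class Point2D:
    """Point2D after switching to a polar representation internally."""

    def rho(self):
        raise NotImplementedError

    def theta(self):
        raise NotImplementedError

    # x() and y() are now derived from the stored polar "fields".
    def x(self):
        return self.rho() * math.cos(self.theta())

    def y(self):
        return self.rho() * math.sin(self.theta())

    @classmethod
    def new(cls, x, y):
        # Preprocessing: convert to polar form before the instance exists.
        r = math.hypot(x, y)
        t = math.atan2(y, x)
        return instantiate(cls, rho=r, theta=t)


p = Point2D.new(x=3, y=4)  # the client call is the same as before
print(p.rho())  # 5.0
print(round(p.x(), 9), round(p.y(), 9))  # 3.0 4.0
```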

Inheritance

Here is Point3D:

protocol Point3D : Point2D {
  def z();
  override def to_string() {
    return "a Point (x: ${this.x()}, y: ${this.y()}, z: ${this.z()})";
  }
}

This protocol has three "fields", x, y and z, that have to be initialized. One way to do it would be for Point3D to initialize all of them:

protocol Point3D : Point2D {
  ...
  static factory def new(x: i, y: j, z: k) {
    return new this() { x: i, y: j, z: k };
  }
}

This sort of works, but it is error-prone, breaks encapsulation, and breaks down if we were to change Point2D to use polar coordinates. What we really want is for Point3D to call the new(x:, y:) method on Point2D. That's where the factory keyword from before comes in: a factory method is one whose caller can specify an additional set of fields. Because new(x:, y:) is a factory method, the instance created by this call

new this() { x: i, y: j }

will not only have an x and y field but also any fields specified by the caller of new(x:, y:). This allows us to rewrite new(x:, y:, z:) as

static factory def new(x: i, y: j, z: k) {
  return new super(x: i, y: j) { z: k };
}

If Point2D were to change its representation to polar coordinates, Point3D would still work, provided that Point2D still had x and y methods. The resulting instance would simply have rho, theta and z fields.
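In the Python analogue, the factory idea maps fairly naturally onto a `classmethod` that accepts extra keyword fields and is called through `super()`. Because `cls` is still Point3D when Point2D's method runs, the single instantiation produces a Point3D with all three fields. The `**extra` parameter and the `instantiate` helper are assumptions of this sketch. They model the trailing `{ z: k }` block, which Python has no syntax for.

```python
def instantiate(protocol, **fields):
    """Hypothetical stand-in for `new this() { ... }`: one accessor per field."""
    accessors = {name: (lambda self, value=value: value) for name, value in fields.items()}
    return type(protocol.__name__, (protocol,), accessors)()


class Point2D:
    def x(self):
        raise NotImplementedError

    def y(self):
        raise NotImplementedError

    @classmethod
    def new(cls, x, y, **extra):
        # "factory": the caller may add fields via `extra`, and `cls` is the
        # subprotocol that started the call, not necessarily Point2D.
        return instantiate(cls, x=x, y=y, **extra)


class Point3D(Point2D):
    def z(self):
        raise NotImplementedError

    def to_string(self):
        return f"a Point (x: {self.x()}, y: {self.y()}, z: {self.z()})"

    @classmethod
    def new(cls, x, y, z, **extra):
        # Mirrors `new super(x: i, y: j) { z: k }`: Point2D decides how x and y
        # are stored; Point3D only contributes its own field.
        return super().new(x, y, z=z, **extra)


q = Point3D.new(3, 4, 5)
print(q.to_string())  # a Point (x: 3, y: 4, z: 5)
print(isinstance(q, Point3D))  # True
```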

Let's look at another example: the one from the previous post where points have a serial number.

protocol Point2D {
  def x();
  def y();
  def id();
  static factory def new(x: i, y: j) {
    def s := current_id++;
    return new this() { x: i, y: j, id: s };
  }
}

protocol Point3D : Point2D {
  def z();
  static factory def new(x: i, y: j, z: k) {
    return new super(x: i, y: j) { z: k };
  }
}

Even though this requires nontrivial computation in a superprotocol, there is no risk of creating improperly initialized objects. In terms of the construction steps described in the last post, this allows initialization to proceed like this:

Arbitrary preprocessing in Point3D
Arbitrary preprocessing in Point2D
Object instantiation and full initialization
Arbitrary postprocessing in Point2D
Arbitrary postprocessing in Point3D
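The ordering above can be checked with a small Python sketch of the serial-number example. It assumes `itertools.count` as the stand-in for the global `current_id`. It also adds a hypothetical `trace` list whose only job is to record when each step runs. The object comes into existence exactly once, between the preprocessing and postprocessing steps.

```python
import itertools

current_id = itertools.count()  # stands in for the global `current_id` counter
trace = []  # hypothetical log, only here to show the order of the steps


def instantiate(protocol, **fields):
    """Hypothetical stand-in for `new this() { ... }`; logs when it runs."""
    trace.append("instantiate " + ", ".join(sorted(fields)))
    accessors = {name: (lambda self, value=value: value) for name, value in fields.items()}
    return type(protocol.__name__, (protocol,), accessors)()


class Point2D:
    @classmethod
    def new(cls, x, y, **extra):
        trace.append("pre Point2D")
        s = next(current_id)  # nontrivial work done before the object exists
        p = instantiate(cls, x=x, y=y, id=s, **extra)
        trace.append("post Point2D")
        return p


class Point3D(Point2D):
    @classmethod
    def new(cls, x, y, z, **extra):
        trace.append("pre Point3D")
        p = super().new(x, y, z=z, **extra)
        trace.append("post Point3D")
        return p


a = Point2D.new(1, 2)
b = Point3D.new(3, 4, 5)
print(a.id(), b.id())  # 0 1
for step in trace[3:]:  # the steps recorded while building b
    print(step)
```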

The fact that protocols don't declare fields but that fields are provided upon construction also allows new kinds of specialization. For instance, we don't actually have to change the implementation of Point2D to create an instance that uses polar coordinates:

protocol Polar2D : Point2D {
  def rho();
  def theta();
  def x() { return this.rho() * cos(this.theta()); }
  def y() { return this.rho() * sin(this.theta()); }
  static factory def new(rho: r, theta: t) {
    return new super() { rho: r, theta: t };
  }
}

In this case we choose to bypass the new(x:, y:) method and instead implement x and y as methods computed from rho and theta, rather than as fields. This way we can share all the methods in Point2D. This is also possible in other languages if all access to x and y goes through accessors, but there every instance would carry two unused fields.
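Here is a Python sketch of the Polar2D specialization, under the same assumptions as the earlier sketches. One simplification applies: where the text writes `new super() { rho: r, theta: t }`, this version calls the hypothetical `instantiate` helper directly. The check at the end confirms that no x or y field is stored, yet Point2D's to_string still works.

```python
import math


def instantiate(protocol, **fields):
    """Hypothetical stand-in for `new ... { ... }`: one accessor per field."""
    accessors = {name: (lambda self, value=value: value) for name, value in fields.items()}
    return type(protocol.__name__, (protocol,), accessors)()


class Point2D:
    def x(self):
        raise NotImplementedError

    def y(self):
        raise NotImplementedError

    def to_string(self):
        return f"a Point (x: {self.x()}, y: {self.y()})"

    @classmethod
    def new(cls, x, y, **extra):
        return instantiate(cls, x=x, y=y, **extra)


class Polar2D(Point2D):
    def rho(self):
        raise NotImplementedError

    def theta(self):
        raise NotImplementedError

    # x and y become computed methods instead of stored fields.
    def x(self):
        return self.rho() * math.cos(self.theta())

    def y(self):
        return self.rho() * math.sin(self.theta())

    @classmethod
    def new(cls, rho, theta, **extra):
        # Deliberately bypasses Point2D.new: no x or y field is ever stored.
        return instantiate(cls, rho=rho, theta=theta, **extra)


p = Polar2D.new(rho=2.0, theta=0.0)
print(p.to_string())  # a Point (x: 2.0, y: 0.0)
print("x" in vars(type(p)))  # False: x is inherited from Polar2D, not a field
```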

Finally, let me just mention that any method, not just new, can be a factory. When you invoke one that isn't new, you use the ordinary method call syntax followed by the extra fields, for instance

def q := Point2D.at(x: 3, y: 4) { z: 5 };

It looks slightly different, but the mechanism is exactly the same.
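The same Python model covers a factory that isn't called `new`. In this sketch `at` is assumed to be a factory method on Point2D taking x and y, as the example suggests. Python has no trailing `{ ... }` block, so the extra `z` field is passed as a keyword argument. `instantiate` is the same hypothetical helper as before.

```python
def instantiate(protocol, **fields):
    """Hypothetical stand-in for `new this() { ... }`: one accessor per field."""
    accessors = {name: (lambda self, value=value: value) for name, value in fields.items()}
    return type(protocol.__name__, (protocol,), accessors)()


class Point2D:
    def x(self):
        raise NotImplementedError

    def y(self):
        raise NotImplementedError

    def to_string(self):
        return f"a Point (x: {self.x()}, y: {self.y()})"

    @classmethod
    def at(cls, x, y, **extra):
        # A factory with a different name; it works exactly like new above.
        return instantiate(cls, x=x, y=y, **extra)


q = Point2D.at(x=3, y=4, z=5)  # models `Point2D.at(x: 3, y: 4) { z: 5 }`
print(q.to_string())  # a Point (x: 3, y: 4)
print(q.z())  # 5
```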

¹ As in neptune, new is not a special construct but an ordinary method call: the syntax new X() is exactly equivalent to the method call X.new().