Match

When I first heard about Scala, the very first thing I did was find the specification and look up the details of pattern matching. I'm a big fan of pattern matching, something I apparently share with Martin Odersky, and I wanted to see if Scala used the same model as his previous language, Pizza. Unfortunately it does as far as I can see. The reason I think it's unfortunate is that I think there is a great potential in introducing pattern matching in an object-oriented language but I don't think Scala's model quite reaches that potential.

Before getting to Scala's mechanism I'll start off by explaining why I think pattern matching is something you would want to have in an object-oriented language. Conventional wisdom says that in an object-oriented language you're not supposed to decide which action to take based on the type of other objects. Scott Meyers says in Effective C++:

Anytime you find yourself writing code of the form "if the object is of type T1, then do something, but if it's of type T2, then do something else," slap yourself.

Instead of dispatching on an object's type, you should use polymorphism. This rule is a good first approximation, but it's not the whole story. Sometimes you're not in a position to add methods to other objects. And even if you are, adding methods can be a bad idea because it often means polluting other objects with methods that may be convenient for you but are completely irrelevant to the object itself and anyone else who might use it. Case in point: in Squeak the Object class has over 400 methods including isSketchMorph, playSoundNamed and, the winner by unanimous decision, hasModelYellowButtonMenuItems. Encapsulation? We don't need no steenkin' encapsulation!

This is where a pattern matching mechanism can be really useful. If you have a reasonably fixed set of object types that you want to operate on it gives you a structured, simple and readable way of distinguishing between objects. Unfortunately Scala's model, and Pizza's before it, imposes some limitations on the objects involved that make it unnecessarily difficult to use.

Scala's pattern matching mechanism is pretty simple: you can declare a class as a case class, which enables you to match against it:

class Expr
case class Var(x: String) extends Expr
case class Apply(f: Expr, e: Expr) extends Expr
case class Lambda(x: String, e: Expr) extends Expr

Given an Expr object, or any object really, you can match it against these three cases like this:

val e: Expr = ...
e match {
  case Var(x) => ...
  case Apply(f, e) => ...
  case Lambda(x, e) => ...
}

This looks very similar to functional-style pattern matching as in ML and Haskell.

The most important problem with this mechanism is that a case is bound to a particular class. The Var(x) case only matches instances of the Var class and its subclasses. It's fine to dispatch based on whether or not an object implements a particular interface, because any class is free to declare that it implements that interface, and implement it however they want. Requiring classes to have a particular superclass, on the other hand, dictates how the class should be implemented which is a bad thing indeed.

Here's an example of why that's a problem. Let's say I'm implementing a linear algebra library with various functions that act on points, lists, matrices, etc. I have a basic Point class:

case class Point(x: Int, y: Int) {
  // All sorts of numerical methods
}

Let's say that my library turns out to be a bit slow and after profiling it I discover that one point in particular is used a lot, (0, 0), and a lot of the algorithms can be simplified considerable in that case. So I make a separate Origin class which represents (0, 0) and has its own simplified implementations of all operations (here in an imaginary case-extension syntax):

class Origin extends Object case Point(0, 0) {
  // Methods specialized for (0, 0)
}

In Scala you can't do that -- Origin must be a subclass of Point if it wants to match as a point. In this case extending Point means that Origin gets two fields and a lot of methods, none of which it is interested in using. Bugger.

The second problem is that the values in the pattern can not be computed, they are always read directly from the object. You might want to optimize the numerical library by making some operations lazy, so you have a LazyPoint which is just like a point except that it doesn't calculate its coordinates until they are requested. Unfortunately you can't: the coordinates are read through an accessor, which is generated for you, that simply returns the value of a variable. Double bugger.

The last problem is that once a class has declared that it's a case, subclasses can't change or override that. An example from the post about equality was the class LinePosition, which represents a position corresponding to a particular line in a text file. A subclass of this class was CharacterPosition, which not only specifies the line in the file but also a particular character within that line. I would probably want LinePosition to match LinePosition(line: Int) and CharacterPosition to match CharacterPosition(line: Int, char: Int). Unfortunately I can't because if one is a subclass of the other then they must match the same pattern. In this case, CharacterPosition must match as LinePosition(line: Int). Triple bugger.

The conclusion here is that if I use case classes and pattern matching I've put a lot of restrictions on the way I can implement my classes. If I was to implement the linear algebra library I mentioned above and use pattern matching to distinguish between points, lists and matrices, I've severely limited the freedom to decide how to actually implement these objects. And the limitations are especially uncomfortable because they dictate how you should implement your objects: that superclass must be a particular class and that the internal state of the object, or at least part of it must be readable directly from variables.

What I'm saying is not that I think pattern matching in Scala was poorly designed -- it's more that I'm probably trying to use it to solve a problem it probably wasn't designed to solve. But I don't think it's a fundamental problem with pattern matching. So without further stalling, here's my suggestion for an alternative semantics that is much less restrictive.

There is already a well-known way to dispatch on the type of an object: the visitor design pattern. When you pattern match an object, you shouldn't inspect the object "from the outside" to select the right case, you should instead let the object itself select the right case using double dispatch. If you were to write a double-dispatch version of the lambda example from before, without any syntactic sugar, the code would look something like this:

class Expr
class Var (x: String) extends Expr {
  def match(m: Matcher) = m.var(x)
}
class Apply (f: Expr, e: Expr) extends Expr {
  def match(m: Matcher) = m.apply(f, e)
}
class Lambda(x: String, e: Expr) extends Expr {
  def match(m: Matcher) = m.lambda(x, e)
}

and a match expression would expand into something like:

val e: Expr = ...
e.match(new {
  def var(x) = ...
  def apply(f, e) = ...
  def lambda(x, e) = ...
})

The match expression is packed into a Matcher object which is really a mini-visitor. The matcher is then given to the to the object being matched which calls back to select the case to match. Note that I'm not suggesting that you write the code like that. In simple cases the same syntax could be used as in Scala, in more complex cases you might need a more general syntax but I haven't quite thought that through.

Using visitor-pattern-matching an object has full control over what and how it "matches". In the Origin example this means that an Origin object is free to implement its match method so that it matches as a point. It also means that the lazy point object is free to calculate the coordinates before calling back to the Matcher. Finally, CharacterPosition can override the match method so it is no longer forced to match as a LinePosition. Nice!

Of course there are problems. If you want to be able to have default cases that match if no other cases do, the matcher object must be able to handle calls to methods it doesn't implement -- sort of like the way smalltalk does with doesNotUnderstand. Another problem is that the extra flexibility enables objects to abuse the Matcher object for instance by matching more than once or not at all. Finally, you don't want to use regular method names like point since that's a flat namespace -- instead you would use some form of qualified names. But none of these problems are very difficult to solve.

Bottom line: I think visitor-ifying the pattern matching mechanism in Scala would make it much more flexible and useful but still provide, as a special case, the current mechanism. And oh, I almost forgot, it can also be used to solve the equality problem that I described in an earlier post but that will have to wait.

Update: I discussed this with some people from the scala group and they managed to convince me that visitors are just not a realistic implementation strategy. Bugger!