Equals

I recently discovered that if you look closely at the seemingly simple problem of comparing objects for equality it’s actually very hard, if not impossible, to solve satisfactorily. I looked around a bit to see if this was a well-known issue and it turns out that I’m certainly not the first person to notice this problem. See for instance here and here. This post introduces the problem (or one of the problems) of implementing object equality methods.

I have a standard template for implementing the equals method in Java:

public class X ... {

// ... body of X ...

public boolean equals(final Object obj) {
if (this == obj) return true;
if (!(obj instanceof X)) return false;
final X that = (X) obj;
// X-specific comparison
}

}

Unfortunately it turns out that implementing equals using instanceof doesn’t work in general. Let’s say you have a class LinePosition which specifies a particular line in a text file. Two line positions are equal if they represent the same line:

public boolean equals(final Object obj) {
if (this == obj) return true;
if (!(obj instanceof LinePosition)) return false;
final LinePosition that = (LinePosition) obj;
return this.getLine() == that.getLine();
}

Now you might later want to create a subclass, CharacterPosition, which not only specifies a line but also a particular character in the line. Two character positions are equal if they specify the same line and the same character in that line:

public boolean equals(final Object obj) {
if (this == obj) return true;
if (!(obj instanceof CharacterPosition)) return false;
final CharacterPosition that = (CharacterPosition) obj;
return this.getLine() == that.getLine()
&& this.getCharacter() == that.getCharacter();
}

This might look reasonable but it doesn’t work: equals is required to be symmetric but in this case it isn’t:

final LinePosition a = new LinePosition(3)
final CharacterPosition b = new CharacterPosition(3, 5);
final boolean aIsB = a.equals(b); // true
final boolean bIsA = b.equals(a); // false

The problem is that b is more picky about who it considers itself to be equal to than a. Damn.

An alternative approach is to avoid instanceof and use getClass instead:

public boolean equals(final Object obj) {
if (this == obj) return true;
if (obj == null || obj.getClass() != this.getClass()) return false;
final X that = (X) obj;
// X-specific comparison
}

Using this technique to implement equals on the text position classes would cause a and b to be equally picky about who they considered themselves to be equal with so equals would be symmetric. Unfortunately, now the objects are so picky that they break Liskov Substitutability, which is a Bad Thing.

Here’s an example where that might hit you. Let’s say that during debugging you notice that instances of LinePosition turn up in places where you didn’t expect then. If I wanted to find out where these positions came from, I would probably make an empty subclass of LinePosition for each place where I instantiate them so each instantiation site uses its own class. That way, whenever I find a line position where I didn’t expect it, I can see where it was instantiated

But if equals has been implemented using class equality I can’t do that because even though I don’t change any methods, the behavior of my program has changed in a subtle way. One position can now only be equal to another position if they were created at the same place, which is very different from how the program behaved before. I’ve tried hunting bugs that occurred because a subclass behaved differently from superclasses merely because it had a different class. Bugs like that can be extremely evil because it’s so unexpected if your mind is warped the way object orientation usually warps people’s minds.

At the core, this problem is a question about when can you consider two objects to be of the same kind of objects. Having a common superclass does not imply that two objects are the same kind, as the LinePosition and CharacterPosition example demonstrates. On the other hand, sometimes objects should be considered to be of the same kind even if they have different classes. So how do you determine whether or not two objects should be considered to be the same kind of object? In Java, I don’t know. I think the getClass approach is the greater of two evils so I’ll probably keep doing things the way I’ve always done, using instanceof. Looking outside Java and java-like languages I have an idea for a solution using a mechanism that’s a combination of Scala-style pattern matching and the visitor design pattern. But that will be the subject of another post in the near future.

23 Responses to Equals

  1. instanceof does not break reflexivity but symmetry

  2. Whoops-a-doodle. That has now been fixed.

  3. In multiple-dispatch languages like Nice is this a non-problem?

  4. At first I thought that multiple dispatch solves this problem the same way it solves the binary method problem in your example. But on second thought, while they it does solve the problem in some sense, the “solution” has some uncomfortable side effects. I’m no expert on the exact semantics of multiple dispatch, though, so I may be wrong here.

    Imagine that I was to add another point implementation to your example, say a polar point. It wouldn’t be completely unnatural to want to be able to compare polar and rectangular points and have them turn out equal (ignoring any precision issues that could occur when converting between representations). You are free to implement that by adding three new equality methods: equals(#Polar, #Polar), equals(#Polar, #Point) and equals(#Point, #Polar). If you were to add a third representation you could also do that but now you would have to add, well, a lot more new methods, I’m not sure exactly how many. So while it’s technically possible you need global knowledge of which other objects consider themselves to be equal to you, and you end up with an combinatoric number of methods.

    As I say in the last paragraph of the post, I don’t think you can determine whether two objects can be equal to each other by inspecting their inheritance hierarchy. But like single dispatch, the inheritance hierarchy is all multiple dispatch has to work with so I don’t see how it can solve the problem completely.

  5. sometimes objects should be considered to be of the same kind even if they have different classes
    Let’s say in general that objects of different type are not comparable.

    [U,V] boolean equals(U u, V v) = false;

    Then let’s override that for some specific types that we wish to be are comparable

    override boolean equals(Point p1, PolarPoint p2) = …;
    override boolean equals(PolarPoint p1, Point p2) = p2.equals(p1);

    you need global knowledge of which other objects consider themselves to be equal to you, and you end up with an combinatoric number of methods
    We’re talking about the corner cases – how do we deal with subtypes that are not comparable to their supertype (Point equals ColorPoint)
    – how do we deal with unrelated types that we wish to be comparable for equality (Point equals PolarPoint)

    In most cases we’ll just implement the single method
    [T] nice.lang.boolean equals(!T,!T)

  6. I agree that multiple dispatch gives you a convenient way to express equality in many situations where single dispatch doesn’t work, including the examples in the post. But what I’m looking for is a solution that actually does cover all the corner cases.

    I consider the inheritance relationship between classes to be more or less incidental, an implementation detail, and ideally, equality should be completely orthogonal to inheritance.

  7. But what I’m looking for is a solution that actually does cover all the corner cases.
    And why don’t you think that multiple dispatch covers all the corner cases?

  8. And why don’t you think that multiple dispatch covers all the corner cases?

    As I understand your previous comment, you agree that in the worst case you need global knowledge and an exponential number of methods to implement equality with multi methods. This is why I don’t consider all cases to be covered by multiple dispatch: requiring global knowledge has so many evil side-effects that it’s not a practically useful solution.

    I don’t have a very deep understanding of multi-methods, though, so I may have misunderstood you or missed a more obvious solution.

  9. Let’s see if we can reach a common understanding 😉

    1) I’m not sure what you mean by requiring global knowledge. Let’s say we already have a library Point and now we want to write another library PolarPoint which can be used with Point.

    As part of our domain knowledge we know about Point and PolarPoint, we know that they should be equals comparable – is that what you mean by global knowledge?

    Do you have a specific code example?

    2) I think there may be some misunderstanding about an exponential number of methods.

    When we write the library PolarPoint, we don’t write equals methods for int and PolarPoint, for String and PolarPoint, for Rectangle and PolarPoint… etc

    We only write methods for the ordinary case PolarPoint.equals(PolarPoint) and the special case Point.equals(PolarPoint) / PolarPoint.equals(Point)

  10. Here’s an example of what I mean by “global knowledge”.

    Let’s say that I’m working on one part of a very large application where the Point class is defined in an external library. I decide to add PolarPoint and the methods for comparing it to points.
    In some other part of the application, someone else has implemented complex numbers and added the methods for comparing those to points. It now holds that a polar point can be equal to a regular point, a regular point can be equal to a complex number, but polar points and complex numbers cannot be equal unless additional methods are added for comparing them. Either the implementor of PolarPoint must be aware of the Complex class or vice versa otherwise who’s going to add the methods for comparing polar points and complex numbers?

    As for the “exponential” thing, I think I may have overstated that one a bit ;-). What I was thinking of there was that as you add new classes that you want to be comparable to each other, you have to add an increasing number of equals methods:

    Point added
    equals(Point, Point)

    PolarPoint added
    equals(PolarPoint, PolarPoint)
    equals(PolarPoint, Point)
    equals(Point, PolarPoint)

    Complex added
    equals(Complex, Complex)
    equals(Point, Complex)
    equals(Complex, Point)
    equals(Complex, PolarPoint)
    equals(PolarPoint, Complex)

    While “exponential” is unfair, quadratic is still pretty bad.

  11. but polar points and complex numbers cannot be equal unless additional methods are added…
    Yes, this is programming not math 🙂

    Either the implementor of PolarPoint must be aware of the Complex class or vice versa…
    When, and if, someone writes PolarPoint.equals(Complex) the compiler will complain
    No possible call for equals.
    Arguments: (PolarPoint, Complex)

  12. Yes, this is programming not math 🙂

    If you’re wondering what that sound is, it’s Dijkstra rolling over in his grave ;-).

    It’s not as much a question of math as of obeying the contract of the equals method. If equality is not transitive you get into all sorts of trouble. Consider this example:

    Set s = new HashSet();
    s.add(new PolarPoint(0, 0));
    s.add(new Complex(0, 0));
    // s now contains two elements
    s.add(new Point(0, 0))

    Does this cause the two existing elements to be removed from the set? Or only one?

    When, and if, someone writes PolarPoint.equals(Complex) the compiler will complain

    Isn’t that case covered by the catch-all case?

    [U,V] boolean equals(U u, V v) = false;

  13. Dijkstra rolling over LOL

    Isn’t that case covered by the catch-all case?
    Yes, we don’t need the catch-all in Nice mea culpa.

    So how do we usually express common behaviour from among unrelated types?

    interface PointEqual {
    boolean equals(Point p);
    Point coerce();
    }

    class Point {
    int x = 0; int y = 0;
    equals(Point p) = (this.x == p.x) && (this.y == p.y);
    boolean equals(PointEqual p) = this.equals(p.coerce);
    }

    class A implements PointEqual {
    equals(PointEqual p) = this.coerce.equals(p.coerce);
    equals(Point p) = this.coerce.equals(p);
    coerce() = new Point();
    }

    class B implements PointEqual {
    equals(PointEqual p) = this.coerce.equals(p.coerce);
    equals(Point p) = this.coerce.equals(p);
    coerce() = new Point();
    }

    Those duplicate equals,equals implementations make Traits seem a nice idea, but at least we’ve avoided the quadratic.

  14. Yes, I think that solves it. The drawback is that the person who implements Point has to foresee that someone else might implement an object which is comparable with Point. Or use this techique always, just in case. But maybe there’s some way of making that less of a burden.

    The reason I wrote the post in the first place was that I thought I had come up with something that solved the problem completely. After this discussion, I can see that that the solution I had in mind doesn’t work. Bugger!

  15. thanks, I’m having fun remembering what can be done with Nice 😉

    has to foresee
    Well maybe not, let’s assume Point and the other classes have already been defined by other programmers.

    import oldlib; // Point
    import alib; // A
    import blib; // B

    abstract interface PointEqual {
    boolean equals(Point p) = this.coerce.equals(p);
    [PointEqual T] boolean equals(T pe) = this.coerce.equals(pe.coerce);
    Point coerce();
    }

    [PointEqual T] boolean equals(Point p, T pe) = p.equals(pe.coerce);

    class alib.A implements PointEqual;
    coerce(A a) = new Point();

    class blib.B implements PointEqual;
    coerce(B b) = new Point();

    We’ve also managed to push those duplicate method implementations into the interface.

  16. I see, that’s very clever. I seem to be running out of arguments here. I still don’t quite buy it, though.

    The reason the previous solution worked was that all the classes involved agreed on how to implement equality. The reason they could agree was that the PointEqual interface was part of the implementation of Point. If class A and class B are implemented independently of each other and PointMatch is not distributed with Point, they each end up implementing their own abstract PointMatch interface. This brings you back to the original problem: they’re both comparable with point but not each other.

    With power tools like abstract interfaces available, though, I’m starting to think that maybe there is a solution to be found.

  17. Let’s make sure we have a common understanding.

    the PointEqual interface was part of the implementation of Point
    No. Point in oldlib doesn’t know anything about the PointEqual interface, this is oldlib

    class Point {
    int x = 0; int y = 0;
    equals(Point p) = (this.x == p.x) && (this.y == p.y);
    }

    If class A and class B are implemented independently of each other
    They are – they are in separate packages without any reference between them, this is alib (and blib is similar):

    class A {}

  18. No. Point in oldlib doesn’t know anything about the PointEqual interface, this is oldlib

    I was referring to the previous solution, the one where Point did know about PointEqual, to point out that the thing that made that work was missing in the last solution.

    They are – they are in separate packages without any reference between them

    Right, I didn’t put that very well. I was referring to the fact that equality for A and B is defined in the same place.

    I think it’s an unlikely scenario that Point, A and B have been defined separately and then later someone comes along and makes them comparable to each other. I would expect equality with Point to be part of the implementation of A and B. In that case, the last solution breaks down because they’ll both get their own implementation of PointEqual. In the previous solution, where PointEquals was part of Point, that’s not a problem.

  19. an unlikely scenario
    I understand what you mean.

    1) We might say that making the effort to define the PointEqual interface is unlikely, and we should expect more piecemeal development. Assume the basic oldlib Point definition, then someone might come along and define

    package alib; import oldlib;
    class A { boolean equals(Point p) = true; }
    boolean equals(Point p, A a) = true;

    and later someone else might come along and define

    package blib;
    import oldlib; import alib;
    class B {
    boolean equals(Point p) = true;
    boolean equals(A a) = true;
    }
    boolean equals(Point p, B b) = true;
    boolean equals(A a, B b) = true;

    2) We might say that when someone comes along to define blib they realize that it might be better to define interface PointEqual, and they see the need to “fix-up” class A as-well-as Point, or the compiler reminds them 😉

    package pelib;
    import oldlib; import alib;

    interface PointEqual {
    boolean equals(Point p) = this.coerce.equals(p);
    Point coerce();
    }
    boolean equals(Point p, PointEqual pe) = p.equals(pe.coerce);

    Point coerce(A a) = new Point();
    boolean equals(A a, PointEqual pe) = a.coerce.equals(pe.coerce);
    boolean equals(PointEqual pe, A a) = pe.coerce.equals(a.coerce);

    and then define blib using PointEqual

    package blib;
    import oldlib; import pelib;

    class B implements PointEqual {
    coerce() = new Point();
    }

    3) I would expect equality with Point to be part of the implementation of A and B. In that case, the last solution breaks down because they’ll both get their own implementation of PointEqual.
    afaict, in that case A and B wouldn’t implement PointEqual, they’d have specific methods A.equals(B) A.equals(PE) B.equals(A) B.equals(PE) A.coerce B.coerce and classes we defined from then on would implement PointEqual.

    4) In Nice, external methods and abstract interfaces mean that these options are not mutually exclusive.

  20. package blib;
    import oldlib; import alib;

    If you allow blib to import alib then you can no longer say that they’ve been implemented independently.

    and they see the need to “fix-up” class A as-well-as Point, or the compiler reminds them 😉

    But that brings us right back to the global knowledge scenario, right? Someone, possibly the implementor of B, has to know about all the classes in the system that are comparable to Point.

    By the way: thanks for the link to the paper. I had been looking for that.

  21. I have the feeling that you are moving the goalposts 🙂

    Maybe you could say exactly what scenario you have in mind, and what you see as the problems.

    right back to the global knowledge
    Everytime we compile “the system” the compiler will report any comparison in the system which does not have an equals implementation. The compiler provides the global knowledge we don’t have.

  22. I have the feeling that you are moving the goalposts 🙂

    I can see why you would think that but really, I’m not. I could probably have been a bit more explicit about the scenario I’m thinking of, though, so let me give you a bit more context.

    The application I have in mind is essentially eclipse, since that’s what I’m working on these days. I don’t know how familiar you are with the implementation of eclipse but it is implemented as a large number of independent components or plugins. The plugins don’t necessarily know anything about each other, but there are various mechanismes for plugging them together, often using runtime reflection. Plugins can be loaded and unloaded dynamically and are for performance reasons generally loaded reflectively, on demand, after eclipse has been started.

    If Point is defined in one plugin and I decide to implement class A in my own alib plugin, I have no way whatsoever of determining which other plugins might define classes that are also comparable with Point. It’s basically one big “soup” of code and you cannot assume any knowledge about which other plugins are present, besides the ones you require to run. All plugins are compiled separately so the compiler can’t really help here. This is why I’m allergic to any requirement of “global knowledge” by anyone, even the compiler: things should work even if I download alib and blib off the internet and load them into a running instance of eclipse using reflection.

    The only solution that has worked so far in this scenario is the one where PointEqual was a part of the implementation of Point. With that, you can make as many kinds of objects comparable with Point without any of them knowing about each other, and freely load and unload them. And even though it’s a bit heavy I have a feeling that this solution, or possibly a refinement of it, is as good a solution as you’ll get to this problem.

  23. Previously The drawback is that the person who implements Point has to foresee that someone else might implement an object which is comparable with Point…
    Given the context that’s a little like saying the “drawback” of extension points is that… 🙂

    afaict we are saying it’s a given that we want some other classes to be equals comparable with Point, and delivering a PointEqual interface with Point is a straightforward way of achieving that.

Leave a Reply

Your email address will not be published. Required fields are marked *


*