Tuesday, April 21, 2009

Equality, Mutability and Products

The thoughts in this post was sparked by a discussion on the scala-user mailing list. The discussion was about the Object.equals() method and what its semantics should be. There are many different types of equality, but IMO the most usable one is if the expression a.equals(b) returns the same value regardless of when it's evaluated (provided the references a and b are constant), i.e. it's time invariant. If this was not the case, it would be quite impossible to implement Object.hashCode() in a useful way and you wouldn't be able to use objects as keys in a hash map or as members of a set.

The Java standard library breaks this practice in a number of places, for example java.util.ArrayList implements equals() as element equality which of course is not time invariant as the list can change over time (unless it's read only). I think the rationale behind this might be that there are no immutable collection classes in the Java libraries, and thus ArrayList is used for both mutable and immutable lists.

Equality for Mutable Case Classes

When using Scala case classes you automatically get an equals() and hashCode() implementation which calls the corresponding class fields methods. The implementation is the same regardless of whether the fields are mutable or not, so if you use vars the implementations will not be time invariant. For example:

scala> case class A(var i : Int)
defined class A

scala> val a1 = A(1)
a1: A = A(1)

scala> val a2 = A(1)
a2: A = A(1)

scala> a1 == a2
res57: Boolean = true

scala> a1.hashCode
res58: Int = -1085252538

scala> a1.i = 2

scala> a1 == a2
res59: Boolean = false

scala> a1.hashCode
res60: Int = -1085252537

How can we fix this? The Scala compiler only generates equals() and hashCode() implementations if we have not already implemented them in a super type (excluding java.lang.Object). So we can simply define a super trait for all mutable case classes:

trait Mutable {
override def equals(v : Any) = super.equals(v)
override def hashCode = super.hashCode
}

Now we can easily define mutable case class with the default, time invariant equals() and hashCode() methods:

scala> case class A(var i : Int) extends Mutable
defined class A

scala> val a1 = A(1)
a1: A = A(1)

scala> val a2 = A(1)
a2: A = A(1)

scala> a1 == a2
res61: Boolean = false

scala> a1.hashCode
res62: Int = 22213838

scala> a1.i = 2

scala> a1 == a2
res63: Boolean = false

scala> a1.hashCode
res64: Int = 22213838

A more elegant solution is to define a Var trait and use this instead of vars in case classes, but this is not always a practical solution and it has some performance drawbacks.

Time Variant Equalily for Products

So, what if I want to compare my case classes for element equality at a given moment in time? Turns out it's quite easy to implement such a function in Scala as all case classes implements the Product trait. Here's an example implementation (not thoroughly tested):

def equalElements(v1 : Any, v2 : Any) : Boolean = (v1, v2) match {
case (p1 : Product, p2 : Product) =>
p1.getClass == p2.getClass &&
(0 until p1.productArity).forall(i => equalElements(p1.productElement(i), p2.productElement(i)))
case _ => v1 == v2
}

This function has some limitations and won't work correctly with nested product and non-product objects. In this case a more type specialized solution is required.

Example usage:

scala> val a1 = A(1)
a1: A = A(1)

scala> val a2 = A(1)
a2: A = A(1)

scala> equalElements(a1, a2)
res68: Boolean = true

scala> a1.i = 2

scala> equalElements(a1, a2)
res69: Boolean = false

3 comments:

Unknown said...

Good post, however I was a bit confused upon initial reading and I thought you were asserting that time-invariance was a requirement of Object.equals(). I was even preparing to correct that assumption when I realized that was probably not your intent. For the sake of clarity, you might consider changing "There are many different types of equality" to something like "There are many different ways to implement equality". Also, a link to the archived scala-user discussion (e.g. Nabble) would have been nice, as I was unable to find it.

On a related side-note, I would like to point out a project I've been working on which aims to provide correct, low-maintenance implementations of hashCode() and equals() (as well as a resonable implementation of toString()) by using Java annotations to mark fields or methods for use in any valid combination of those operations. It prevents you from doing things like including a value in hashCode() which is not a part of equals(), for example. The project is called Pojomatic and can be found on SourceForge if you are interested. It was written with Java conventions in mind (e.g. getters and setters) but I can't think of a reason why it wouldn't work well in Scala.

Morgan Creighton said...

Thanks for the interesting post.

So, in Java, you get the base equals and hashCode behavior, unless you override them. But in Scala case classes, you get a field by field equals and hashCode behavior, unless you override them (or extend your Mutable trait).

Somehow this difference reminds me of how Java and C++ treat virtual functions differently. In C++, functions have to be marked virtual by the special keyword. But in Java, functions are virtual unless they are marked by a special keyword (final).

It's interesting to me how changing "defaults" from one language to another can reflect profound philosophical differences.

Jesper Nordenberg said...

Sorry for the late replies, I need to check for comments more often :)

@chris: I'm not assuming all implementations of equals() are time invariant, in fact I gave an example in the post on where it wasn't. My point is that implementations should be time invariant though as many usages of equals() (even in the JRE) assumes it is. Your project looks interesting and similar to what is generated by the Scala compiler for case classes, except there you can't mark which fields should be included/excluded in the generated code. Btw, can't seem to find the scala-user discussion either...

@morgan: Yes, defaults matters. :)