Saturday, October 4, 2008

Functional Purity in Scala

An important concept in functional programming is pure functions. The properties of pure functions mean that they are easy to verify and can safely be executed concurrently (an important advantage on today's multi-core machines). You can write pure functions in pretty much all programming languages, but AFAIK there are only two popular programming languages that verify purity at compile time: Haskell and D. In Haskell you can't use side effects or mutable variables unless the function's type signature explicitly says so, using a monad such as IO. D takes a different approach: by default you are allowed to use side effects and mutable variables in your functions, and you must explicitly mark a function as pure to have it checked (see this presentation for more information). The difference between the implementations is most likely because Haskell was designed from the start to be pure, while pure functions were added to D in version 2.0 of the language.
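To make the distinction concrete, here is a small illustrative sketch (the names are made up for this example). `add` depends only on its arguments and touches nothing else, while `addAndCount` returns the same result but also updates a variable outside the function, so calling it twice with the same arguments leaves the program in a different state.

```scala
object PurityDemo {
  // Pure: the result depends only on the arguments and nothing else is touched.
  def add(x: Int, y: Int): Int = x + y

  // Impure: same result as add, but it also writes to state outside the function.
  var callCount = 0
  def addAndCount(x: Int, y: Int): Int = {
    callCount += 1 // externally visible side effect
    x + y
  }
}
```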

Pure Functions in Scala

I've been thinking about the best approach to implement pure function verification in the Scala compiler. An approach similar to the one in D would fit a lot better than the one used in Haskell (which would break all existing code and cause some problems due to strict evaluation). A solution using annotations would be quite simple to implement:

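import scala.annotation.StaticAnnotation

// StaticAnnotation keeps the annotation visible to the compiler, also across compilation units
class Pure extends StaticAnnotation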

In practice you want to define this as a runtime annotation in Java, but let's stick to Scala here. Now you can mark a function/method as pure:

@Pure def pureFunc(x : Int, y : Int) = x + y

There are some rules that the compiler must verify for a pure method/function:
  • Only calls to pure functions are allowed. This requires that a large number of functions in the Scala library be marked with the Pure annotation, otherwise it will be impossible to write new pure functions.

  • Non-local vars can't be read or written. Local vars can be used in a pure function (as an accumulator for example), but you can't access static variables or variables reachable from the arguments.

  • A pure method can only be overridden by a pure method, and an interface method defined as pure can only be implemented as a pure method.
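These three rules can be modelled with a toy checker. This is only a sketch under simplifying assumptions, not how scalac would actually implement it: a method is reduced to a hypothetical `MethodInfo` record listing the methods it calls and the non-local vars it touches, and methods missing from the table are conservatively treated as impure.

```scala
object PureRuleCheck {
  // Hypothetical, heavily simplified view of a method as a purity checker might see it.
  final case class MethodInfo(
    name: String,
    isPure: Boolean,           // marked @Pure?
    callees: List[String],     // names of the methods called in the body
    nonLocalVars: List[String] // vars read or written that are not locals of this method
  )

  // Returns the rule violations for one method; methods not marked @Pure are not checked.
  def violations(m: MethodInfo, methods: Map[String, MethodInfo]): List[String] =
    if (!m.isPure) Nil
    else {
      // Rule 1: only calls to pure methods (unknown methods count as impure).
      val badCalls = m.callees
        .filterNot(c => methods.get(c).exists(_.isPure))
        .map(c => s"${m.name}: call to non-pure method $c")
      // Rule 2: no reads or writes of non-local vars.
      val badVars = m.nonLocalVars.map(v => s"${m.name}: access to non-local var $v")
      badCalls ++ badVars
    }

  // Rule 3: a pure method may only be overridden/implemented by a pure method.
  def overrideAllowed(parent: MethodInfo, child: MethodInfo): Boolean =
    !parent.isPure || child.isPure
}
```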

So far things are simple, but the restrictions imposed are quite severe. For example, we can't create an array inside a pure function and use it locally, as this would result in calls to the non-pure array apply/update methods. Clearly this has to be allowed, which leads to the concept of "semi-pure" functions.

Semi-Pure Functions

Let's define the concept of a semi-pure function/method:

import scala.annotation.StaticAnnotation

class SemiPure extends StaticAnnotation
class Pure extends SemiPure

Since Pure extends SemiPure, a Pure function can be used anywhere a SemiPure function is required. For example, a method marked as SemiPure can be overridden by a Pure method (but not the other way around). Here's an example of a semi-pure method:

case class Var(var value : Int) {
  @SemiPure def inc = {
    value += 1 // Ok, we can modify fields in this object
    value
  }
}

The following compilation rules apply to a semi-pure function:
  • It can call pure and semi-pure functions.

  • It can use local variables.

  • It can read and write variables reachable from its arguments. This includes the implicit this argument passed for class methods, so a class method can read/write fields of a class instance.

  • It can only be overridden by/implemented as a semi-pure or pure method.
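Here's a sketch of how the pure/semi-pure ordering and the rules above could fit together. `Level`, `Origin` and the helper functions are hypothetical names. The levels are ranked so that "at least as pure as" becomes a simple comparison, mirroring `class Pure extends SemiPure`, and the model deliberately ignores the loosened rule for pure callers of semi-pure functions.

```scala
object SemiPureRules {
  // Higher rank = at least as pure, mirroring class Pure extends SemiPure.
  sealed abstract class Level(val rank: Int)
  case object Impure extends Level(0)
  case object SemiPure extends Level(1)
  case object Pure extends Level(2)

  // Where a var accessed in the body comes from (hypothetical classification).
  sealed trait Origin
  case object Local extends Origin        // a local var of the method
  case object FromArgument extends Origin // reachable from an argument, including `this`
  case object Global extends Origin       // anything else, e.g. a field of a singleton object

  // The callee must be at least as pure as the caller.
  def callAllowed(caller: Level, callee: Level): Boolean = callee.rank >= caller.rank

  def varAccessAllowed(level: Level, origin: Origin): Boolean = (level, origin) match {
    case (Impure, _)              => true
    case (_, Local)               => true
    case (SemiPure, FromArgument) => true
    case _                        => false
  }

  // Overrides and implementations must be at least as pure as the method they replace.
  def overrideAllowed(parent: Level, child: Level): Boolean = child.rank >= parent.rank
}
```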

So what use do we have for semi-pure functions? Well, they allow us to loosen the restrictions placed on pure functions somewhat: a pure function may call a semi-pure function if, and only if, all the argument values passed (including the object the method is called on) are created locally or are "invariant". "Invariant" is a term I borrowed from D; it's basically a deeply immutable value, i.e. an immutable value that only contains invariant values :). For example, List[Int] is invariant, but List[Array[Int]] is not, even though the list itself is immutable.

So with this loosened restriction you can, for example, define a function that creates an array and updates it in a loop, and it can still be a pure function. This is a quite powerful concept that blends the imperative and purely functional programming styles.
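Here's what such a function could look like. It assumes, as the post implies, that `Array`'s `update` and `apply` would be classified as semi-pure; the annotations are plain `StaticAnnotation`s, so today's compiler accepts the code but checks nothing.

```scala
object LocalArrayExample {
  import scala.annotation.StaticAnnotation

  // Marker annotations as in the post; nothing checks them without compiler support.
  class SemiPure extends StaticAnnotation
  class Pure extends SemiPure

  // Observably pure: the array is created locally and never escapes, so the
  // (semi-pure) update calls on it are invisible to callers.
  @Pure def squares(n: Int): List[Int] = {
    val a = new Array[Int](n)
    var i = 0 // local var, allowed in a pure function
    while (i < n) {
      a(i) = i * i
      i += 1
    }
    a.toList
  }
}
```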

Here's an example of a legal pure function that calls a semi-pure function (let's assume the method List[T].head is defined as pure):

case class Var(var value : Int) {
  @SemiPure def inc = {
    value += 1
    value
  }
}

@Pure def pureFunc(l : List[Var]) = {
  val x = l.head   // Ok, pure method called
  val v2 = Var(10) // Object created locally
  v2.inc           // Ok, call of semi-pure function with locally created object
}

However, this is not allowed:

@Pure def pureFunc2(l : List[Var]) = {
  val x = l.head // Ok, pure method called
  x.inc          // Error: semi-pure method called on variant external object
}

as it would result in an externally visible modification.
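The difference between `pureFunc` and `pureFunc2` can be captured with a small data-flow sketch: track whether each local value was created inside the function or came from outside, and reject semi-pure calls on anything external. The statement types below form a hypothetical mini-language, not real compiler trees, and invariant values are left out for brevity.

```scala
object PureCallCheck {
  sealed trait Kind
  case object CreatedLocally extends Kind
  case object External extends Kind

  // Hypothetical mini-language covering the bodies of pureFunc and pureFunc2.
  sealed trait Stmt
  final case class ValNew(name: String) extends Stmt           // val v2 = Var(10)
  final case class ValFromArg(name: String) extends Stmt       // val x = l.head
  final case class CallSemiPure(receiver: String) extends Stmt // v2.inc / x.inc

  // One error per semi-pure call whose receiver was not created locally.
  // (Invariant values would be accepted as well; they are not modelled here.)
  def check(body: List[Stmt]): List[String] = {
    val (_, errors) = body.foldLeft((Map.empty[String, Kind], List.empty[String])) {
      case ((env, errs), ValNew(n))     => (env + (n -> CreatedLocally), errs)
      case ((env, errs), ValFromArg(n)) => (env + (n -> External), errs)
      case ((env, errs), CallSemiPure(r)) =>
        if (env.get(r).contains(CreatedLocally)) (env, errs)
        else (env, errs :+ s"semi-pure method called on external object $r")
    }
    errors
  }
}
```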

How to verify that a type is invariant is a problem that needs to be explored further. It should be doable with an addition of an immutable/invariant annotation, but there are some complications with subtyping and type parameters.
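As a starting point, here's a sketch of a recursive invariance check over a toy type representation (all names are hypothetical). It deliberately sidesteps the complications mentioned above: there is no subtyping (a subclass could add a var field), only two hard-coded container types, and no recursive class types.

```scala
object InvarianceCheck {
  // Hypothetical, simplified type representation.
  sealed trait Ty
  case object IntTy extends Ty
  case object StringTy extends Ty
  final case class ListTy(elem: Ty) extends Ty  // immutable container
  final case class ArrayTy(elem: Ty) extends Ty // mutable container
  final case class ClassTy(name: String, fields: List[Field]) extends Ty
  final case class Field(tpe: Ty, isVar: Boolean)

  // Deeply immutable: immutable itself, and everything it contains is invariant too.
  def isInvariant(t: Ty): Boolean = t match {
    case IntTy | StringTy   => true
    case ListTy(elem)       => isInvariant(elem)
    case ArrayTy(_)         => false
    case ClassTy(_, fields) => fields.forall(f => !f.isVar && isInvariant(f.tpe))
  }
}
```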

Function Objects

One more thing needs to be solved: how to declare a pure function parameter for a higher-order function. A quite simple solution is to create traits for pure and semi-pure functions and mix them into the function types. For example, if you want to define a pure map function:

trait TSemiPure
trait TPure extends TSemiPure

@Pure def map[T, U](l : List[T], fn : (T => U) with TPure) : List[U] =
  if (l.isEmpty) Nil else fn(l.head) :: map(l.tail, fn)
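To try this signature with today's compiler, the marker trait has to be mixed in by hand when the function object is created. This is a sketch only: the annotations are plain `StaticAnnotation`s and nothing verifies that `apply` is actually pure.

```scala
object TPureMapExample {
  import scala.annotation.StaticAnnotation

  class SemiPure extends StaticAnnotation
  class Pure extends SemiPure

  trait TSemiPure
  trait TPure extends TSemiPure

  @Pure def map[T, U](l: List[T], fn: (T => U) with TPure): List[U] =
    if (l.isEmpty) Nil else fn(l.head) :: map(l.tail, fn)

  // Without compiler support the marker trait is mixed in manually and is unchecked.
  val double: (Int => Int) with TPure = new Function1[Int, Int] with TPure {
    def apply(x: Int): Int = x * 2
  }
}
```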

The TPure type would simply mean that the function object's apply method is considered pure by the compiler. There would be some restrictions on how the programmer is allowed to mix in the (semi-)pure traits. Another, maybe simpler, option is to create a new set of (Semi)PureFunction0-22 traits that extend the existing Function0-22 traits.
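The alternative with dedicated function traits could look roughly like this. Only the one-argument case is shown, and `SemiPureFunction1`/`PureFunction1` are hypothetical names. A nice property of this design is that a pure function object is still an ordinary `Function1`, so it can be passed to existing library methods unchanged.

```scala
object PureFunctionTraits {
  // Hypothetical marker subtypes of the standard Function1 trait.
  trait SemiPureFunction1[-T, +R] extends (T => R)
  trait PureFunction1[-T, +R] extends SemiPureFunction1[T, R]

  // A higher-order function that only accepts functions marked as pure.
  def map[T, U](l: List[T], fn: PureFunction1[T, U]): List[U] =
    if (l.isEmpty) Nil else fn(l.head) :: map(l.tail, fn)

  val inc: PureFunction1[Int, Int] = new PureFunction1[Int, Int] {
    def apply(x: Int): Int = x + 1
  }

  // Still a plain Function1, so the standard List.map accepts it too.
  val viaStdLib: List[Int] = List(1, 2, 3).map(inc)
}
```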

For lambda expressions the compiler could automatically infer whether the function is pure, semi-pure or impure, and use the correct trait. During eta expansion the correct trait can be chosen based on the annotation of the method/function being expanded.
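One plausible inference rule is that a lambda is only as pure as the least pure operation in its body. The sketch below assumes every operation in the body has already been classified, and the trait names it returns follow the hypothetical (Semi)PureFunction naming mentioned above.

```scala
object LambdaPurityInference {
  sealed abstract class Level(val rank: Int)
  case object Impure extends Level(0)
  case object SemiPure extends Level(1)
  case object Pure extends Level(2)

  // The least pure operation decides; an empty body is pure.
  def inferLevel(bodyOps: List[Level]): Level =
    bodyOps.foldLeft(Pure: Level)((acc, op) => if (op.rank < acc.rank) op else acc)

  // Trait a one-argument lambda would get (hypothetical names).
  def traitFor(level: Level): String = level match {
    case Pure     => "PureFunction1"
    case SemiPure => "SemiPureFunction1"
    case Impure   => "Function1"
  }
}
```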

Final Words

Using the constructs I've presented in this post I think it would be feasible to implement checking of functional purity in the Scala compiler without too much effort. Hopefully this will result in a SIP in the near future, so that Scala hackers can utilize the powerful tool of statically checked pure functions.

I'm sure I've made some error or missed something along the way, so I'm grateful for any comments/corrections you might have.