Tuesday, December 25, 2007

Functional Adventures in Java

My work is primarily in Java, but I've played a bit with Haskell, and am jealous of some of the capabilities in functional languages. So I've been playing a bit with bringing a functional style to programming in Java. And this is play rather than work, so I can ignore the question of actually needing this stuff. Certainly some bits of similar stuff are already in place at work, but there it'll be driven by need rather than curiosity.

For a start, Haskell can handle infinite lists, and some interesting infinite lists occurring in math can be defined quite succinctly, as in:

sieve (x:xs) = x:sieve [y|y<-xs, (y `mod` x) /= 0]
primes = sieve [2..]

I'm not ready to create enough syntactic sugar to make Java code for an infinite list of prime numbers quite that concise, but support for infinite lists would be nice.

Of course, you can't ever get the whole list... Evaluations of the infinite stuff has to be "lazy".

After a bit of struggling to fit these ideas to Java's Collection libraries, I backed off and decided to work with a smaller interface, using a distinct name to limit confusion. The main interface is now:

public interface Sequence<T> {
T head();
Sequence<T> tail();
}

At least for the moment, I'm making static factory methods to create instances of this interface using anonymous inner classes rather than defining a bunch of subclasses. The first cut at a simple cons list constructor is pretty trivial:

public static <T> Sequence<T> cons(
final T head,
final Sequence<T> tail) {
return new Sequence<T>() {
public T head() {
return head;
}

public Sequence<T> tail() {
return tail;
}
};
}

as is the creation of an empty list:

public static <T> Sequence<T> empty() {
return new Sequence<T>() {
public T head() {
return null;
}

public Sequence<T> tail() {
return this;
}
};
}

And with just that much we can construct a simple list:

Sequence<Integer> emptySequence = empty();
Sequence<Integer> three = cons(1, cons(2, cons(3, emptySequence)));

The above is a snippet from a unit test...

It's not quite as nice as the approximately equivalent Haskell:

three = 1:2:3:[]

Of course this process can only make finite sequences. A Haskell example for infinite sequences is:

iterate (\x->x+1) 0

which yields a list of the natural numbers. In this, the second argument is the initial value and the first argument is a function for generating additional values.

While I'm not planning to implement the lambda expression stuff in Java right now, this example suggests we'll want an iterate function that takes an initial value and a recurrence function to generate the rest. Since I expect to need functions with different domain and range for the map and select I'm doing soon, I'll make an interface that allows that:

public interface Function<Domain, Range> {
Range evaluate(Domain arg);
}

And the iterate function is:

public static <T> Sequence<T> iterate(final T head,
final Function<T,T> function) {
return new Sequence<T>){
public T head() {
return head;
}

public Sequence<T> tail() {
return iterate(function.evaluate(head), function);
}
};
}

Again, the usage:

Function increment = new Function(){
public Integer evaluate(Integer arg) {
return arg+1;
}
};
Sequence<Integer> naturalNumbers = iterate(0, increment);

isn't quite as elegant as the Haskell version, but it gives an infinite list and doesn't blow the stack by trying to actually evaluate the whole list.

Implementing select and map as mentioned above, and then zip was mostly pretty similar and led to me noticing that there was a bit of duplication.

Each involved creating an anonymous inner instance of Sequence with simple and pretty similar implementations of the head and tail methods. It's essential that the defined tail methods not be evaluated prematurely, as that would lead to infinite recursion.

Perhaps functional style can come to the rescue.

The head and tail methods are functions without arguments, so if we define:

public interface NoArgFunction<Range> {
Range evaluate();
}

and create a factory method that accepts function versions of head and tail:

private static <T> Sequence<T> construct(final NoArgFunction<T> head,
final NoArgFunction<Sequence<T>> tail) {
return new Sequence<T>() {
public T head() {
return head.evaluate();
}

public Sequence<T> tail() {
return tail.evaluate();
}
};
}

then we can delegate the others to it.

As the head methods are frequently extremely simple, a named inner class helps shorten repetition a bit:

private static class ConstantFunction<T> implements NoArgFunction<T> {
private T value;

public ConstantFunction(T value) {
this.value = value;
}

public T evaluate() {
return value;
}
}

And then the cons, iterate and map methods can become:

public static <T> Sequence<T> cons(final T head,
final Sequence<T> tail) {
return construct(new ConstantFunction<T>(head),
new ConstantFunction<Sequence<T>>(tail));
}

public static <T> Sequence<T> iterate(final T head,
final Function<T,T> function) {
return construct(new ConstantFunction<T>(head),
new NoArgFunction<Sequence<T>>(){
public Sequence<T> evaluate() {
return iterate(function.evaluate(head), function);
}
});
}

public static <T1,T2> Sequence<T2> map(
final Function<T1, T2> function,
final Sequence<T1> sequence) {
return construct(new NoArgFunction<T2>(){
public T2 evaluate() {
return function.evaluate(sequence.head());
}
}, new NoArgFunction<Sequence<T2>>(){
public Sequence<T2> evaluate() {
return map(function, sequence.tail());
}
});
}


I'm not actually too sure that last bit's an improvement. It's certainly a functional tactic, and does reduce the code, but I suspect there are tactics that can give us more effective lazy evaluation and maybe clearer code. The implementation as it stands doesn't retain computed values of entries in the infinite lists, so it's clearly not ready for prime-time.

But that's for another post. I'll set this aside for the moment, but hold onto the code and probably come back with an improvement later.

Or maybe I'll just take up Scala.

No comments: