onChange

Friday, July 4, 2008

Code Style Choices

During a recent job interview, I was asked to code on a whiteboard a merge function for two lists of integers assumed to be sorted.

I said that it might be nice in a functional language using recursion, which seemed to surprise the interviewer. He offered that I could code it in any language I liked, but I decided to stick with Java, lacking the nerve to barge into Scala or Haskell on a whiteboard. And as I was thinking that doing the functional style in Java might require building some list-comprehension things, I decided to abandon the functional idea entirely.

I stumbled around a bit, coding and explaining, and ended up with a very messy whiteboard, but apparently did ok, as they've asked me back for more interviews.

I wasn't happy with the result though. Of course I didn't get to take the whiteboard home with me, but I decided to reproduce it approximately (I extracted a small helper that definitely hadn't been on the board), and it ended up like this:


    public static List<Integer> merge(List<Integer> list1,
                                      List<Integer> list2) {
        if (null == list1)
            return list2;
        if (null == list2)
            return list1;
      
        ArrayList<Integer> result = new ArrayList<Integer>();
      
        Iterator<Integer> iter1 = list1.iterator();
        Iterator<Integer> iter2 = list2.iterator();
      
        Integer next1 = getNext(iter1);
        Integer next2 = getNext(iter2);
      
        while ((null != next1) || (null != next2)) {
            if (null == next1) {
                result.add(next2);
                next2 = getNext(iter2);
            }
            else if (null == next2) {
                result.add(next1);
                next1 = getNext(iter1);
            }
            else if (next1 < next2) {
                result.add(next1);
                next1 = getNext(iter1);              
            }
            else {
                result.add(next2);
                next2 = getNext(iter2);              
            }
        }
        return result;
    }

    private static <T> T getNext(Iterator<T> iter) {
        if (iter.hasNext()) {
            return iter.next();
        }
      
        return null;
    }

This does work, but it's not pretty, basically because it's in a very procedural style, and because there's some repetition I don't see an easy way to eliminate. So I considered what could be done to improve it.

My first impulse was of course to return to the functional idea. So here's the Scala code I should have written on that whiteboard:


  def merge(list1: List[Int],
            list2: List[Int]): List[Int] = list1 match {
    case Nil => list2
    case head1::tail1 => {
      list2 match {
        case Nil => list1
        case head2::tail2 => {
          if (head1<head2)
            head1::merge(tail1, list2)
          else
            head2::merge(list1, tail2)
        }
      }
    }
  }

Much more concise, and reasonably clear modulo knowing Scala. And unless I went into complete panic because of the situation, I probably could have written it on the whiteboard and explained the Scala bits more quickly and with less stumbling than what I actually did. Of course this is using the Scala List class rather than the one from the Java collections library, so to actually use it from Java might involve some conversion.

As with much functional code, the main things needed for this are the head and tail functions (hidden by use of a pattern-matching idiom in theScala code above), which are not supplied by the Java collections library List. But away from the stress of interviews and whiteboards, it occurred to me that these are pretty trivial to make, and I quickly translated into Java:


    public static List<Integer> merge(List<Integer> list1,
                                      List<Integer> list2) {
        if ((null == list1) || (list1.isEmpty()))
            return list2;
        if ((null == list2) || (list2.isEmpty()))
            return list1;
       
        ArrayList<Integer> result = new ArrayList<Integer>();
       
        Integer head1 = head(list1);
        Integer head2 = head(list2);
       
        if (head1 < head2) {
            result.add(head1);
            result.addAll(merge(tail(list1), list2));
        }
        else {
            result.add(head2);
            result.addAll(merge(list1, tail(list2)));
        }
        return result;
    }

    private static Integer head(List<Integer> list) {
        return list.get(0);
    }

    private static List<Integer> tail(List<Integer> list) {
        return list.subList(1,list.size());
    }

Not as pretty as the Scala version because Java lacks the pattern-match stuff, but definitely an improvement over the procedural mess. I could see changing it so that the list was decorated to add the head and tail functions instead of just having them as static utility methods.

I'd probably do something like this in real code, as I think list manipulation makes sense in functional style. But it's not quite normal Java style.

The other version uses iterators to traverse both lists, and has to grab appropriately from both and advance them both correctly, which is a bit painful. But it struck me that a natural OO approach might be to make a composite iterator in which to hide all those decisions. And in order to select which one without advancing both iterators, maybe a decorated iterator having lookahead capability would be nice. This lead to two new Iterator classes (in which I didn't bother implementing the remove method):

LookaheadIterator:


import java.util.Iterator;

public class LookaheadIterator<T> implements Iterator<T> {

    private Iterator<T> delegate;
    private T lookahead;

    public LookaheadIterator(Iterator<T> iter) {
        this.delegate = iter;
        getNext();
    }

    @Override
    public boolean hasNext() {
        return (null != lookahead);
    }

    @Override
    public T next() {
        T current = lookahead;
        getNext();
        return current;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }

    public T lookahead() {
        return lookahead;
    }

    private void getNext() {
        if (delegate.hasNext()) {
            lookahead = delegate.next();
        }
        else {
            lookahead = null;
        }
    }

}

MergedIterator:


import java.util.Comparator;
import java.util.Iterator;

public class MergedIterator<T> implements Iterator<T> {

    private Comparator<T> comparator;
    private LookaheadIterator<T> iterator1;
    private LookaheadIterator<T> iterator2;

    public MergedIterator(Comparator<T> comparator,
            Iterator<T> iter1, Iterator<T> iter2) {
        this.comparator = comparator;
        this.iterator1 = new LookaheadIterator<T>(iter1);
        this.iterator2 = new LookaheadIterator<T>(iter2);
    }

    @Override
    public boolean hasNext() {
        return (iterator1.hasNext() || iterator2.hasNext());
    }

    @Override
    public T next() {
        T next1 = iterator1.lookahead();
        T next2 = iterator2.lookahead();

        if (null == next1) {
            getNext(iterator2);
            return next2;
        }

        if ((null == next2) || (comparator.compare(next1, next2) < 0)) {
            getNext(iterator1);
            return next1;
        }
        else {
            getNext(iterator2);
            return next2;
        }
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }

    private void getNext(Iterator<T> iter) {
        if (iter.hasNext()) {
            iter.next();
        }
    }
}

The merge method itself (once you've fence-posted the null argument cases) is then of course a matter of creating the MergedIterator and iterating through it adding to a new list.

These extra classes make it a lot more code than the functional version for this toy example, and in fact maybe more code than the dumb procedural implementation, but it might be argued that this is a more object-oriented approach, and it might make sense if there were other needs for these iterators.

I still prefer the functional style for this example, and wish I'd done it in the interview.

Thursday, January 3, 2008

Testing with an in-memory Database

Persistence tests are slow, and hard to make repeatable. The repeatabity problem is primarily in dealing with setup to a known state and/or teardown to return to a known state.

Even if you do write setup and teardown to put data into the DB and remove it, things like automatically generated ids are somewhat painful to reset so that the next run generates the same ones again.

But HSQLDB allows creation of a database that lives only in memory, creating a fresh empty database on every run.

Last weekend, I decided to start playing with HSQLDB in unit testing, and it looks promising. I'll probably end up using it at work as well, though some of our tests may continue to hit a real DB matching what we use in production.

To create an in-memory HSQLDB for plain old JDBC access is pretty trivial. If I were doing this, I'd probably make a test helper class of some sort that had something like


    public Connection getConnection() throws Exception {
        Class.forName("org.hsqldb.jdbcDriver");
        return DriverManager.getConnection(
                              "jdbc:hsqldb:mem:testdb",
                              "sa",
                              "");
    }

to setup a database, and include methods to execute queries and updates. Data table definition could then be done with an update query, likely called from a JUnit test's setUp() method, and of course other queries would be used to retrieve data for assertions.

In the tearDown(), you might want a call to another helper method which shuts down the database. In the bit if experimentation I did, my helper class had the connection as a field conn set by the method above (which I actually made private) and another helper method


    public void shutdown() throws SQLException {

        Statement st = conn.createStatement();
        st.execute("SHUTDOWN");
        conn.close();
    }

called from the JUnit tearDown(). You don't need to delete any data though - it just disappears, as it was all in memory only. You might even be able to get away with skipping this altogether and just let the im-memory DB go out of scope and be garbage-collected.

But most of our code doen't use JDBC directly.

We use Hibernate for most persistence, and as it was new to me when I started at Cyrus last July, I've been playing with it off-hours to improve my familiarity with it. And this was what actually prompted me to start playing with HSQLDB.

As I was playing with both the XML mappings and Annotation mappings, I ended up with
a hierarchy of test helpers. Using only one or the other this might be simplified.

The root helper class was abstract:


public abstract class HibernateTestHelper {
    protected Configuration config;
    protected SessionFactory factory;

    protected void init(Class... classes) {
        setupHSQLDB();
        addClasses(classes);
        factory = config.buildSessionFactory();
    }

    protected abstract void addClasses(Class... classes);

    private void setupHSQLDB() {
        config.setProperty("hibernate.dialect", 
                           "org.hibernate.dialect.HSQLDialect").
               setProperty("hibernate.connection.driver_class", 
                           "org.hsqldb.jdbcDriver").
               setProperty("hibernate.connection.url", 
                           "jdbc:hsqldb:mem:testdb").
               setProperty("hibernate.connection.username", 
                           "sa").
               setProperty("hibernate.connection.password", 
                           "").
               setProperty("hibernate.connection.pool_size", 
                           "1").
               setProperty("hibernate.connection.autocommit", 
                           "true").
               setProperty("hibernate.cache.provider_class",
                           "org.hibernate.cache.HashtableCacheProvider").
               setProperty("hibernate.hbm2ddl.auto", 
                           "create-drop").
               setProperty("hibernate.show_sql", 
                           "true");
    }

/*
   helper methods for sessions, transactions, queries, etc. omitted...
*/

}

Subclasses set things up a little differently for Annotation or XML mappings:


public class HibernateAnnotationTestHelper extends HibernateTestHelper {

    public HibernateAnnotationTestHelper(Class... classes) {
        config = new AnnotationConfiguration();
        init(classes);
    }
    protected void addClasses(Class... classes) {
        for (Class clazz : classes) {
            ((AnnotationConfiguration)config).addAnnotatedClass(clazz);
        }
    }
}

public class HibernateXmlTestHelper extends HibernateTestHelper {

    public HibernateXmlTestHelper(Class... classes) {
        this.config = new Configuration();
        init(classes);
    }

    protected void addClasses(Class... classes) {
        for (Class clazz : classes) {
            config.addClass(clazz);
        }
    }
}

This allows for JUnit setUp() methods creating the appropriate helper for the mapping style in use mapping only the classes of interest to the test, and populating appropriate data.

Assorted helper methods are omitted from the abstract base helper class published here. They're pretty simple and obvious, and I suspect in actual use they might get a bit more elaborate, particularly in support for removing the test data setup logic from the JUnit setUp() methods. I might publish more detail on a revisit.

Something like this is likely to end up in our test code base at work. It seems faster than using a real database, and it's certainly nice to not have to tear down data.

Tuesday, December 25, 2007

Functional Adventures in Java

My work is primarily in Java, but I've played a bit with Haskell, and am jealous of some of the capabilities in functional languages. So I've been playing a bit with bringing a functional style to programming in Java. And this is play rather than work, so I can ignore the question of actually needing this stuff. Certainly some bits of similar stuff are already in place at work, but there it'll be driven by need rather than curiosity.

For a start, Haskell can handle infinite lists, and some interesting infinite lists occurring in math can be defined quite succinctly, as in:


sieve (x:xs) = x:sieve [y|y<-xs, (y `mod` x) /= 0]
primes = sieve [2..]

I'm not ready to create enough syntactic sugar to make Java code for an infinite list of prime numbers quite that concise, but support for infinite lists would be nice.

Of course, you can't ever get the whole list... Evaluations of the infinite stuff has to be "lazy".

After a bit of struggling to fit these ideas to Java's Collection libraries, I backed off and decided to work with a smaller interface, using a distinct name to limit confusion. The main interface is now:


public interface Sequence<T> {
  T head();
  Sequence<T> tail();
}

At least for the moment, I'm making static factory methods to create instances of this interface using anonymous inner classes rather than defining a bunch of subclasses. The first cut at a simple cons list constructor is pretty trivial:


   public static <T> Sequence<T> cons(
                                 final T head,
                                 final Sequence<T> tail) {
       return new Sequence<T>() {
           public T head() {
               return head;
           }

           public Sequence<T> tail() {
               return tail;
           }
       };
   }

as is the creation of an empty list:


   public static <T> Sequence<T> empty() {
       return new Sequence<T>() {
           public T head() {
               return null;
           }

           public Sequence<T> tail() {
               return this;
           }
       };
   }

And with just that much we can construct a simple list:


   Sequence<Integer> emptySequence = empty();
   Sequence<Integer> three = cons(1, cons(2, cons(3, emptySequence)));

The above is a snippet from a unit test...

It's not quite as nice as the approximately equivalent Haskell:


    three = 1:2:3:[]

Of course this process can only make finite sequences. A Haskell example for infinite sequences is:


    iterate (\x->x+1) 0

which yields a list of the natural numbers. In this, the second argument is the initial value and the first argument is a function for generating additional values.

While I'm not planning to implement the lambda expression stuff in Java right now, this example suggests we'll want an iterate function that takes an initial value and a recurrence function to generate the rest. Since I expect to need functions with different domain and range for the map and select I'm doing soon, I'll make an interface that allows that:


public interface Function<Domain, Range> {
   Range evaluate(Domain arg);
}

And the iterate function is:


   public static <T> Sequence<T> iterate(final T head,
                                         final Function<T,T> function) {
       return new Sequence<T>){
           public T head() {
               return head;
           }

           public Sequence<T> tail() {
               return iterate(function.evaluate(head), function);
           }
       };
   }

Again, the usage:


   Function increment = new Function(){
       public Integer evaluate(Integer arg) {
           return arg+1;
       }
   };
   Sequence<Integer> naturalNumbers = iterate(0, increment);

isn't quite as elegant as the Haskell version, but it gives an infinite list and doesn't blow the stack by trying to actually evaluate the whole list.

Implementing select and map as mentioned above, and then zip was mostly pretty similar and led to me noticing that there was a bit of duplication.

Each involved creating an anonymous inner instance of Sequence with simple and pretty similar implementations of the head and tail methods. It's essential that the defined tail methods not be evaluated prematurely, as that would lead to infinite recursion.

Perhaps functional style can come to the rescue.

The head and tail methods are functions without arguments, so if we define:


public interface NoArgFunction<Range> {
    Range evaluate();
}

and create a factory method that accepts function versions of head and tail:


    private static <T> Sequence<T>  construct(final NoArgFunction<T> head, 
                                         final NoArgFunction<Sequence<T>> tail) {
        return new Sequence<T>() {
            public T head() {
                return head.evaluate();
            }

            public Sequence<T> tail() {
                return tail.evaluate();
            }
        };
    }

then we can delegate the others to it.

As the head methods are frequently extremely simple, a named inner class helps shorten repetition a bit:


    private static class ConstantFunction<T> implements NoArgFunction<T> {
        private T value;

        public ConstantFunction(T value) {
            this.value = value;
        }

        public T evaluate() {
            return value;
        }
    }

And then the cons, iterate and map methods can become:


    public static <T> Sequence<T> cons(final T head, 
                                      final Sequence<T> tail) {
        return construct(new ConstantFunction<T>(head), 
                         new ConstantFunction<Sequence<T>>(tail));
    }

    public static <T> Sequence<T> iterate(final T head, 
                                     final Function<T,T> function) {
        return construct(new ConstantFunction<T>(head), 
                         new NoArgFunction<Sequence<T>>(){
            public Sequence<T> evaluate() {
                return iterate(function.evaluate(head), function);
            }
        });
    }

    public static <T1,T2> Sequence<T2> map(
                                final Function<T1, T2> function, 
                                final Sequence<T1> sequence) {
        return construct(new NoArgFunction<T2>(){
            public T2 evaluate() {
                return function.evaluate(sequence.head());
            }
        }, new NoArgFunction<Sequence<T2>>(){
            public Sequence<T2> evaluate() {
                return map(function, sequence.tail());
            }
        });
    }

I'm not actually too sure that last bit's an improvement. It's certainly a functional tactic, and does reduce the code, but I suspect there are tactics that can give us more effective lazy evaluation and maybe clearer code. The implementation as it stands doesn't retain computed values of entries in the infinite lists, so it's clearly not ready for prime-time.

But that's for another post. I'll set this aside for the moment, but hold onto the code and probably come back with an improvement later.

Or maybe I'll just take up Scala.

Sunday, November 18, 2007

Refactoring Challenge

In a recent post to the Test Driven Development yahoo group, Jeff Grigg posted a bit of a challenge, in the form of an uploaded file with some code that looked like:


public class LogicMatrix {
   public static void callTheRightMethod(boolean a,
                                         boolean b,
                                         boolean c,
                                         boolean d,
                                         Interface callback) {
       // TODO: Refactor this to be simple and obvious:
       if (!a && !b && !c && !d)
           callback.x();
       else if (a && !b && !c && !d)
           callback.z();
       else if (!a && b && !c && !d)
           callback.w();
       else if (a && b && !c && !d)
           callback.w();
       else if (!a && !b && c && !d)
           callback.x();
       else if (a && !b && c && !d)
           callback.x();
       else if (!a && b && c && !d)
           callback.z();
       else if (a && b && c && !d)
           callback.w();
       else if (!a && !b && !c && d)
           callback.y();
       else if (a && !b && !c && d)
           callback.x();
       else if (!a && b && !c && d)
           callback.x();
       else if (a && b && !c && d)
           callback.z();
       else if (!a && !b && c && d)
           callback.y();
       else if (a && !b && c && d)
           callback.y();
       else if (!a && b && c && d)
           callback.x();
       else if (a && b && c && d)
           callback.x();
   }
}

This is based on a "decision matrix" that Jeff included for a bit of interesting code generation and test generation in his JUnit test for the class.

This matrix was really a string:


   String matrix = "" //
               + "XZWW" //
               + "XXZW" //
               + "YXXZ" //
               + "YYXX";

His post asked us to make this code more resemble the decision matrix "without lots of reflection or other icky stuff". My tactic for doing this is to use the "decision matrix" itself as data to drive the code.

I found this example interesting because the problem is mainly in noticing a little math hiding in the repeated patterns. It's not too hard to see, but a refactor can certainly make it more visible in the code. It being Sunday, I have some free time, so I downloaded it and fired up Eclipse.

Jeff's tests all passed of course ...

First off, notice that if we interpret false as 0 and true as 1, the conditions of the above long method can be viewed as the binary numbers 0 through 15 (conveniently in the right order), and the method calls are running through the corresponding letters in the matrix. Not of course surprising, as it was code-generated from the matrix.

So let's make a little function that turns those booleans into indexes:


   private static int getIndex(boolean a,
                               boolean b,
                               boolean c,
                               boolean d) {
       int result = 0;
       result |= (a ? 1:0);
       result |= (b ? 2:0);
       result |= (c ? 4:0);
       result |= (d ? 8:0);
       return result;
   }

And change that monster method to:


   public static void callTheRightMethod(boolean a,
                                         boolean b,
                                         boolean c,
                                         boolean d,
                                         Interface callback) {
       int index = getIndex(a,b,c,d);
       if (index==0)
           callback.x();
       else if (index==1)
           callback.z();
       else if (index==2)
           callback.w();
       else if (index==3)
           callback.w();
       else if (index==4)
           callback.x();
       else if (index==5)
           callback.x();
       else if (index==6)
           callback.z();
       else if (index==7)
           callback.w();
       else if (index==8)
           callback.y();
       else if (index==9)
           callback.x();
       else if (index==10)
           callback.x();
       else if (index==11)
           callback.z();
       else if (index==12)
           callback.y();
       else if (index==13)
           callback.y();
       else if (index==14)
           callback.x();
       else if (index==15)
           callback.x();
   }

The tests still pass.

That method's still long, but at least it's now obvious there's a sequence of indices going on.

So we put that matrix into the code as a constant, use our indices, and the long method becomes:


   public static void callTheRightMethod(boolean a,
                                         boolean b,
                                         boolean c,
                                         boolean d,
                                         Interface callback) {
       int index = getIndex(a,b,c,d);
       char entry = matrix.charAt(index);
       if (entry == 'W')
           callback.w();
       else if (entry == 'X')
           callback.x();
       else if (entry == 'Y')
           callback.y();
       else if (entry == 'Z')
           callback.z();
   }

The code has suddenly shrunk. I could of course shrink it a bit more with reflection, but that's "icky", so I'll just leave it at that.

But I think there's a bit of duplication hiding in the getIndex function...

Lets try putting those parameters into an array and overloading (even though overloads are a little icky) :


   private static int getIndex(boolean a,
                               boolean b,
                               boolean c,
                               boolean d) {
       return getIndex(new boolean[] {a,b,c,d});
   }

   private static int getIndex(boolean[] args) {
       int result = 0;
       result |= (args[0] ? 1:0);
       result |= (args[1] ? 2:0);
       result |= (args[2] ? 4:0);
       result |= (args[3] ? 8:0);
       return result;
   }

And now the duplication is obvious - at least I think it's obvious ...


 private static int getIndex(boolean[] args) {
  int result = 0;
  for (int i=0; i<4; i++) 
   result |= bitForBooleanAtIndex(args, i);
  return result;
 }

 private static int bitForBooleanAtIndex(boolean[] args, int i) {
  return (args[i] ? 1<<i:0);
 }

That's about it. It's definitely clearer. Whether it meets the purpose of Jeff's challenge I'll leave to Jeff. I inlined some stuff to but I'll skip reposting everything.

There's some duplication between test and code now... At least they both have the "decision matrix".

For this to really be useful, it should probably be a class with instance methods and the matrix as a constructor parameter (which could eliminate that bit of duplication with the test code). It should then maybe allow for a differently sized decision matrix. I think it would need reflection to be generally useful, with the callback class and/or method list maybe also injected by constructor. But I'm not going there right now.

Anyway, now this very young blog has some actual code in it.

onStart

Maybe this post should be titled "Naming a Blog".

I started this thing yesterday and wasn't happy just naming the blog for myself, but couldn't really think of a better name.

So I went into a spin of up-front thinking.

What's this for anyway? I guess I'll be writing about Agile Software Development, my primary interest right now. I specifically have interest in the idea of refactoring across the borders of the Object-Oriented, Relational and Functional paradigms, and conceived the name "Paradigm Shift" to reflect that. But no, too specific - I might use that for a post title later... I might even ramble about the Structure of Scientific Revolutions and connect it to the rest of the blogging somehow.

I might also talk about math a bit, especially if I start in on functional programming... But the only math term I came up with that seemed somehow appropriate was "Convolution". That has some unfortunate overtones of complexity. Runs counter to Agile values. And again, the metaphor was too specific anyway, and really sort of the same as the previous try. Another reject.

The final rejected name "Refactoring Abstraction" might end up being a title for a post rambling on about what abstraction means in software and in math, and how I see my two careers as mathematician and programmer being related. But it sounds way too pretentious to apply to the blog.

Finally, it occurred to me that all of these thoughts are about changes. Kent Beck's dictum "Embrace Change" for me captures the essence of Agile. And that led me to the chosen blog name - "onChange". I couldn't put it in the url - someone already has it. So the url will keep my name.

And of course I had to continue the wordplay to the title of this "Hello World" post!

So there it is. I've started a blog and found a name for it. I promise not to title every post "onXxx".