Sunday, January 07, 2007

Abstract Refactoring

This week I had a subroutine, let's call it Bedrock, whose functionality I needed to expand to handle a new situation. I won't bore you with the details, but the short story is that it created an object, Fred, and handled the exceptions Fred threw, and now I needed the same thing done with a very similar object, Wilma, which has almost, but not quite, the same behavior.

Of course, there's always this dilemma: is it clearer to just throw a little logic in the function to handle the two cases, something like:


sub bedrock(bool need_caveman) {
    Caveperson c;
    if (need_caveman) { c = new Fred(); }
    else { c = new Wilma(); }

    try {
        c.bang_on_rocks();
    } catch (Exception e) {
        if (need_caveman) { printf("Fred can't find a rock\n"); }
        else { c.call_betty(); }
    }
}

...or is it better to split the whole thing into separate functions:

sub bedrock(bool need_caveman) {
    if (need_caveman) { bedrock_fred(); }
    else { bedrock_wilma(); }
}

sub bedrock_fred() {
    Caveperson c = new Fred();
    try {
        c.bang_on_rocks();
    } catch (Exception e) {
        printf("Fred can't find a rock\n");
    }
}

sub bedrock_wilma() {
    Caveperson c = new Wilma();
    try {
        c.bang_on_rocks();
    } catch (Exception e) {
        c.call_betty();
    }
}

(The latter case further invites us to make bedrock() a virtual function in the superclass of Fred and Wilma, so that the dispatching is handled more discreetly, but that's a refactoring for another day.)
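For the curious, here is roughly what that virtual-function version might look like, sketched in Python rather than my pseudocode. Treat it as an illustration under assumptions: RockError and on_rock_failure are names I made up for the sketch, bang_on_rocks always fails so that the handlers actually run, and the functions return strings instead of printing so the result is easy to check.

```python
class RockError(Exception):
    """Hypothetical failure raised when no rock is available."""


class Caveperson:
    def bang_on_rocks(self):
        # Stand-in for the real work; it always fails here so the handlers run.
        raise RockError("no rock")

    def on_rock_failure(self):
        # Hypothetical hook: each subclass supplies its own recovery behavior.
        raise NotImplementedError

    def bedrock(self):
        # The shared try/except skeleton lives in one place; only the handler varies.
        try:
            self.bang_on_rocks()
        except RockError:
            return self.on_rock_failure()
        return "banged on rocks"


class Fred(Caveperson):
    def on_rock_failure(self):
        return "Fred can't find a rock"


class Wilma(Caveperson):
    def call_betty(self):
        return "Wilma calls Betty"

    def on_rock_failure(self):
        return self.call_betty()


def bedrock(need_caveman):
    # The if/else now only picks the object; method dispatch does the rest.
    c = Fred() if need_caveman else Wilma()
    return c.bedrock()
```

Calling bedrock(True) takes Fred's path and bedrock(False) takes Wilma's, with no conditional left inside the exception handler.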

In my case, sitting there in my IDE, I wasn't sure which would be clearer. I knew what needed to be done, but I couldn't quite visualize which way would be easier to read until I typed it out. (I have to admit I'm not much of a flowcharty, UML-y kind of thinker; I have to type out code, or at least pseudocode, to see what it looks like, then refactor from there and draw diagrams after the fact.)

The thing is, there's no functional difference between these two options, outside of some extremely trivial performance issues (the second one uses one more stack frame than the first -- whoop-de-doo). Why do I have to decide at all?

The first code example has the virtue of showing you clearly the relationship between Fred and Wilma -- where their needs differ, you have an explicit if statement. Its flaw is that if you want to know just about Wilma, you have to read past the clutter of irrelevant stuff about Fred. The second example obviously has the converse strength and weakness.

I'd like my IDE or revision control system to know about this trade-off, and be able to display the underlying functionality in either manner. Think of it as something like currying. Currying, if you're not familiar with it, is where you take a function with two arguments, fill in one of them, and treat that half-filled-in function call as a new function of just one parameter. (Strictly speaking, the filling-in step is called partial application; currying is what makes it possible by turning a two-argument function into a chain of one-argument functions.) For example, "+" takes two arguments, as in 3+4 = 7. If you curry it with a "3", you get a new function, "3+_", taking one argument and returning that number plus three.
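Here is the "3+_" example as a small Python sketch. The names add_three and curried_add are mine, chosen only for illustration; functools.partial and operator.add are from the standard library.

```python
from functools import partial
from operator import add

# Fill in the first argument of "+" with 3, leaving a one-argument function.
add_three = partial(add, 3)


def curried_add(x):
    # A hand-curried "+": takes one argument and returns a function awaiting the other.
    def add_x(y):
        return x + y
    return add_x


print(add_three(4))       # 7
print(curried_add(3)(4))  # 7
```

Either way, the point is the same: once one argument is pinned down, what remains is a simpler function.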

Now go back to my bedrock example and look at the first function, where I have everything lumped together. Suppose you curried it with the value need_caveman = true. In two places you'd end up with:

    if (true) then { foo foo foo }
    else { bar bar bar }

which a smart IDE ought to be able to display simply as:

    foo foo foo

Much clearer, no?

In general, then, I'd like to have some kind of odd IDE modality that was like a debugger, in that I could play with variable values, but in which I could mutate the code to show what it would reduce to under current conditions. If I can somehow establish that Wilma isn't relevant to the bug I'm tracking down, then the code logic can be automatically simplified for me.
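To make the idea concrete, here is a toy version of that simplification in Python, using the standard ast module. It is a sketch of the concept, not a real IDE feature: Specialize and specialize are names I invented, the bedrock source is my pseudocode transliterated into Python, and it only handles the easy case where an if test is exactly a known variable.

```python
import ast

SOURCE = '''
def bedrock(need_caveman):
    if need_caveman:
        c = Fred()
    else:
        c = Wilma()
    try:
        c.bang_on_rocks()
    except Exception:
        if need_caveman:
            print("Fred can't find a rock")
        else:
            c.call_betty()
'''


class Specialize(ast.NodeTransformer):
    """Replace known variables with constants and prune decided if statements."""

    def __init__(self, known):
        self.known = known  # e.g. {"need_caveman": True}

    def visit_Name(self, node):
        # Only substitute reads of a known variable, never assignments to it.
        if isinstance(node.ctx, ast.Load) and node.id in self.known:
            return ast.copy_location(ast.Constant(self.known[node.id]), node)
        return node

    def visit_If(self, node):
        self.generic_visit(node)  # specialize the test and both branches first
        if isinstance(node.test, ast.Constant):
            # Returning a list splices the surviving branch into the parent body.
            branch = node.body if node.test.value else node.orelse
            return branch or ast.Pass()
        return node


def specialize(source, **known):
    tree = Specialize(known).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # ast.unparse needs Python 3.9 or later


print(specialize(SOURCE, need_caveman=True))
```

With need_caveman pinned to True, both if statements disappear and only the Fred branches are left; pin it to False and you get the Wilma-only view instead. A real tool would also have to cope with compound conditions, reassignment of the variable, and so on.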

What would be really cool would be if I could even edit the deWilmafied "curried" code and have my changes merged back into the canonical codebase, just as changes to conflicting revisions are merged in a source control system. That could get messy, though, so it would call for some careful treading.
