Friday, February 16, 2007

Evolution of AI

Grey Thumb brings our attention to this article: The Evolution of AI, published in Artificial Intelligence last year.

The author accuses AI researchers of being crypto-creationists for not putting more emphasis on evolutionary techniques in their research. While I think the author is essentially right that evolutionary techniques will play an important role in the research and development of artificial intelligence, they seem to me to be merely a practical tool in that field, not a relevant object of study in that context. What's interesting about AI is not the mere production of software with intelligence-like features, but the deeper understanding this process can lead to: just what intelligence and consciousness really are, and how the human mind might work.

The problem with evolutionary techniques is that they often lead to designs we can't understand, or that, on analysis, turn out to work for strange, quirky reasons.

I call that a "problem" only in that it doesn't naturally lead to a better understanding of a problem domain. If you're trying to design a tone discriminator, as in the article above, or a better antenna, it's fantastic what genetic algorithms can do for you. But in trying to understand intelligence, the project as a whole would be too large for a genetic algorithm to tackle, and would lead to an unanalyzable mess. I'm not making that statement without evidence -- look at what nature has given us: our brains took billions of years to evolve, and after thousands of years of trial, error, and science we still can't reliably tweak them to cure common problems like depression and schizophrenia.

I'm sure as evolutionary techniques improve, people will use them to design components of intelligent systems, and analyze those designs to come up with theories about how they work. In fact we'll probably do a lot more of that in a lot of fields. But the only way I see evolutionary theory as being directly a part of the AI field is if it turns out that there are evolutionary processes going on within the human mind in real time -- a possibility the paper does not even discuss.

In short, he's conflating a research technique with a field of study.

Tuesday, January 23, 2007

Burrows-Wheeler Transform in Haskell

I got curious about program transformation techniques, and went looking to see what kind of work had been done on round-trip parsing: i.e. using the same grammar to turn a string into an abstract syntax tree, and then to turn it back into a string.

Well, I got distracted by this very odd thing I found called the Burrows-Wheeler transform. It's a way of sorting a string that is reversible. (The result also tends to be more easily compressed with something like run-length encoding, hence its use in compression programs such as bzip2.) Basically, you sort the characters of the string, then replace each character with the one that appeared immediately to its left back in the original ordering (or an "end of string" marker for the character that came from the front of the string).
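To make the transform concrete, here is a minimal, naive sketch of my own, separate from the real code below. It sorts all rotations of the input with an explicit '$' end marker, which is assumed not to occur in the input, and which sorts *before* ordinary letters in ASCII (my code below instead uses an EOF flag that sorts after everything).

```haskell
import Data.List (sort)

-- Naive BWT: append a '$' end marker (assumed absent from the
-- input), form all rotations, sort them, and read off the last
-- column.
bwt :: String -> String
bwt s = map last (sort rotations)
  where
    t         = s ++ "$"
    rotations = [ drop i t ++ take i t | i <- [0 .. length t - 1] ]

-- Naive inverse: repeatedly prepend the transformed column and
-- re-sort; after (length cs) rounds the table holds the sorted
-- rotations, and the row ending in '$' is the original string.
ibwt :: String -> String
ibwt cs = init (head [ r | r <- table, last r == '$' ])
  where
    n         = length cs
    table     = iterate step (replicate n "") !! n
    step rows = sort (zipWith (:) cs rows)
```

For example, bwt "banana" is "annb$aa" -- the repeated characters clump together, which is the whole point -- and ibwt recovers "banana".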

Anyway, as an exercise I wrote an encoder and decoder in Haskell. Here it is, for your amusement. Feel free to use it if you want, but it's probably not very efficient. I need to read up on when Haskell does and does not save results of function calls -- are they always memoized?



module Bwt3 where

import Data.Ix
import Data.List

--
-- For BWT we want to compare strings with the special
-- rule that the end of string sorts as greater than any
-- other character
--
compHighEol :: Ord a => [a] -> [a] -> Ordering
compHighEol [] []         = EQ
compHighEol _  []         = LT
compHighEol [] _          = GT
compHighEol (x:xs) (y:ys)
    | x > y     = GT
    | x < y     = LT
    | otherwise = compHighEol xs ys

-- Datatype for a rotated string: conceptually, the string starts
-- where the integer member says it does, and wraps around when it
-- hits the end.
-- This lets us store multiple rotations of the same string
-- without using storage for each one; they all point to the same
-- structure.
data Rotation = Rot Int [Char] deriving Eq

instance Show Rotation where
    show (Rot k l) = drop k l ++ "@" ++ take k l

instance Ord Rotation where
    compare (Rot a as) (Rot b bs) = compHighEol (drop a as) (drop b bs)

-- List of all possible rotations of a string (one per position,
-- plus the rotation that starts at the end-of-string marker)
allrots :: [Char] -> [Rotation]
allrots str = [ Rot i str | i <- range (0, length str) ]

-- The actual output of this algorithm is a string with a flag
-- embedded in the middle of it, which can't be a character.  So
-- we introduce a type for this purpose that allows for an EOF
-- symbol -- which, by the way, sorts after all other characters,
-- to help with decoding.
data FlaggedChar = FC Char | FC_EOF deriving Eq

instance Show FlaggedChar where
    show FC_EOF = "@"
    show (FC x) = show x

instance Ord FlaggedChar where
    compare FC_EOF FC_EOF = EQ
    compare _      FC_EOF = LT
    compare FC_EOF _      = GT
    compare (FC x) (FC y) = compare x y

-- BWT encoding: make all rotations, sort them, and take the last
-- character of each.
encode :: [Char] -> [FlaggedChar]
encode str = map lastchar (sort (allrots str))

-- lastchar just pulls the last character out of the rotated
-- string, or FC_EOF if the end-of-string marker comes last
lastchar :: Rotation -> FlaggedChar
lastchar (Rot 0  _)  = FC_EOF
lastchar (Rot ix xs) = FC (xs !! (ix - 1))

-- The rest of this stuff is for decoding the [FlaggedChar]
-- list we made above
--------------------------

-- Given a list (of anything), return a list of Ints
-- showing where the elements of the list would end up
-- if sorted.
--
-- We do that by tagging each element with an integer, sorting
-- those tagged items, and collecting the tags afterwards to see
-- where they ended up
sortPermutation :: Ord a => [a] -> [Int]
sortPermutation xs = map snd (sortBy compfst (zip xs [0..]))
    where compfst x y = compare (fst x) (fst y)

-- Turn that permutation into a cycle by iterating it once
-- around, starting from position 0
getCycle :: Ord a => [a] -> [Int]
getCycle xs = take (length xs) (iterate (perm !!) 0)
    where perm = sortPermutation xs

-- Apply the permutation cycle to the encoded item, and rotate the
-- result so the end-of-string flag comes at the end
applyCycle :: [Int] -> [FlaggedChar] -> [Char]
applyCycle cyc encoded =
    fCtoString $
    tail $
    dropWhile (/= FC_EOF) answer ++ takeWhile (/= FC_EOF) answer
    where answer = map (encoded !!) cyc   -- Oops: fixed typo 9/2008

fCtoString :: [FlaggedChar] -> [Char]
fCtoString []            = ""
fCtoString ((FC c) : cs) = c : fCtoString cs
fCtoString (FC_EOF : ds) = fCtoString ds   -- ignore FC_EOF

-- Finally the decode function puts it all together
decode :: [FlaggedChar] -> [Char]
decode xs = applyCycle (getCycle xs) xs

Sunday, January 07, 2007

Abstract Refactoring

This week I had a subroutine, let's call it Bedrock, whose functionality I needed to expand to handle a new situation. I won't bore you with the details, but the short story is that it created and handled exceptions from an object, Fred, and now I needed the same thing done with a very similar object, Wilma, which has almost, but not quite, the same behavior.

Of course, there's always this dilemma: is it clearer to just throw a little logic in the function to handle the two cases, something like:


sub bedrock(bool need_caveman) {
    Caveperson c;
    if (need_caveman) { c = new Fred(); }
    else              { c = new Wilma(); }

    try {
        c.bang_on_rocks();
    } catch (Exception e) {
        if (need_caveman) { printf("Fred can't find a rock\n"); }
        else              { c.call_betty(); }
    }
}
...or is it better to split the whole thing into separate functions:

sub bedrock(bool need_caveman) {
    if (need_caveman) { bedrock_fred(); }
    else              { bedrock_wilma(); }
}

sub bedrock_fred() {
    Caveperson c = new Fred();
    try {
        c.bang_on_rocks();
    } catch (Exception e) {
        printf("Fred can't find a rock\n");
    }
}

sub bedrock_wilma() {
    Caveperson c = new Wilma();
    try {
        c.bang_on_rocks();
    } catch (Exception e) {
        c.call_betty();
    }
}
(The latter case further invites us to make bedrock() a virtual function in the superclass of Fred and Wilma, so the dispatching is handled more discreetly, but that's a refinement for another day.)

In my case, sitting there in my IDE, I wasn't sure which would be clearer. I knew what needed to be done, but I couldn't quite visualize which way would be easier to read until I typed it out. (I have to admit I'm not much of a flowcharty, UML-y kind of thinker; I have to type out code, or at least pseudocode, to see what it looks like, then refactor from there and draw diagrams after the fact.)

The thing is, there's no functional difference between these two options, outside of some extremely trivial performance issues (the second one uses one more stack frame than the first -- whoop-de-doo). Why do I have to decide at all?

The first code example has the virtue of showing you clearly the relationship between Fred and Wilma -- where their needs differ, you have an explicit if statement. Its flaw is that if you want to know just about Wilma, you have to read past the clutter of irrelevant stuff about Fred. The second example obviously has the converse strength and weakness.

I'd like my IDE or revision control system to know about this trade-off, and be able to display the underlying functionality in either manner. Think of it as something like currying. Currying, if you're not familiar with it, is where you take a function with two arguments, fill in one of them, and treat that half-filled-in function call as a new function of just one parameter. For example, "+" takes two arguments, as in 3+4 = 7. If you curry it with a "3", you get a new function, "3+_", taking one argument and returning that number plus three.
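In Haskell, where currying is built in, the "3+_" function is just a partial application (a trivial sketch of my own):

```haskell
-- Currying in action: (+) takes two arguments, but supplying just
-- the 3 yields a new one-argument function, "3+_".
plusThree :: Int -> Int
plusThree = (3 +)
```

plusThree 4 then evaluates to 7, exactly as 3 + 4 does.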

Now in my bedrock example, look at the first version, where I have everything all lumped together, and suppose you curried it with the value need_caveman = true. In two places you'd end up with:
if (true) { foo foo foo }
else      { bar bar bar }
which a smart IDE ought to be able to display simply as:
foo foo foo
Much clearer, no?
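The reduction I'm imagining is essentially constant folding over the program's syntax tree. A toy version, on a hypothetical expression type of my own invention, might look like:

```haskell
-- A tiny expression language: boolean literals, if-expressions,
-- and opaque calls standing in for "foo foo foo".
data Expr = BoolLit Bool
          | If Expr Expr Expr
          | Call String
          deriving (Eq, Show)

-- Collapse any If whose condition reduces to a known constant,
-- the way the IDE could collapse "if (true) {...} else {...}".
simplify :: Expr -> Expr
simplify (If c t e) =
    case simplify c of
        BoolLit True  -> simplify t
        BoolLit False -> simplify e
        c'            -> If c' (simplify t) (simplify e)
simplify e = e
```

So simplify (If (BoolLit True) (Call "foo") (Call "bar")) reduces to just Call "foo", which is the display I'm asking for.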

In general, then, I'd like to have some kind of odd IDE modality that was like a debugger, in that I could play with variable values, but in which I could mutate the code to show what it would reduce to under current conditions. If I can somehow establish that Wilma isn't relevant to the bug I'm tracking down, then the code logic can be automatically simplified for me.

What would be really cool would be if I could even edit the deWilmafied "curried" code, and it would be able to merge my changes back to the canonical codebase, just as changes to conflicting revisions in a source control system are merged. That could get messy though, so much careful treading would be in order.

Sunday, December 17, 2006

Questions on Haskell Style (and Polynomials redux)

Someone posted, on the Haskell "Blow your Mind" page, a much more concise implementation of Polynomials as Numbers than I managed to come up with. There are a couple things I learned by comparing it with my version:

  • Use of zipWith (+) instead of mapping (\v -> fst v + snd v) over the result of zip; obviously I need to get to know the libraries better
  • zipWith' is introduced to zip two lists together with a function, padding the shorter one with zeroes. Jeremy Gibbons suggested the same thing, calling it longzipwith, and it is parallel to what I did, without the "with", in my zipLong function.
  • They had the same questions I did about what to do with the required signum and abs functions. Mikael suggested in a comment that I should use the NumericPrelude library instead of the standard one, for a more sensible generic definition of a number.
  • Jeremy and Blow-your-mind both used one less line in their zip function definition than I did: they manage to define it without explicitly handling the case where both lists are null.
And here are some questions I have -- please comment if you have any ideas!
  • The Blow-your-mind version uses [a] instead of Poly [a] as its polynomial type. My feeling is that an explicit constructor and data type like Poly is better, because a polynomial is not the most obvious way to interpret a list of numbers. However, my functions ended up being a lot wordier, pulling that constructor on and off the list all the time. What's the better style choice here?
  • The multiplication function has me scratching my head. It seems terser, but less communicative, to me. If you've got more experience with functional programming than I do: is the Blow-your-mind version much clearer or more efficient, and is it worth it? Is mine only clearer to me because I'm not used to FP style yet? Or maybe tight APL-like gnomishness is part of the point of the Blow-your-mind page. (Here they are again if you don't want to follow all the links.)
My version:
> scalarPolyMult :: (Floating a) => a -> Poly a -> Poly a
> scalarPolyMult c (Poly rs) = Poly $ map (*c) rs

> multByUnknown (Poly rs) = Poly (0:rs)

> multPoly :: (Floating a) => (Poly a) -> (Poly a) -> (Poly a)
> multPoly (Poly []) cs = Poly []
> multPoly (Poly (b:bs)) cs =
>     addPoly
>         (scalarPolyMult b cs)
>         (multPoly (Poly bs) (multByUnknown cs))
Blow-your-mind version:
  xs * ys = foldl1 (+) (padZeros partialProducts)
    where
      partialProducts = map (\x -> [x*y | y <- ys]) xs
      padZeros        = map (\(z,zs) -> replicate z 0 ++ zs) . zip [0..]
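Pulled out as a standalone function, with the padded zip written explicitly (the original relies on the list type's Num instance for its (+)), the idea reads as follows; polyMul and zipLong are my own names for this sketch:

```haskell
-- Standalone version of the "Blow your Mind" product, where a
-- polynomial is just its coefficient list [a0, a1, a2, ...].
polyMul :: Num a => [a] -> [a] -> [a]
polyMul xs ys = foldl1 zipLong (padZeros partialProducts)
  where
    partialProducts = map (\x -> [x * y | y <- ys]) xs
    padZeros        = map (\(z, zs) -> replicate z 0 ++ zs) . zip [0 ..]
    -- zipWith (+) that keeps the tail of the longer list
    zipLong (a:as) (b:bs) = (a + b) : zipLong as bs
    zipLong as     []     = as
    zipLong []     bs     = bs
```

For instance, polyMul [1,1] [1,1] computes (1+x)^2 as [1,2,1].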

Wednesday, December 13, 2006

Automatic Spaghetti Refactoring

I wonder under what conditions it would be possible to take a working body of spaghetti code and automatically refactor it to pull out all references to one specific resource, or some other definable subset of the program's activity. Afterwards, your code would be as bad as before, or worse, in terms of organization, but it would at least be semantically equivalent.

If you could do this quickly, on demand, you could present the object to a programmer in some kind of constrained editor for modification, then re-inline the changes back into the original structure of the code. This would let a programmer pretend a design was nicely modular and object-oriented for the sake of conceptual simplicity in making a change, but without requiring the code to actually be organized at all.

Why bother doing this? It could be that the code is already organized according to some orthogonal concern that you don't want to mess up; or it's truly messy spaghetti code that you don't understand; or maybe more interestingly, it's represented internally as a dataflow graph or something, where there may be no linear or hierarchical structure to it by nature.

I know refactoring gets done piecemeal by menu options in some IDEs, and I know something similar is done within compilers as optimization steps, such as inlining and loop unrolling, but I haven't heard of large-scale, temporary changes made just for purposes of presentation.

The idea of making a transformation, applying a change, then reversing the transformation, puts me in mind of the notion of "conjugation" from group theory. However I haven't gotten to that chapter yet in my Algebra book, so I don't really know if there are any monkeys in that tree.

Thursday, December 07, 2006

Awkward db2 SQL syntax

I'm continually frustrated by the awkwardness of SQL syntax in DB2 (I do database work on an IBM iSeries). You can't do an update on a joined table; you can only update one table, so you have to do a subselect, typically expressing exactly the same subselect in two places in the same query. See Brett Kaiser's helpful example. I keep shying away from doing them this way because it seems so wrong, but I keep coming back to it as my best alternative. A language should not make you say the same complicated thing twice in one sentence. Anaphora is your friend, IBM!

Tuesday, December 05, 2006

Simonyi on Comments in Algol-60

Check out this fascinating slideshow by Charles Simonyi about how the language Algol-60 was originally used in practice -- as a language for comments. He observes that good code comments can be a good source of ideas for new programming language constructs, since comments are where programmers express ideas that they feel can't currently be made clear or concise in the code itself.

Monday, December 04, 2006

The Zen of Referential Transparency

Here's another interesting connection between Buddhist philosophy and math: this blog post by metaperl draws a connection between Haskell's lack of state variables, and Eckhart Tolle's belief that "the only thing that is real is the present moment."

I'm not familiar with Tolle, but this is a part of Zen thinking as well. You could choose to look at the past and future only as memories and projections that exist now, rather than as independent realities. I'm not claiming to know something that arcane about the universe. But it makes sense, using category theory as a metaphor, that you could choose to view the past and present and future as three real objects; or you could choose to look at the currently existing traces of the past, and currently existing precursors of the future as categorical arrows, and merely define the past and future in terms of those.

A pure function will return the same value every time it is called with the same arguments; this is convenient because it makes them timeless in a sense. If you define some function


f(x) = x+4,

you can say that f(2) = 6; and it's always true. If f(2) gets called 10 times during a program's execution, you only have to really do the addition once; after that you know it as a timeless fact. Even calculating it the once is bending to necessity; after all, it was already true before we calculated it.

But if we have some other function, g(), and we pass the result of f into it

g(x) = x * 7
print g(f(2))

Well, g(6) = 42 no matter what, and g(f(2)) = 42 no matter what. They're both universal timeless facts about this little system. But in practice the computer has to calculate f(2) in order to know what to feed to g, so that g(f()) relationship works like time. f is "before" g, even though they're both mathematically fixed, static, and unchanging.
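In Haskell this timelessness is directly visible; a minimal sketch of my own, naming the intermediate value so the runtime treats f 2 as a fact to be computed at most once:

```haskell
f :: Int -> Int
f x = x + 4

g :: Int -> Int
g x = x * 7

-- g (f 2) is a timeless fact, but operationally f 2 must still be
-- evaluated "before" g can consume it; naming it shares the work.
result :: Int
result = let y = f 2   -- evaluated at most once
         in g y
```

result is 42 no matter how many times it is inspected.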

My feeble understanding of Zen is that one can learn, among other things, to experience time as an artificial construct of this sort, and not as something fundamental. It's pretty hard to nail down what's concretely different in this perspective, though, because it's probably isomorphic to the more conventional view of time as a mighty river; but supposedly there is a psychospiritual benefit to changing one's perspective in this way.

Some links:
An article about Zen, Time, and Bohm's interpretation of quantum physics
A prior post of mine on dependent origination and category theory