Sunday, March 25, 2007

Hobbling scheme for readability?

There's a discussion going on over at Lambda the Ultimate about the use of Scheme as a shell scripting language (scsh). As is typical in such discussions, the people accustomed to parentheses are saying that all the parentheses aren't so bad, while the people unaccustomed to them feel that traditional shell scripting is clearer.

I gotta come down firmly on the no-parens side of this debate. Lispy parentheses are an egregious example of center embedding, which puts an impossible strain on short-term memory. It's quite hard to write Lisp code without help from an editor program that counts parentheses for you. Lisp is optimized for machine parsing, but pessimized for human parsing.

It would be interesting to see just what limits there are on programs you can write without center embedding. Suppose you had a dialect of Lisp in which the outermost expression did not need parentheses around it, a comma would close a single paren, and a period would close every open paren and end the outermost expression; whenever you were two or more parens deep, the ONLY way to close them would be with a period. So something like this would be legal:


a (b c, d (e f, (g (h (i j.

and would look like this in traditional syntax:

(a (b c) d (e f) (g (h (i j))))
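
Just to make the rules concrete, here's a minimal sketch (in Python, since it's handy) of a translator from this hypothetical notation back to ordinary S-expression text. The function name is made up, and it assumes exactly the rules described above:

def hobbled_to_sexpr(text):
    # The implicit outer expression counts as one open paren.
    depth = 1
    out = ["("]
    for ch in text:
        if ch == "(":
            depth += 1
            out.append("(")
        elif ch == ",":
            # A comma closes exactly one open paren.
            depth -= 1
            out.append(")")
        elif ch == ".":
            # A period closes every open paren, ending the expression.
            out.append(")" * depth)
            depth = 0
        else:
            out.append(ch)
    return "".join(out)

print(hobbled_to_sexpr("a (b c, d (e f, (g (h (i j."))
# prints: (a (b c) d (e f) (g (h (i j))))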

The point being that you could nest as deeply as you liked, but only at the end of the expression. I don't know, it looks kind of ugly at first glance, but it nicely parallels English structures like:

This is the cat that ate the rat that lived in the house that Jack built.

If you forced yourself to write in that style, and just defined intermediate variables to refer to whenever you couldn't find a way to structure things this way, I wonder what the resulting programs would look like.

Saturday, March 17, 2007

select * from universe where planet=earth

In Ten Ways to Sunday, Nathan Allen discusses the fact that the powerful relational model you have for referring to data inside a database does not extend to entities outside the database, and he proposes that a similar relational model should be available for referring to everything a computer has access to, with some way of explicitly narrowing the scope of this system for the context of a particular task.
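
To make that boundary concrete, here's a small runnable sketch (not from Nathan's post; the table, path, and bridge-function name are invented for illustration). SQLite can relate anything inside its own tables, but to ask even a trivial question about the world outside the database file, you have to bolt on a bridge by hand:

import os
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (path TEXT, title TEXT)")
conn.execute("INSERT INTO documents VALUES ('/etc/hostname', 'the hostname file')")

# The relational model stops at the database wall; anything about the actual
# file on disk has to be smuggled in through a hand-registered function.
conn.create_function("file_size", 1,
                     lambda p: os.path.getsize(p) if os.path.exists(p) else None)

for row in conn.execute("SELECT title, file_size(path) FROM documents"):
    print(row)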

Referring to everything in the world with a consistent relational model is a conceptual nightmare, as Nathan hints at, and as you'll soon discover too if you peruse an upper-level ontology like SUMO or Cyc and contemplate fitting some dataset that you work with into one of these ontologies. But I think something along these lines is going to end up bearing fruit in the long run. Human communication always uses words with known meanings, linked into a shared cultural knowledge set. Because of that, we can describe data, processes, and new ideas to each other without having to define terms nearly as often as we must when writing software or laying out databases. Our data is implicitly embodied as part of society's larger body of knowledge. Today's databases, on the other hand, are a bunch of arbitrary items whose interpretation is described by machine-unreadable documentation, and implicitly described by the way software happens to be written to use it.

While humans seem to share an implicit upper-level ontology, we don't actually know what it is, as evidenced by the difficulties and controversies surrounding the building of formal upper-level ontologies. If we can figure out why we can communicate so well with each other despite that, I think we'll have learned something important. I suspect what's going on is that we have multiple ontologies for different kinds of ideas and objects, loosely and inconsistently interrelated.

The inconsistency of our ontological landscape shows up in those weird questions we stay up late at night arguing about, like "why is there a universe" or "does red look the same to you as it does to me" or "is there a God". It's also where "religious" debates come from, by which I mean those debates where your position seems so obvious that you think the other guy must be deliberately obtuse not to see what you see so clearly.

Someday AIs in charge of personal data management will fritter away their spare CPU cycles arguing about whether postal codes should be intrinsic parts of addresses or derived data items by way of post office servers. During the day, though, they'll warily agree to disagree and find a practical workaround.