Saturday, March 17, 2007

select * from universe where planet=earth

In Ten Ways to Sunday, Nathan Allen discusses the fact that the powerful relational model you have for referring to data in a database does not extend to entities outside the database. He proposes that a similar relational model should be available for referring to everything a computer has access to, with some way of explicitly narrowing the scope of this system for the context of a particular task.
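To make that idea a little more concrete, here is a toy sketch of my own (not anything from Nathan's post) that treats files on disk as rows in a relational table, using Python's built-in sqlite3 module. The table name `files`, its columns, and the helper names `files_as_table` and `demo` are all made up for illustration. The `root` argument stands in for the "explicit narrowing of scope": only things under that directory are visible to the query.

```python
import os
import sqlite3
import tempfile


def files_as_table(conn, root):
    """Load the files under `root` into a table named `files` (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE files (path TEXT, name TEXT, ext TEXT, size INTEGER)"
    )
    # `root` is the scope: nothing outside it becomes visible to queries.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            ext = os.path.splitext(name)[1]
            conn.execute(
                "INSERT INTO files VALUES (?, ?, ?, ?)",
                (full, name, ext, os.path.getsize(full)),
            )


def demo():
    """Create a few throwaway files, then query them relationally."""
    with tempfile.TemporaryDirectory() as root:
        samples = [("notes.txt", "hello"), ("todo.txt", "buy milk"), ("photo.jpg", "xx")]
        for name, body in samples:
            with open(os.path.join(root, name), "w") as f:
                f.write(body)

        conn = sqlite3.connect(":memory:")
        files_as_table(conn, root)
        rows = conn.execute(
            "SELECT name, size FROM files WHERE ext = '.txt' ORDER BY name"
        ).fetchall()
        conn.close()
        return rows


if __name__ == "__main__":
    print(demo())
```

This only copies a snapshot of one kind of entity into a database, so it is far short of a relational view of "everything". It does show the basic appeal, though: once the outside world looks like a table, the same query language applies to it.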

Referring to everything in the world with a consistent relational model is a conceptual nightmare, as Nathan hints, and as you'll soon discover too if you peruse an upper-level ontology like SUMO or Cyc and contemplate relating some dataset that you work with to one of these ontologies. But I think something along these lines is going to end up bearing fruit in the long run. Human communication always uses words with known meanings, linked into a shared cultural knowledge set. Because of that, we can describe data, processes, and new ideas to each other without having to define terms nearly as often as we must when writing software or laying out databases. Our data is implicitly embodied as part of society's larger body of knowledge. Today's databases, on the other hand, are a bunch of arbitrary items whose interpretation is described by machine-unreadable documentation, and implicitly by the way software happens to be written to use them.

While humans seem to share an implicit upper-level ontology, we don't actually know what it is, as evidenced by the difficulties and controversies surrounding the building of formal upper-level ontologies. If we can figure out why we can communicate so well with each other despite that, I think we'll have learned something important. I suspect what's going on is that we have multiple ontologies for different kinds of ideas and objects, and that these are only loosely and inconsistently interrelated.

The inconsistency of our ontological landscape is evidenced by those weird questions we stay up late at night arguing about, like "why is there a universe?", "does red look the same to you as it does to me?", or "is there a God?". It's also where "religious" debates come from, by which I mean those debates where your position seems so obvious that you think the other guy must be being deliberately obtuse not to see what you see so clearly.

Someday AIs in charge of personal data management will fritter away their spare CPU cycles arguing about whether postal codes should be intrinsic parts of addresses or derived data items by way of post office servers. During the day, though, they'll warily agree to disagree and find a practical workaround.
