Thursday, September 07, 2006

Organizing programs by their invariants

It seems to me that natural langauge program specifications tend to consist of a lot of statements which are all independently true, and only dependent on context for deictic references (i.e. pronouns and pronoun-like references to concepts in surrounding sentences). In other words, I'd claim, you could make sense of a good proportion of a scrambled specification if:

  • Sentences remained intact
  • Deictic references were replaced by full references to global names of entities or processes
Now obviously, not everything in such a document will behave that way; for example an ordered list of steps will have their ordering and context lost. But I think my claim is more true of an English-language spec, then, say, a large FORTRAN program. If you have a spec you're working with closely, you might have sticky notes on different pages, and individual sentences highlighted, which you can turn to and refer to instantly without necessarily reading for context. An old-skool not-very-modular FORTRAN program, on the other hand, is meant to be executed in one fell swoop. A programmer might turn to a particular page of a printout and look for some information, but will require a lot more scanning around and reasoning to come up with an answer to their question about what's going on in the program.

A Java program might fall somewhere in between English and Fortran in that regard -- Java tends to be written with lots of smallish functions, each of which might be comprehended on its own.

Another reason a natural-language spec is easier to read in a random order, is that the words in an English sentence are more likely to be standard vocabulary not defined in the spec; and the ones that are defined in the spec are likely to be motivated* narrowings or metaphors of standard terminology. In all the programming languages I'm familiar with, almost all the entities defined in a particular system can be freely named: a there may be conventions telling a programmer what sort of thing a WidgetWindowHandlerFactory is, but the compiler doesn't care; it could be a synonym for Integer.

So this habit of natural language helps make randomly-chosen sentences more comprehensible on their own. That in turn makes it easier for us short-attention-span programmers to digest the piles of paper our managers and user groups churn out.

This suggests a couple interesting features for programming languages:
  1. Structurally organize programs around a series of declared invariants, with the compiler (when possible) or the programmer (the rest of the time) filling in associated code to ensure the invariant remains true. This could be done in a lot of different ways depending on the need -- type systems, aspects, agents.
  2. Create a large and varied, but standard and fixed, ontology of types that the user should almost always derive from. Give them all short names and require user-defined types and variables to end with those type names. This would have to be done carefully to avoid javaStyleWordiness(), and it would also be a larger learning curve/burden on the programmer. I'm picturing something vaguer and more flexible than a standard library; rather than providing a bunch of standard implementations, the ontology would provide standard invariants.
* "motivated" -- I got this term from George Lakoff's book "Women, Fire, and Dangerous Things" where he talks about how new uses for old words are motivated by metaphorical relations with older meanings, but not predictable from those meanings.

Tags: , ,

No comments: