Wednesday, October 25, 2006

Follow up on kanji coding method: Wuji

Turns out there is a character input method like the one I described in my previous post.
Sounds like it turns out to be kind of awkward to use. Ah well...

Sunday, October 15, 2006

Ban the dangling else!

I was just reading through Niklaus Wirth's paper, "Good Ideas, Through the Looking Glass" (found through Lambda the Ultimate, of course!) where he talks about the Dangling Else problem.

Some programming languages have statements of the form:

if X then Y
if X then Y else Z
with no end keyword or bracket, leading to the problem of how to interpret:
if it's Saturday then if it's sunny then have a picnic else see a movie
Do we see a movie if it's not Saturday, or if it is Saturday, but it's cloudy?

Natural languages have exactly the same sort of problem (example from the AP Press Guide to News Writing, quoted in Language Log):
We spent most of our time sitting on the back porch watching the cows playing Scrabble and reading.
It reads funny in English, but we resolve it easily because we know the context. Obviously context is not as helpful for a compiler.

How does Language Log suggest fixing that problem in English? They give two suggestions:
We spent most of our time sitting on the back porch watching the cows, playing Scrabble and reading.
Playing Scrabble and reading, we spent most of our time sitting on the back porch watching the cows.
The first suggestion corresponds approximately to Wirth's preference for an end keyword; the comma signals a break of some kind, and the obvious interpretation is that the cows and the scrabble shouldn't be too closely associated. It's not nearly so rigorous as end, of course.

The other suggestion is a bit more interesting; they rework the entire ordering of the sentence. Although the two examples aren't really parallel, it does suggest another way to rework our if-then-if-then-else, if we intend the final else to correspond to the first if-then:

if it's not Saturday then see a movie else if it's sunny then have a picnic.

In other words, we're punting -- not solving the original ambiguity but wording around to avoid the issue.

I think that's actually a better strategy from a human reader's standpoint. We do not have much in the way of "closing braces" in natural language: they tend to lead to center embedding issues which our brains just aren't equipped to deal with. So maybe a good policy would be for a compiler to simply disallow the "if-then-if-then-else" construction with either interpretation, and force the user to rework their logic a little. Such a restriction looks like an ugly hack to a computer scientist, but I think its necessary if we take seriously the idea that programs should be human languages as well as computer languages.

Tags: , , ,

Sunday, October 08, 2006

Metalanguage for Software Development

Every software developer uses a variety of languages to develop a program. It may seem like you're developing something purely in Perl or Java or VB or whatever, in fact you're using a lot of mini-languages to manage the development process:

  • Shell: if you're using a command line, you have to know shell commands for managing files and directories, invoking the compiler, etc.
  • IDE: On the other hand if you're using an IDE, you could kind of consider it language-like; you invoke series of drop down menus and click options on and off
  • source control: whether it's IDE-based or command-line based, you're describing and querying a specialized model encompassing time, files, directories, versions, and maybe different user identities
  • deployment: you use FTP commands, or something equivalent, to put files on a server, configure the server to run your programs at the appropriate time, etc.
  • building and linking: Like makefiles or visual studio "projects"
  • profiling: turning a profiler on or off, configuring it, and interpreting its output, is a language-like interaction
  • debugging: another model of the program, where you communicate with the debugger about variable values and code structures
  • SQL: typically any interaction with a database within your program, is done from within a walled-off sublanguage; maybe SQL built into strings, or maybe a specialized, but usually somewhat awkward, object or function call model.
  • Database configuration: setting up tables and so forth is often done with a combination of SQL or database management configuration IDE manipulation
When programmers think about the development process, we have an integrated mental model of all these aspects of the process, and from that figure out how to use them all together to do what needs to be done. It would be interesting to have a single, consistent meta-language for software development that encompassed all these tasks.

The closest thing we have to that, I think, is the command-line shell. The simpler "languages", such as the compiler settings language, are encapuslated as command-line arguments, so technically from the same prompt you are doing diverse tasks like compiling your program or renaming files. But useful as it is, it's kind of a gimmick. You can't easily pull together information from, say, the profiler, the debugger, and some unit test results, and ask questions that cut across these different domains.

For example, suppose you made a change to a procedure last week, and now you think it may be running too slow on a particular dataset. A test of that is easy to express in English: run the current version of the procedure X against dataset Y and note how long it takes; also run X against Y using the version of X that was current last Thursday. Implementing it would take a little work; we'd have to check out two versions, compile them in separate directories, run them both under a profiler, and know how to interpret the profiler results. Speaking for myself, I'd probably make a mistake the first time through -- I'd check out the wrong version of the code, or run the compiler with different optimization flags or something.

There oughtta be a language that has standard terminology for all these sorts of tools, and some easy way to build little modules onto the front end of a tool, that translates this language into the tool's configuration settings, and translates its results or error messages back into the language.

The trick is you'd have to have a pretty smart front end that could pull apart commands or queries that involved multiple tools and figure out what commands to pass along to the individual tools; then integrate the results it gets back. This would not be a trivial problem, but it would be a good start just to make this kind of task *expressible*, and require a lot of user guidance at first.