Sunday, October 08, 2006

Metalanguage for Software Development

Every software developer uses a variety of languages to develop a program. It may seem like you're developing something purely in Perl or Java or VB or whatever, but in fact you're using a lot of mini-languages to manage the development process:

  • Shell: if you're using a command line, you have to know shell commands for managing files and directories, invoking the compiler, etc.
  • IDE: on the other hand, if you're using an IDE, you could consider it language-like too; you invoke a series of drop-down menus and click options on and off
  • source control: whether it's IDE-based or command-line based, you're describing and querying a specialized model encompassing time, files, directories, versions, and maybe different user identities
  • deployment: you use FTP commands, or something equivalent, to put files on a server, configure the server to run your programs at the appropriate time, etc.
  • building and linking: makefiles, or Visual Studio "projects"
  • profiling: turning a profiler on or off, configuring it, and interpreting its output is a language-like interaction
  • debugging: another model of the program, where you communicate with the debugger about variable values and code structures
  • SQL: typically, any interaction with a database from within your program is done in a walled-off sublanguage; maybe SQL built into strings, or maybe a specialized, but usually somewhat awkward, object or function call model
  • database configuration: setting up tables and so forth is often done with a combination of SQL and clicking around in a database management IDE

When programmers think about the development process, we have an integrated mental model of all these aspects of the process, and from that model we figure out how to use them all together to do what needs to be done. It would be interesting to have a single, consistent meta-language for software development that encompassed all these tasks.

The closest thing we have to that, I think, is the command-line shell. The simpler "languages", such as the compiler settings language, are encapsulated as command-line arguments, so technically you are doing diverse tasks like compiling your program or renaming files from the same prompt. But useful as it is, it's kind of a gimmick. You can't easily pull together information from, say, the profiler, the debugger, and some unit test results, and ask questions that cut across these different domains.

For example, suppose you made a change to a procedure last week, and now you think it may be running too slowly on a particular dataset. A test of that is easy to express in English: run the current version of procedure X against dataset Y and note how long it takes; then run X against Y using the version of X that was current last Thursday, and compare the two times. Implementing it would take a little work: we'd have to check out two versions, compile them in separate directories, run them both under a profiler, and know how to interpret the profiler results. Speaking for myself, I'd probably make a mistake the first time through -- I'd check out the wrong version of the code, or run the compiler with different optimization flags or something.
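
To make the comparison concrete, here is a minimal Python sketch of just the timing step. It assumes both versions of X have already been checked out and are importable side by side; the function names, the toy sorting procedures, and the dataset are all made up for illustration, and a plain wall-clock timer stands in for a real profiler.

```python
import time
from typing import Callable, Dict, Sequence


def compare_versions(
    versions: Dict[str, Callable[[Sequence[int]], object]],
    dataset: Sequence[int],
    repeats: int = 3,
    timer: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Run every version against the same dataset; return the best time for each."""
    results = {}
    for name, procedure in versions.items():
        best = float("inf")
        for _ in range(repeats):
            start = timer()
            procedure(dataset)
            best = min(best, timer() - start)
        results[name] = best
    return results


# Hypothetical stand-ins for the two versions of procedure X.
def x_last_thursday(data):
    return sorted(data)


def x_current(data):
    # A hand-rolled insertion sort, standing in for last week's change.
    out = []
    for item in data:
        i = 0
        while i < len(out) and out[i] < item:
            i += 1
        out.insert(i, item)
    return out


if __name__ == "__main__":
    dataset_y = list(range(2000, 0, -1))  # made-up dataset Y
    timings = compare_versions(
        {"last Thursday": x_last_thursday, "current": x_current}, dataset_y
    )
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.4f}s")
```

Passing both versions through one function guarantees they see the same dataset and the same number of repeats, which is exactly the kind of detail that is easy to get wrong by hand. It does nothing about the checkout or compiler-flag mistakes, which is the part that still needs the tools to talk to each other.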

There oughtta be a language that has standard terminology for all these sorts of tools, and some easy way to build little modules onto the front end of a tool that translate this language into the tool's configuration settings, and translate its results or error messages back into the language.

The trick is you'd have to have a pretty smart front end that could pull apart commands or queries that involved multiple tools, figure out what commands to pass along to the individual tools, and then integrate the results it gets back. This would not be a trivial problem, but it would be a good start just to make this kind of task *expressible*, even if it required a lot of user guidance at first.
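
As a rough illustration of the shape this might take, here is a small Python sketch of per-tool modules plus a front end that fans a query out and merges the answers. Everything in it is hypothetical: the `ToolAdapter` and `FrontEnd` classes, the request vocabulary, and the `example-vcs` and `example-profiler` command lines are invented for the example and don't correspond to any real tool. It also leans heavily on user guidance: the query arrives already split up by tool, and the sketch only plans commands and interprets output rather than running anything.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToolAdapter:
    """Hypothetical front-end module for a single tool."""
    to_tool: Callable[[dict], List[str]]  # shared vocabulary -> tool command line
    from_tool: Callable[[str], dict]      # raw tool output -> shared vocabulary


@dataclass
class FrontEnd:
    adapters: Dict[str, ToolAdapter] = field(default_factory=dict)

    def plan(self, query: List[Tuple[str, dict]]) -> List[Tuple[str, List[str]]]:
        """Turn each (tool, request) pair into that tool's own command."""
        return [(tool, self.adapters[tool].to_tool(request)) for tool, request in query]

    def integrate(self, outputs: Dict[str, str]) -> dict:
        """Translate each tool's raw output back and merge it into one result."""
        merged: dict = {}
        for tool, raw in outputs.items():
            merged.update(self.adapters[tool].from_tool(raw))
        return merged


# Made-up adapters; the command names and flags are illustrative only.
front_end = FrontEnd({
    "source_control": ToolAdapter(
        to_tool=lambda req: ["example-vcs", "checkout", "--as-of", req["as_of"]],
        from_tool=lambda raw: {"revision": raw.strip()},
    ),
    "profiler": ToolAdapter(
        to_tool=lambda req: ["example-profiler", "run", req["procedure"], req["dataset"]],
        from_tool=lambda raw: {"seconds": float(raw)},
    ),
})

query = [
    ("source_control", {"as_of": "last Thursday"}),
    ("profiler", {"procedure": "X", "dataset": "Y"}),
]
commands = front_end.plan(query)
result = front_end.integrate({"source_control": "r41\n", "profiler": "2.5"})
```

The hard part, working out from a single English-like query which tools are involved and in what order, is exactly what this sketch leaves to the user.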

