Saturday, August 19, 2006

Saponomancy

This week I was asked to write some code to mimic a SOAP service. How hard could that be? It was a very simple service, just a couple remote procedure calls, one of which could return a large binary file.

Of course, the original was implemented in Delphi to run under IIS, and the new one needed to run on a UNIX-like system, but SOAP is intended to make all this easy, right? How much did I need to know?

Well, a lot, it turns out. I played with a couple of different toolkits for creating SOAP services, and they kept coming up with slight variations, none of which seemed to interoperate out of the box. Little things would be different in the XML message, like whether namespaces were mentioned in the tags, or how the attachment was referenced and sent. It was easy to see and understand the problem by looking at the dumps of the XML and MIME stuff going over the wire, but much harder to plow through the documentation of the various APIs to see what flags needed to be set in their object model, or what objects needed to be created, or who was responsible for allocating and freeing memory, etc.
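
To make the namespace point concrete, here is a minimal sketch in Python using the standard library's ElementTree. The messages are made up for illustration (the `urn:example` namespace and the `getFile` and `name` elements are hypothetical, not the real service's), but they show how two toolkits can emit different-looking tags that mean the same thing, while a third that drops the namespace really is sending something different.

```python
import xml.etree.ElementTree as ET

# Three made-up request bodies; urn:example, getFile and name are hypothetical.
PREFIXED = (
    '<ns1:getFile xmlns:ns1="urn:example">'
    '<ns1:name>report.bin</ns1:name>'
    '</ns1:getFile>'
)
DEFAULT_NS = '<getFile xmlns="urn:example"><name>report.bin</name></getFile>'
NO_NS = '<getFile><name>report.bin</name></getFile>'


def qualified_tags(xml_text: str) -> list[str]:
    """Return the namespace-qualified tag of every element, in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


# Same qualified names, different text on the wire.
print(qualified_tags(PREFIXED) == qualified_tags(DEFAULT_NS))
# Different qualified names: a strict receiver may reject this one.
print(qualified_tags(NO_NS) == qualified_tags(DEFAULT_NS))
```

A namespace-aware parser treats the first two as identical, so a mismatch like the third is the kind that actually breaks interoperability.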

In the end, I realized that since this was an internal service, and I controlled both endpoints, I could easily make a case to management for just using the raw HTTP protocol, and in fact as I played with that solution, it turned out to be cleaner, lighter, more maintainable, and easier to document. I'm changing my party affiliation to REST, even though it's too late to vote in the primaries.
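
For a sense of scale, here is a rough sketch of what a plain HTTP endpoint for the binary-file call might look like, using only Python's standard library. It is not the service I actually wrote; the path `/files/report.bin`, the `FILES` table, and the helper names are all invented for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical payload standing in for the large binary file.
FILES = {"/files/report.bin": bytes(range(256)) * 4}


class FileHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = FILES.get(self.path)
        if body is None:
            self.send_error(404, "unknown file")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet; a real service would log requests


def start_server(port: int = 0) -> HTTPServer:
    """Start the server on localhost in a background thread (port 0 = any free port)."""
    server = HTTPServer(("127.0.0.1", port), FileHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(port: int, path: str) -> tuple[int, bytes]:
    """Issue a GET against the local server and return (status, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```

The whole contract is a URL, a status code, and a couple of headers, which is most of why it is easy to document.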

Well, that's fine; I'm sure there must be cases where SOAP is a better solution, but it got me to thinking about why it is that the raw XML is so much easier to debug than the supposedly labor-saving frameworks built on top of it. I guess it's just that what goes across the wire is easier to capture and pin down -- I can run two services, capture their input and output, and just compare their logs. But I can't compare the operations of their object models, because they're different.
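
Comparing captures really is that mechanical. As a sketch, assuming the two dumps have already been saved as text, Python's `difflib` will point straight at the lines that differ. The captures below are invented, and `diff_captures` is just a name I made up for the helper.

```python
import difflib


def diff_captures(a: str, b: str, name_a: str = "service_a",
                  name_b: str = "service_b") -> list[str]:
    """Return a unified diff of two captured wire dumps, line by line."""
    return list(difflib.unified_diff(
        a.splitlines(), b.splitlines(),
        fromfile=name_a, tofile=name_b, lineterm=""))


# Two made-up captures that differ only in how the namespace is written.
CAPTURE_A = """<Envelope>
  <Body>
    <ns1:getFile xmlns:ns1="urn:example">
      <name>report.bin</name>
    </ns1:getFile>
  </Body>
</Envelope>"""

CAPTURE_B = """<Envelope>
  <Body>
    <getFile>
      <name>report.bin</name>
    </getFile>
  </Body>
</Envelope>"""

for line in diff_captures(CAPTURE_A, CAPTURE_B):
    print(line)
```

There is no equivalent one-liner for diffing two toolkits' object models, which is the whole complaint.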

What would be very cool is a formal way to specify the relationship between all the possible operations of an object model and the grammar of its input and output streams. Then I could point to a rule in the grammar that I wanted to come out a certain way, and ask "what sequences of user-accessible object operations can cause this?" The closest thing I've seen to this is the Whyline from project Marmalade at CMU. The Whyline is a thingy that looks like it's geared towards helping programmers find bugs in their code, by tracing out why a particular condition was arrived at during execution. Seems like it could be just as useful for prying apart the mysteries of someone else's prepackaged API, especially a closed-source one, if somehow the Whyline could still operate without showing you precious vendor secrets.

Unfortunately the Whyline is still a research project built into a research language called Alice, not a handy button on my Visual Studio menubar.
