Deconstructing databases with Jim Gray
A genuine guru
Interview Most companies have a tame "guru" - someone presented as a world authority on the subject in question and so amazingly intelligent that they are above the tacky world of commercialism.
Sadly, many such "gurus" merely debase the term and turn out to be exactly what you expect - mouthpieces for the marketing department.
One of the great exceptions to this rule is Jim Gray, who has managed to combine an outstanding academic career (you don't win an ACM Turing Award for your attendance record) with a very practical one.
During the 1970s he was responsible for some of the most fundamental work on database theory and practice. To put this into context, it's because of Jim's work that we understand enough about transactions to be able to run multi-user databases effectively.
Over the years Jim has worked for DEC, Tandem and IBM. He is currently the head of Microsoft's Bay Area Research Centre and a Microsoft distinguished engineer. This is a man for all seasons - completely unpartisan. No matter who currently provides his pay cheque, he tells it as he sees it. I've heard him say, talking of the company for which he was then working: "We screwed up big time; it's as simple as that." This ought to make him a PR nightmare, but his standing in the community means that it has the opposite effect. People trust what he says.
If all that wasn't enough, Jim Gray is further renowned for being a humanist. One of the most decent, honest, upright, pleasant people you could hope to meet. After speaking at conferences, he is mobbed by attendees, but always finds time to talk to them and leaves each one feeling as if they have just been chatting to a friend.
Even hardened cynical computer journalists treat this guy with serious respect. Several of us were privileged to talk to him recently. In true guru style, he fielded every question without hesitation, deviation or repetition.
Transactions across services
On the topic of running transactions across services, what are the implications for transaction integrity?
As you may know, I spent decades working on the problem of getting transaction integrity to work at all; and on ACID [Atomicity, Consistency, Isolation and Durability - the four essential properties that a transaction must have - Ed] properties and how they can be implemented efficiently. Throughout this there has been always been the question of when should one use transactions and when are they inappropriate.
One of things that we worked on for decades is a transactional workflow metaphor. The basic idea is that if x is a good idea in computer science, then recursive x is better. So if we have transactions that are flat and are units of work, we should be able to nest them and parenthesise them and have sub transactions and sub sub sub transactions.
Frankly, over 25 years there's been a lot of work in this area and not much success. Workflow systems have, by and large, come to the conclusion that what the accountants told us when we started was correct - the best thing you can do is have compensation if you run a transaction and something goes wrong. You really can't undo the transaction, the only thing you can do is run a new transaction that reverses the effects of that previous transaction as best you can.
Often it is not possible to reverse the effects of an action: if you drill a hole in a piece of metal, you can't un-drill the hole. If you fire a missile, the abort is to destroy the missile. It's Carnot's [Sadi Carnot, French engineer (1796-1832) - Ed] second law of thermodynamics that says you can't run time backwards all the time, though you can in some situations. If it's just a North and South pole on a disk; if you made the pole North you can make it South. So long as you can keep everything inside the system things are reversible, but the minute things leak out of the system - for example, drilling the hole in the metal as an example of something leaking out - then the compensation action for that has to be something outside the computer's world.
So now, we come to a world which has a loosely coupled service-oriented architecture and people are asking the question: "What is the role of transactions in service-oriented architecture and internet scale distributed computing?"
So we have the notion of a transaction (and also compensation) and we also have the notion of a workflow. The Microsoft technology for this is BizTalk. There are many other people who have products that are workflow scripting languages; that allow you to declare transformations and also compensations associated with the transformation. If something goes wrong with the transaction, it goes back and applies the compensation action as best it can. However one of the results of compensation may be "I'm sorry I can't compensate for that transaction, some human is going to have to intervene to deal with this case", and I think that is the best story I can give.
Transactions are wonderful inside an organisation, they simplify things, they are all about making error handling easier, but when you get to internet scale and go across architectural boundaries error handling becomes much more difficult. When physical resources are involved, error handling becomes much more difficult because certain operations are not reversible. In those situations, you do the best you can in software and at a certain point, you have to involve people.