Saturday, September 26, 2009

The really interesting part of Google Wave - Operational Transform



Many people wowed at how Google Wave could display changes by multiple clients in real time. Displaying changes alone, however, is rather trivial - you just need to learn DOM ranges, find some way to make a DOM range adapter on IE (IE sucks, in 2009 it doesn't support DOM ranges you need to write an adapter to emulate that out of TextRange), and off you go.

The real juicy part of Google Wave is how it manages to synchronize multiple simultaneous edits correctly, without using any sort of locking mechanisms. When multiple users are editing the same document on Google Wave at the same time, and users updates and commits come at different times due to network latency - users don't get the SVN-like "you've got a conflict here" message or a Trac-like "the document has been modified since the last update" message. Google Wave doesn't tell its users off to resolve the edit conflicts themselves, while still maintaining correct synchronization between edits. That's the impressive part of Google Wave.

Note that, however, by saying "correct synchronization", I'm only meaning that the users' documents on screen would converge to the same thing in a steady state. It doesn't mean it would always converge to a document that makes sense. What OT means to a lay person is that instead of requiring the human to resolve edit conflicts, the computer would try its best to do it.

So, the really impressive part of Google Wave is actually its conflict resolution mechanism. If we draw out a "time line" of conflict resolution mechanisms used in collaborative software, we can have something like this:
  1. Simple locking - if someone is editing a document, you can't edit it. One example is vim on a multi-user UNIX computer.

    This is not even feasible on the web when you take network latency into account.

  2. Optimistic locking - you can always edit a document. But if there's any conflict, the system will tell you to resolve it yourself.

    The is the most common way of handling edit conflicts on the Internet. MediaWiki, Trac, Bugzilla, SVN, etc. use it. It's usually a good enough solution and is really simple to implement - the simplest way is attaching a revision number.

  3. Operational transformation - you can always edit a document. If there's any conflict, the system resolves it for you and updates you automatically on what it has done.

    This is what Google Wave and Etherpad are doing. Google Wave's version should be much more difficult (or, tedious) because its transform algorithm is running on a tree (i.e. XML) with semantics attached to nodes (because the XML is to be rendered to a human readable document). I'm not sure if Google's engineers have come up with some ingenious idea to make the "semantics attached to nodes" part immaterial to their transform algorithms.. but my personal guess is that the tree + semantics part is making their transform algorithms quite a few times more tedious than simply transforming linear text.
So what has Google done in their magical operational transformation whizzbang? Google has a white paper about it, but as Google always do - they only give you a glancing overview of it. The details are in their code.

http://www.waveprotocol.org/whitepapers/operational-transform