Messaging done right

Characteristics of CEDA messaging

CEDA supports synchronisation of replicated databases by sending operations in one-way messages. The distributed processing of operations is characterised as follows:

Advanced messaging systems are bad

Advanced messaging systems such as RabbitMQ feature complex brokers which guarantee at least once delivery by persisting the messages. This approach embodies anti-patterns that create complexity and reduce robustness and performance.

Messaging done right

As a matter of terminology we make a distinction between:

  1. items - which persist in a database, and are part of an application defined data model expressed with a logical schema; and
  2. messages - which are transient and sent over networks.

Messages contain serialised items. There isn't a one-to-one correspondence between items and messages. In a distributed system with many participating databases an item may be sent in many different messages. Also, it can make sense for a single message to carry multiple items, to amortise the space and processing overheads of messages, such as the message headers.

This means we make the following distinctions:

  1. Production : involves atomic transactions which generate persistent items in a producer database
  2. Sending : involves writing transient messages to a transient send buffer
  3. Receiving : involves reading transient messages from a transient receive buffer
  4. Consumption : involves atomic transactions which process items and make associated updates to a consumer database

An ideal messaging building block for a distributed system is trivially built on top of TCP, and provides transient, reliable, ordered, exactly-once, point-to-point messages. It handles the sending and receiving of messages using transient send/receive buffers, and tends to achieve optimal performance, simplicity and testability while avoiding the anti-patterns described above.

Ordered delivery might be thought to be incompatible with parallelism, but it's certainly not. For example it's easy for a message handler to post its messages to a thread pool, or to forward messages onto other nodes in the network. It makes perfect sense to build a robust, scalable distributed system on top of transient point to point reliable ordered message streams.
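As a minimal sketch of the thread pool idea (the names `handle_stream` and `square` are hypothetical and not part of CEDA), a handler can submit messages to a pool in arrival order and still collect the results in that same order:

```python
from concurrent.futures import ThreadPoolExecutor


def handle_stream(messages, process, workers=4):
    """Consume an ordered stream, doing the per-message work in parallel.

    Messages are submitted in arrival order and the results are collected
    in that same order, so downstream code still sees an ordered sequence
    even though the work itself runs concurrently.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process, m) for m in messages]
        return [f.result() for f in futures]


def square(n):
    """Stand-in for whatever work a real message handler would do."""
    return n * n


results = handle_stream(range(8), square)
```

The ordering guarantee of the stream constrains the order in which messages arrive, not the order in which the work on them has to finish.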

Sequence numbers can typically be used to support crash recovery, roll back, late comers, failures etc without distributed transactions.
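The following is a rough in-memory sketch of that idea, under the assumption that a consumer remembers how many items it has consumed and asks the producer to resume from that position. The `Producer`, `Consumer` and `sync` names are illustrative only.

```python
class Producer:
    """Holds the ordered items produced so far (in-memory stand-in)."""

    def __init__(self):
        self.items = []

    def produce(self, item):
        self.items.append(item)

    def items_from(self, seq):
        """Return (seq, item) pairs for every item at position >= seq."""
        return [(i, self.items[i]) for i in range(seq, len(self.items))]


class Consumer:
    """Tracks how many items it has consumed; that count is its resume point."""

    def __init__(self):
        self.consumed = 0
        self.state = []

    def receive(self, seq, item):
        if seq < self.consumed:
            return  # already consumed: a re-sent item is ignored
        if seq > self.consumed:
            raise ValueError("gap in stream")
        self.state.append(item)
        self.consumed += 1


def sync(producer, consumer):
    """(Re)connect: the consumer's count tells the producer where to resume."""
    for seq, item in producer.items_from(consumer.consumed):
        consumer.receive(seq, item)
```

A consumer that crashed, rolled back or joined late only differs in the value of `consumed` it presents on reconnection; the same `sync` step covers all three cases.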

A persistent queue might seem analogous to a persistent messaging system, but it's different in some crucial respects:

  1. The persistent queue is properly managed by a fully fledged DBMS, avoiding the inadequate persistence system anti-pattern
  2. The persistent queue is in the producer database, not in some other database. Therefore, atomic transactions on the producer database encompass the pushing of items on the queue.
  3. The number of items processed by a consumer is represented in the consumer database. Therefore, atomic transactions on the consumer database encompass what is effectively the popping of items from the queue.
  4. The approach has a very nice generalisation to multiple producers and consumers. This represents simple, efficient, reliable, exactly once "publish-subscribe" with multiple publishers and multiple subscribers.
  5. It's typically the case that the persistent queue is not physically materialised (i.e. in the form of a linear sequence of items created by a producer). This is one of the reasons persisting messages is an anti-pattern. Typically more efficient representations of the information created by producers are possible.
  6. The messages over the wire are expected to use transient message queues which are brokerless, ordered, exactly-once and point-to-point.
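Points 2 and 3 can be sketched with two independent SQLite databases. This is an illustration under assumptions, not CEDA's implementation: the table names (`items`, `progress`, `applied`), the function names and the single producer id `'p1'` are all made up for the example, and the queue is physically materialised here only for simplicity.

```python
import sqlite3


def open_producer():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE items (seq INTEGER PRIMARY KEY, payload TEXT NOT NULL)")
    return db


def open_consumer():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE progress (producer TEXT PRIMARY KEY, consumed INTEGER NOT NULL)")
    db.execute("CREATE TABLE applied (payload TEXT NOT NULL)")
    db.execute("INSERT INTO progress VALUES ('p1', 0)")
    db.commit()
    return db


def produce(db, payload):
    with db:  # one atomic transaction on the producer database
        (last,) = db.execute("SELECT COALESCE(MAX(seq), 0) FROM items").fetchone()
        db.execute("INSERT INTO items VALUES (?, ?)", (last + 1, payload))


def read_message(db, consumed):
    """Build a transient message: every item after the first `consumed` ones."""
    return db.execute(
        "SELECT seq, payload FROM items WHERE seq > ? ORDER BY seq", (consumed,)
    ).fetchall()


def consumed_count(db):
    return db.execute("SELECT consumed FROM progress WHERE producer = 'p1'").fetchone()[0]


def consume(db, message):
    with db:  # one atomic transaction on the consumer database
        consumed = consumed_count(db)
        for seq, payload in message:
            if seq <= consumed:
                continue  # already consumed; a re-sent message is harmless
            if seq != consumed + 1:
                raise ValueError("gap in stream")
            db.execute("INSERT INTO applied VALUES (?)", (payload,))
            consumed = seq
        # The count commits together with the updates it describes.
        db.execute("UPDATE progress SET consumed = ? WHERE producer = 'p1'", (consumed,))
```

Because the count of consumed items commits in the same transaction as the updates, a crash on either side leaves both databases in a state from which `read_message(producer, consumed_count(consumer))` resumes correctly, with no transaction spanning the two databases.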

Contrast this with a talk by Udi Dahan in which he proposes a (poor) solution to reliable messaging without distributed transactions, one which he says has "lots of moving parts".

These differences make it possible, for example, to implement strongly consistent bank account transfers using double entry bookkeeping [] without distributed transactions.
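One way such a transfer might look is sketched below, using one SQLite database per bank. The schema, the `clearing` and `equity` account names, the signed-amount convention and all function names are assumptions made for illustration; the source does not specify them.

```python
import sqlite3


def open_bank(opening_balances):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE entries (account TEXT NOT NULL, amount INTEGER NOT NULL)")
    db.execute("CREATE TABLE outbox (seq INTEGER PRIMARY KEY, to_account TEXT, amount INTEGER)")
    db.execute("CREATE TABLE progress (consumed INTEGER NOT NULL)")
    with db:
        db.execute("INSERT INTO progress VALUES (0)")
        for account, amount in opening_balances.items():
            # Opening balances are themselves paired entries against 'equity'.
            db.execute("INSERT INTO entries VALUES (?, ?)", (account, amount))
            db.execute("INSERT INTO entries VALUES ('equity', ?)", (-amount,))
    return db


def balance(db, account):
    return db.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM entries WHERE account = ?", (account,)
    ).fetchone()[0]


def send(db, from_account, to_account, amount):
    """Sending bank: both entries and the outgoing item commit together."""
    with db:
        if balance(db, from_account) < amount:
            raise ValueError("insufficient funds")
        db.execute("INSERT INTO entries VALUES (?, ?)", (from_account, -amount))
        db.execute("INSERT INTO entries VALUES ('clearing', ?)", (amount,))
        db.execute(
            "INSERT INTO outbox (to_account, amount) VALUES (?, ?)", (to_account, amount)
        )


def pending(db, consumed):
    """Transient message: transfers after the first `consumed` ones."""
    return db.execute(
        "SELECT seq, to_account, amount FROM outbox WHERE seq > ? ORDER BY seq",
        (consumed,),
    ).fetchall()


def receive(db, message):
    """Receiving bank: both entries and the consumed count commit together."""
    with db:
        (consumed,) = db.execute("SELECT consumed FROM progress").fetchone()
        for seq, to_account, amount in message:
            if seq <= consumed:
                continue  # transfer already applied
            db.execute("INSERT INTO entries VALUES ('clearing', ?)", (-amount,))
            db.execute("INSERT INTO entries VALUES (?, ?)", (to_account, amount))
            consumed = seq
        db.execute("UPDATE progress SET consumed = ?", (consumed,))
```

In this sketch every local transaction writes entries that sum to zero, so each bank's books balance at all times. Money in flight shows up as a non-zero `clearing` balance at the sending bank until the receiving bank consumes the transfer, and a re-sent message cannot credit the recipient twice.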

Applying the same approach to goods avoids distributed transactions when managing the buying and selling of goods, and commercial activities more generally, such as involving

A sequence of contingent business activities is sometimes referred to as a long running transaction or saga.

Double spending

Perhaps it's possible to avoid solving the consensus problem and yet avoid the issues around double spending. [see work in progress].