Producer/consumer with a persistent queue

Producer

Consider a producer which has a persistent queue of items in its database. In a local transaction, the producer can push items onto the end of its queue. This may be part of some larger atomic transaction which also makes changes to other data in the producer's database.

The items in the queue have a sequence number according to the order in which they are pushed.

It is allowable for the producer to routinely purge very old items from the front of the queue, as long as no consumer needs to be sent those items.

There is a weak durability assumption: an element appended to the producer queue must be made durable (i.e. written to nonvolatile storage) before it is available to be sent using a transient message queue.

Consumer

A consumer records the number of items it has processed so far in its database. Incrementing this value is like popping items from the front of the producer's queue. In a local transaction, the consumer can mark items as processed by increasing this number. This may be part of some larger atomic transaction which also makes changes to other data in the consumer's database.

Exactly once in order (EOIO)

The use of sequence numbers makes it easy to guarantee exactly once in order (EOIO) processing of items, without needing distributed transactions.

Not using distributed transactions means that atomic transactions can be committed on the producer database independently of atomic transactions committed on the consumer database. This independence means they operate asynchronously. This of course is one of the whole points of using a queue. The producer can push elements on the queue independently of the consumer popping elements from the queue.

But the consumer doesn't actually pop from the producer queue (i.e. erase elements from the queue). Instead it does so indirectly by incrementing a value in its own database which represents the number of elements the consumer has processed so far in its own database. Therefore, pop operations only involve updates to the consumer database.

Using a transient message queue to actually send the items

An appropriate building block for messaging is to use transient message queues which are brokerless, ordered, exactly once and point to point.

A transient message queue involves a transient send buffer on the producer and a transient receive buffer on the consumer

When the connection is first established, the consumer first sends the value of n (the number of items it has consumed so far, recorded in its database). That allows the producer to send the items (and only the items) in its queue which need to be processed by the consumer.

The sender reads its local database to read items, packages them into messages and writes the messages to a send buffer.

The receiver reads the messages, unpackages the items and updates its local database with local atomic non-durable transactions.

There is no need to synchronise with the producer because producers don't persist information about what items have been applied on consumers - indeed a consumer could crash and roll back to an earlier state, and when it next connects it may receive items it had previously received before it crashed.

Push versus pull messaging

In a push model, messages are sent by the producer as they are generated. By contrast in a pull model the messages are requested before they are sent. There seems to be a trade-off:

In the approach described above we get the best of both worlds. The consumer first sends a single subscription message, in that sense it's a pull model. The subscription message contains a sequence number to indicate what messages have already been consumed. After the producer receives the subscription message, it then pushes all the messages it produces exactly once and in order, starting from the requested sequence number.