Double-spending [] is a potential problem in a distributed system of bank accounts, where an account owner spends money they don't have, by sending money to multiple receivers at the same time, such that the sender's account is overdrawn.
An example might be an individual that creates a wallet, and makes multiple purchases at the same time.
The cryptocurrency bitcoin tries to avoid double spending attacks by recording all transactions in a public ledger called a blockchain, and using proof-of-work to try to impose consensus. It takes vast computing power to extend the blockchain, therefore it takes vast computing power to generate an alternative.
Nevertheless several cryptocurrencies have suffered double spending attacks, such as Bitcoin Gold on May 18 2018.
In this section we describe the normal operation of the system, i.e. when computers don't fail, the network is healthy and all account owners follow the rules.
Persistent queues allow for strongly consistent bank account transfers without distributed transactions.
It is assumed there is a network of computers (nodes). Each node has a database supporting local ACID transactions. There is no need for distributed transactions. In fact there is no need for any kind of global consensus protocol. All communication is fully asynchronous and involves transient, point to point messages on top of TCP.
Each node independently keeps track of all accounts in the system and all transfers between them. Each node uses double-entry bookkeeping, and never allows an account to have a negative balance. There is no concept of nodes being "synchronised".
Each account has a public/private key. The public key provides a unique identifier for the account. The private key is only known by the account owner. This is needed to authorise the sending of money from that account to other accounts.
An account owner can produce deposit records to send money to other accounts. A deposit record is a 4-tuple:
( from-account-public-key, from-account-sequence-number, set of (to-account-public-key, amount) pairs, signature )
The deposit records generated on an account (i.e. the from-account) have a sequence number starting at 0, corresponding to the order in which they were created by the account owner.
All nodes are expected to apply the deposit records for a given from-account exactly once in the order corresponding to the sequence number - i.e. the order in which they were generated by the account owner.
The account owner must make sure their account is not overdrawn at the time they generate a deposit record. Otherwise nodes will reject the transfer, and therefore all subsequent transfers (since all nodes apply them in order). This can make the account forfeit - i.e. behave as though the money in the account has been permanently lost. A forfeit account represents deflation and benefits the community at the expense of the account owner.
The account owner includes a digital signature in each deposit-record which is calculated using their private key. This allows all nodes to independently authenticate each deposit record - i.e. confirm that it was authorised by the account owner.
The mapping from account-id to the number of deposit records on that account can be regarded as a vector time. When two nodes first connect using TCP, they exchange vector times to work out what deposit records they have that the other node doesn't - i.e. the records that need to be sent. When there are millions of accounts this can be made efficient by only sending deltas relative to "check point" vector times.
In a session between two nodes, the deposit records on a given from-account must be sent in an order respecting the sequence numbers. This is easy to check, a node will simply disconnect from a malicious node that doesn't respect the protocol.
There is no requirement that records be sent in an order that preserves causality overall. Dropping this assumption eliminates the potential for attacks by malicious nodes that purposefully send records out of order. However it means that account balances could potentially go negative because account sends are being processed before the account receives that causally precede them.
Therefore nodes apply the deposit-records in an order that makes negative balance impossible. Each node manages a current set of pending deposit-records. These are records which have been validated but cannot be applied yet. They are waiting for the from-account to have sufficient funds to cover the transfer.
A peer disconnects when the session protocol isn't adhered to:
It is possible to monitor the health of a network, such as by using heartbeats. This will detect significant network partitions.
Consider a clock which is only allowed to advance during periods where the network is healthy. This allows a node to measure the amount of time for which the network was healthy between the first and second spends of a double spend.
If this is large then the node can reliably infer most of the nodes in the network will have observed the first spend occuring before the second spend.
Note that a tree is a DAG whose undirected version is still acyclic, or putting it another way, trees are DAGs with the restriction that a child can only have one parent.
Consider all the deposit records on a given from-account. The signature recorded in a record isn't just over that record, it is also calculated over the signature of the previous record in the sequence (unless it's the 0th record). Therefore all records apart from the 0th have a parent. This allows attempted double spends to be seen as producing a tree of records, instead of a more general DAG.
Consider an attempt at double spending. This cannot involve operations with different sequence numbers because every node will apply them in the same order, so all agree that if the account is overdrawn, it will be the spends with the higher sequence numbers which fail.
A double spend requires injection of alternative deposit records with the same sequence number. This is easy for all nodes to detect. The question is what to do about it. A major challenge is that a malicious account owner can readily generate double spend records with low sequence numbers - i.e. which conflict with previous spends going back days, weeks or even years. We can't simply punish the double spend attacker by forfeiting their account including all conflicting spends. The nodes need to commit to spends that were successful in the past. The further back in time, the more important it is to commit to the spends.
Having the nodes commit to one of the conflicting spends but not the other seems to be a consensus problem. But all solutions to the consensus problem have significant downsides, in terms of complexity, robustness and performance. This seem inevitable given the FLP impossibility theorem.
An extreme example is bitcoin which uses enough power to run a small country in order to solve the consensus problem.
The important thing is to commit to older spends when there are double spends. Unfortunately distributed systems don't have any concept of perfectly synchronised clocks, indeed special relativity forbids it.
Rather than impose consensus we allow divergence!
Each node decides for itself what happened based on when it received the conflicting spends, using its local clock and network health monitoring. Other nodes may have made different decisions. We're not talking about eventual consistency, but rather real inconsistencies that although rare are never resolved when dealing with double spend attacks.
Even distributed systems with thousands of nodes spanning the planet allow deposit records to propagate to all nodes in less than a second. Therefore during periods when the network is healthy, we have a great basis for achieving very good consensus on the order of conflicting double spends, depending on how long apart they occur. This is fortuitous - just when it's important to agree on the order, the nodes agree on the order.
What's the problem with allowing divergence? It's only a problem if it's a problem to the users of the system.
A tiny and insignificant proportion of the money in the system is marked as "flaky" - because it involves double spend attacks. There are potentially different opinions of who owns this "flaky" money, depending on which node you talk to. The existence of this flaky money is more of a curiocity to users of the system, it has no harmful affects.
To model this idea, every node records two balances on every account - the solid-balance and the flaky-balance. As far as that node is concerned the flaky money is real money. The account balance is the sum of the solid-balance and flaky-balance.
When a transfer between accounts is executed, if there is enough total money in the from-account, the money is first taken out of the from-account's solid-balance and given to the to-account's solid-balance. If that reaches 0 then the rest is taken out of the from-account's flaky-balance and given to the to-account's flaky-balance.
It is assumed account owners using the system agree that if they attempt a double spend they are at risk of forfeiting their entire account, regardless of the balance remaining, and also the sellers are not required to send goods/services whether they are paid or not.
Each node persists information about double spends. The idea is to allow the receivers of the money to be notified of double spend attacks as soon as they are detected. A receiver can monitor the network health and wait for some time before shipping the good, if they want to protect themselves from double spend attacks.
Consider that a node receives a double spend. We can speak of a first spend and a second spend which arrived t seconds after the first. Each node uses the following strategy:
If an account owner monitors a node and it reports that it received flaky money then the account owner cannot anything about other nodes. It's possible most nodes rejected the transfer, or most also considered it flaky, or most considered it solid. Money in a flaky-balance means "unknown".
By contrast a node that says the money is solid provides some evidence that the money is reliable