In the last post we covered the CAP theorem as the framework underlying most modern DDBSs. But we’re not out of the woods yet.

Around 2010, Daniel Abadi of Yale noted some shortcomings of how the CAP theorem had been understood and applied:

  1. The theorem seems to imply that you can choose any of these 3 configurations: CA, CP or AP. But A and C have an asymmetry here: if you choose to sacrifice A, you only have to do so when there’s a network partition. But systems that choose to sacrifice C (i.e. AP systems) must do so all the time.
  2. For all practical purposes, CA and CP are identical, since the upshot of a partition on a CA (non-partition-tolerant) system is that Availability is lost. Which is the same as CP.
  3. So for all intents and purposes, CAP focuses people on a single tradeoff: in the event of a partition, what does the system give up, C or A?
  4. But even this is a false tradeoff, since it implies that we give up consistency to gain availability. This is not always the case.

Abadi notes that in the absence of a partition, a CAP-based system is free to make all the ACID guarantees along with high availability. The reason to give up consistency then was not to gain availability.

Abadi notes that the reason to give up consistency and/or availability is the missing incredient: Latency.

Keeping replicas consistent over a wide area network requires at least one message to be sent over the WAN in the critical path to perform the write (some think that 2PC is necessary, but my student Alex Thomson has some research showing that this is not the case — more on this in a future post). Unfortunately, a message over a WAN significantly increases the latency of a transaction (on the order of hundreds of milliseconds), a cost too large for many Web applications that businesses like Amazon and Yahoo need to implement. Consequently, in order to reduce latency, replication must be performed asynchronously. This reduces consistency (by definition).

A high availability requirement implies that the system must replicate data. As soon as a distributed system replicates data, a tradeoff between consistency and latency arises.

The PACELC Theorem

Abadi proposed to revise CAP to include latency in this way:

In a system that replicates data:

  • if there is a partition (P), how does the system trade off availability and consistency (A and C);
  • else (E), when the system is running normally in the absence of partitions, how does the system trade off latency (L) and consistency (C)?

He then rates various databases in terms of their tradeoffs:

  • Dynamo, Cassandra, Riak are PA/EL: if a partition occurs they choose availability over consistency; otherwise they sacrifice availability for lower latency
  • Fully ACID systems like BigTable, HBase etc. are PC/EC: they will choose consistency always, giving up availability and high latency
  • MongoDB is PA/EC: it chooses availability when partitions occur, but otherwise guarantees consistency
  • PNUTS is PC/EL

Further reading:

Next up:

  • Deep dive into transactions and Locks
  • Comparing databases
  • Important papers
    • Big Table
    • Dynamo DB: Gossip protocol (discovery and error detection), distributed key-value store, eventual consistency